auto_process_ngs.barcodes.pipeline

Pipeline components for analysing barcodes.

Pipeline classes:

  • AnalyseBarcodes

Pipeline task classes:

  • SetupBarcodeAnalysisDirs

  • CountBarcodes

  • ListBarcodeCountFiles

  • DetermineMismatches

  • ReportBarcodeAnalysis

  • LoadIlluminaData

class auto_process_ngs.barcodes.pipeline.AnalyseBarcodes(bcl2fastq_dir=None, sample_sheet=None)

Analyse the barcodes for Fastqs in a sequencing run

Pipeline to perform barcode analysis on the Fastqs produced by bcl2fastq from a sequencing run.

run(barcode_analysis_dir, bcl2fastq_dir=None, title=None, lanes=None, mismatches=None, bases_mask=None, cutoff=None, sample_sheet=None, force=False, working_dir=None, log_file=None, batch_size=None, max_jobs=1, poll_interval=5, runners=None, default_runner=None, verbose=False)

Run the tasks in the pipeline

Parameters:
  • barcode_analysis_dir (str) – path to the directory to write the analysis results to

  • bcl2fastq_dir (str) – path to the bcl2fastq outputs (must be supplied here if a bcl2fastq output directory was not supplied on pipeline creation)

  • title (str) – optional, title for output reports

  • lanes (list) – optional, list of lanes to restrict the analysis to

  • mismatches (int) – optional, explicitly specify the number of mismatches to allow (default: determine number of mismatches automatically)

  • bases_mask (str) – optional, bases mask used for Fastq generation and demultiplexing

  • cutoff (float) – optional, don’t report barcodes with a fraction of associated reads below this value (e.g. ‘0.001’ excludes barcodes with < 0.1% of reads) (default: don’t apply a cutoff)

  • sample_sheet (str) – optional, sample sheet to check barcode sequences against

  • force (bool) – if True then force regeneration of counts (default: re-use existing counts)

  • working_dir (str) – optional path to a working directory (default: temporary directory in the current directory)

  • log_file (str) – path to write log file to (default: don’t write a log file)

  • batch_size (int) – if set then run commands in each task in batches, with each batch running this many commands at a time (default: run one command per job)

  • max_jobs (int) – optional maximum number of concurrent jobs in scheduler (default: 1)

  • poll_interval (float) – optional polling interval (seconds) to set in scheduler (default: 5s)

  • runners (dict) – mapping of names to JobRunner instances; valid names are ‘barcode_analysis_runner’ and ‘default’

  • default_runner (JobRunner) – optional default job runner to use

  • verbose (bool) – if True then report additional information for diagnostics
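
The following is a minimal sketch of driving the pipeline, using only the constructor and run() arguments documented above. The helper names build_run_kwargs and run_barcode_analysis and the example paths are hypothetical, not part of the package. The return value of run() is not documented here, so the sketch passes it through unchanged.

```python
def build_run_kwargs(lanes=None, mismatches=None, cutoff=None,
                     title=None, force=False, max_jobs=1,
                     poll_interval=5, verbose=False):
    """Collect keyword arguments for AnalyseBarcodes.run().

    Hypothetical helper: only names from the documented run()
    signature are used, with the documented default values.
    """
    return dict(lanes=lanes,
                mismatches=mismatches,
                cutoff=cutoff,
                title=title,
                force=force,
                max_jobs=max_jobs,
                poll_interval=poll_interval,
                verbose=verbose)


def run_barcode_analysis(bcl2fastq_dir, barcode_analysis_dir,
                         sample_sheet=None, **kwargs):
    """Create the pipeline and run it (hypothetical wrapper)."""
    # Imported lazily so this sketch can be loaded on a machine
    # where auto_process_ngs is not installed
    from auto_process_ngs.barcodes.pipeline import AnalyseBarcodes
    pipeline = AnalyseBarcodes(bcl2fastq_dir=bcl2fastq_dir,
                               sample_sheet=sample_sheet)
    return pipeline.run(barcode_analysis_dir,
                        **build_run_kwargs(**kwargs))


# Example call (paths are placeholders):
# run_barcode_analysis("/path/to/bcl2fastq",
#                      "/path/to/barcode_analysis",
#                      lanes=[1, 2], cutoff=0.001)
```

Because bcl2fastq_dir is supplied when the pipeline is created, it does not need to be passed again to run().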

class auto_process_ngs.barcodes.pipeline.CountBarcodes(_name, *args, **kws)

Generate barcode counts for a project

init(illumina_data, project, counts_dir, lanes=None, use_project_name=None)

Initialise the CountBarcodes task

Parameters:
  • illumina_data (IlluminaData) – IlluminaData object for the Fastq data that should be examined

  • project (str) – name of project with Fastqs to get barcodes from

  • counts_dir (str) – directory to write counts files to

  • lanes (list) – optional list of lanes to restrict counts generation to

  • use_project_name (str) – optional alternative name for project to use in counts file names (default is to use the name from the supplied project)

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.barcodes.pipeline.ListBarcodeCountFiles(_name, *args, **kws)

Collect counts files from a directory

init(counts_dir)

Initialise the ListBarcodeCountFiles task

Parameters:
  • counts_dir (str) – directory holding the counts files

Outputs:
  • counts_files (list) – list of counts files
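
As a rough illustration of what collecting counts files from a directory can look like, here is a standalone sketch. It is not the task's implementation: the function name list_counts_files and the ".counts" file extension are assumptions made for the example.

```python
import os


def list_counts_files(counts_dir, extension=".counts"):
    """Return sorted full paths of the counts files in counts_dir.

    The ".counts" extension is an assumption for this sketch.
    """
    return sorted(
        os.path.join(counts_dir, name)
        for name in os.listdir(counts_dir)
        if name.endswith(extension)
        and os.path.isfile(os.path.join(counts_dir, name))
    )
```

Sorting the result keeps the order of the list stable between runs, which os.listdir() alone does not guarantee.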

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.barcodes.pipeline.LoadIlluminaData(_name, *args, **kws)

Load up an IlluminaData object from the bcl2fastq outputs

init(bcl2fastq_dir)

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.barcodes.pipeline.ReportBarcodeAnalysis(_name, *args, **kws)

Perform analysis and reporting of barcode counts

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(counts_files, barcode_analysis_dir, lanes=None, mismatches=None, cutoff=0.001, sample_sheet=None, title=None)

Initialise the ReportBarcodeAnalysis task

Parameters:
  • counts_files (list) – counts files to include in the analysis

  • barcode_analysis_dir (str) – path to the directory to write the analysis results to

  • lanes (list) – optional list of lanes to restrict the analysis to

  • mismatches (int) – optional number of mismatches to allow when comparing barcodes

  • cutoff (float) – optional, don’t report barcodes with a fraction of associated reads below this value (default: 0.001)

  • sample_sheet (str) – optional, sample sheet to check barcode sequences against

  • title (str) – optional, title for output reports

Outputs:
  • report_file (str) – path to the report file

  • xls_file (str) – path to the XLS report

  • html_file (str) – path to the HTML report
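
To make the cutoff parameter concrete, the sketch below filters a small table of barcode counts by fraction of reads. It is an illustration only: the function name apply_cutoff and the example counts are made up, and it assumes the fraction is taken over the total of all counts supplied.

```python
def apply_cutoff(counts, cutoff=None):
    """Drop barcodes whose fraction of total reads is below cutoff.

    counts maps a barcode sequence to its number of reads.
    With cutoff=None no filtering is applied.
    """
    if cutoff is None:
        return dict(counts)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Keep barcodes at or above the cutoff fraction
    return {bc: n for bc, n in counts.items() if n / total >= cutoff}


# Made-up example: 10000 reads in total
counts = {"ACGTACGT": 9000, "TTGCATGC": 995, "GGGGGGGG": 5}
kept = apply_cutoff(counts, cutoff=0.001)
```

Here "GGGGGGGG" accounts for 5 of 10000 reads (0.05%), which is below the 0.1% threshold, so it is the only barcode dropped.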

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.barcodes.pipeline.SetupBarcodeAnalysisDirs(_name, *args, **kws)

Set up the output directories

init(barcode_analysis_dir, counts_dir, force=False)

Initialise the SetupBarcodeAnalysisDirs task

Parameters:
  • barcode_analysis_dir (str) – final output directory

  • counts_dir (str) – directory to write counts files to

  • force (bool) – if True then remove existing counts files
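
The effect of the force parameter can be pictured with the standalone sketch below. It is not the task's implementation: the function name setup_dirs and the ".counts" extension are assumptions for the example.

```python
import os


def setup_dirs(barcode_analysis_dir, counts_dir, force=False,
               extension=".counts"):
    """Create the output directories and return any removed files.

    If force is True, existing counts files in counts_dir are
    deleted so that they will be regenerated. The ".counts"
    extension is an assumption for this sketch.
    """
    os.makedirs(barcode_analysis_dir, exist_ok=True)
    os.makedirs(counts_dir, exist_ok=True)
    removed = []
    if force:
        for name in sorted(os.listdir(counts_dir)):
            path = os.path.join(counts_dir, name)
            if name.endswith(extension) and os.path.isfile(path):
                os.remove(path)
                removed.append(path)
    return removed
```

With force=False the function leaves existing counts files in place, matching the documented default of re-using existing counts.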

setup()

Set up commands to be performed by the task

Must be implemented by the subclass