auto_process_ngs.barcodes.pipeline
Pipeline components for analysing barcodes.
Pipeline classes:
AnalyseBarcodes
Pipeline task classes:
SetupBarcodeAnalysisDirs
CountBarcodes
ListBarcodeCountFiles
DetermineMismatches
ReportBarcodeAnalysis
- class auto_process_ngs.barcodes.pipeline.AnalyseBarcodes(bcl2fastq_dir=None, sample_sheet=None)
Analyse the barcodes for Fastqs in a sequencing run
Pipeline to perform barcode analysis on the Fastqs produced by bcl2fastq from a sequencing run.
- run(barcode_analysis_dir, bcl2fastq_dir=None, title=None, lanes=None, mismatches=None, bases_mask=None, cutoff=None, sample_sheet=None, force=False, working_dir=None, log_file=None, batch_size=None, max_jobs=1, poll_interval=5, runners=None, default_runner=None, verbose=False)
Run the tasks in the pipeline
- Parameters:
barcode_analysis_dir (str) – path to the directory to write the analysis results to
bcl2fastq_dir (str) – path to the bcl2fastq outputs (must be supplied here if a bcl2fastq output directory was not supplied on pipeline creation)
title (str) – optional, title for output reports
lanes (list) – optional, list of lanes to restrict the analysis to
mismatches (int) – optional, explicitly specify the number of mismatches to allow (default: determine number of mismatches automatically)
bases_mask (str) – optional, bases mask used for Fastq generation and demultiplexing
cutoff (float) – optional, don’t report barcodes with a fraction of associated reads below this value (e.g. ‘0.001’ excludes barcodes with < 0.1% of reads) (default: don’t apply a cutoff)
sample_sheet (str) – optional, sample sheet to check barcode sequences against
force (bool) – if True then force regeneration of counts (default: re-use existing counts)
working_dir (str) – optional path to a working directory (default: temporary directory in the current directory)
log_file (str) – path to write log file to (default: don’t write a log file)
batch_size (int) – if set then run commands in each task in batches, with each batch running this many commands at a time (default: run one command per job)
max_jobs (int) – optional maximum number of concurrent jobs in scheduler (default: 1)
poll_interval (float) – optional polling interval (seconds) to set in scheduler (default: 5s)
runners (dict) – mapping of names to JobRunner instances; valid names are ‘barcode_analysis_runner’ and ‘default’
default_runner (JobRunner) – optional default job runner to use
verbose (bool) – if True then report additional information for diagnostics
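The following is a minimal sketch of how the pipeline might be driven from Python, using only the constructor and `run` signatures documented above. The helper names `make_run_options` and `run_barcode_analysis` are hypothetical and not part of the package. The import is deferred so that the option handling can be tried without the package installed, and the return value of `run` (not documented in this section) is passed through unchanged.

```python
def make_run_options(**options):
    """Keep only the options that were explicitly set (hypothetical helper).

    Dropping None values leaves the documented defaults of
    AnalyseBarcodes.run() in effect for everything else.
    """
    return {name: value for name, value in options.items()
            if value is not None}


def run_barcode_analysis(bcl2fastq_dir, barcode_analysis_dir, **options):
    """Create the pipeline and run it (hypothetical wrapper)."""
    # Deferred import: only needed when the pipeline is actually run
    from auto_process_ngs.barcodes.pipeline import AnalyseBarcodes

    pipeline = AnalyseBarcodes(bcl2fastq_dir=bcl2fastq_dir)
    return pipeline.run(barcode_analysis_dir, **make_run_options(**options))


# Options restricting the analysis to lanes 1 and 2 and hiding barcodes
# with fewer than 0.1% of reads; 'mismatches' is left unset so that the
# number of mismatches is determined automatically
run_options = make_run_options(lanes=[1, 2], cutoff=0.001,
                               mismatches=None, title="Example run")
```

With the package installed, a call might then look like `run_barcode_analysis("/path/to/bcl2fastq", "/path/to/barcode_analysis", lanes=[1, 2], cutoff=0.001)`, where both paths are placeholders.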
- class auto_process_ngs.barcodes.pipeline.CountBarcodes(_name, *args, **kws)
Generate barcode counts for a project
- init(illumina_data, project, counts_dir, lanes=None, use_project_name=None)
Initialise the CountBarcodes task
- Parameters:
illumina_data (IlluminaData) – IlluminaData object for the Fastq data that should be examined
project (str) – name of project with Fastqs to get barcodes from
counts_dir (str) – directory to write counts files to
lanes (list) – optional list of lanes to restrict counts generation to
use_project_name (str) – optional alternative name for project to use in counts file names (default is to use the name from the supplied project)
- setup()
Set up commands to be performed by the task
Must be implemented by the subclass
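To illustrate what "barcode counts" means here, the sketch below tallies the index sequences found in Fastq read headers. It is not the implementation used by `CountBarcodes`: the function name `count_barcodes` is hypothetical, the reads are made up, and it assumes four-line Fastq records whose header ends with the index sequence after the last colon (the layout written by recent Illumina software).

```python
import io
from collections import Counter


def count_barcodes(fastq_lines):
    """Count index sequences found in Fastq read headers.

    Assumes four-line Fastq records whose header line ends with
    '<read>:<filtered>:<control>:<index sequence>'.
    """
    counts = Counter()
    for line_number, line in enumerate(fastq_lines):
        if line_number % 4 == 0:  # header line of each record
            barcode = line.rstrip().split(":")[-1]
            counts[barcode] += 1
    return counts


# Three made-up reads, two of which share the same barcode
example_fastq = io.StringIO(
    "@M00123:45:000000000-ABCDE:1:1101:15589:1332 1:N:0:ACGTACGT\n"
    "GATTACA\n+\nIIIIIII\n"
    "@M00123:45:000000000-ABCDE:1:1101:15590:1340 1:N:0:ACGTACGT\n"
    "TTGACCA\n+\nIIIIIII\n"
    "@M00123:45:000000000-ABCDE:1:1101:15591:1352 1:N:0:TTGCATGC\n"
    "CCATGGA\n+\nIIIIIII\n"
)
barcode_counts = count_barcodes(example_fastq)
```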
- class auto_process_ngs.barcodes.pipeline.ListBarcodeCountFiles(_name, *args, **kws)
Collect counts files from a directory
- init(counts_dir)
Initialise the ListBarcodeCountFiles task
- Parameters:
counts_dir (str) – directory holding the counts files
- Outputs:
- counts_files (list): list of counts files
- setup()
Set up commands to be performed by the task
Must be implemented by the subclass
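Collecting the counts files from a directory could be sketched as below. The function `list_counts_files` is hypothetical, and the `.counts` suffix is an assumption made for illustration only; the naming scheme actually used for counts files is not stated in this section.

```python
import os
import tempfile


def list_counts_files(counts_dir, suffix=".counts"):
    """Return the sorted full paths of the counts files in counts_dir.

    The '.counts' suffix is an assumption for illustration only.
    """
    return sorted(
        os.path.join(counts_dir, name)
        for name in os.listdir(counts_dir)
        if name.endswith(suffix)
        and os.path.isfile(os.path.join(counts_dir, name))
    )


# Demonstration using a throwaway directory with two counts files
# and one unrelated file
with tempfile.TemporaryDirectory() as counts_dir:
    for name in ("projB.counts", "projA.counts", "notes.txt"):
        open(os.path.join(counts_dir, name), "w").close()
    counts_file_names = [os.path.basename(path)
                         for path in list_counts_files(counts_dir)]
```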
- class auto_process_ngs.barcodes.pipeline.LoadIlluminaData(_name, *args, **kws)
Load up an IlluminaData object from the bcl2fastq outputs
- init(bcl2fastq_dir)
- setup()
Set up commands to be performed by the task
Must be implemented by the subclass
- class auto_process_ngs.barcodes.pipeline.ReportBarcodeAnalysis(_name, *args, **kws)
Perform analysis and reporting of barcode counts
- finish()
Perform actions on task completion
Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.
Must be implemented by the subclass
- init(counts_files, barcode_analysis_dir, lanes=None, mismatches=None, cutoff=0.001, sample_sheet=None, title=None)
Initialise the ReportBarcodeAnalysis task
- Parameters:
counts_files (list) – counts files to include in the analysis
barcode_analysis_dir (str) – path to the directory to write the analysis results to
lanes (list) – optional list of lanes to restrict the analysis to
mismatches (int) – optional number of mismatches to allow when comparing barcodes
cutoff (float) – optional fraction of total barcodes below which indexes won’t be reported
sample_sheet (str) – optional, sample sheet to check barcode sequences against
title (str) – optional, title for output reports
- Outputs:
report_file (str): path to the report file
xls_file (str): path to the XLS report
html_file (str): path to the HTML report
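The effect of the `cutoff` parameter can be sketched with plain dictionaries. This illustrates the idea of excluding barcodes by their fraction of reads and is not the reporting code itself; `apply_cutoff` and the example counts are hypothetical.

```python
def apply_cutoff(counts, cutoff=None):
    """Drop barcodes whose fraction of all reads is below the cutoff.

    'counts' maps barcode sequences to numbers of reads; a cutoff of
    None means that no barcodes are excluded.
    """
    total = sum(counts.values())
    if cutoff is None or total == 0:
        return dict(counts)
    return {barcode: n for barcode, n in counts.items()
            if n / total >= cutoff}


# 10,000 reads in total: the last barcode accounts for 0.05% of them,
# which is below the 0.1% threshold
example_counts = {"ACGTACGT": 9000, "TTGCATGC": 995, "GGGGGGGG": 5}
reported = apply_cutoff(example_counts, cutoff=0.001)
```

Here `GGGGGGGG` is dropped because 5 of 10,000 reads is a fraction of 0.0005, while the other two barcodes are kept.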
- setup()
Set up commands to be performed by the task
Must be implemented by the subclass
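When `mismatches` is not given explicitly, a suitable value has to be chosen somehow. One possible approach, shown here as an assumption and not necessarily the one this pipeline uses, is to pick the largest number of mismatches for which no two expected barcodes could be confused, i.e. every pair differs in more than twice that many positions. The function names and the `limit` parameter are hypothetical.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def max_safe_mismatches(barcodes, limit=2):
    """Largest m <= limit such that every pair of barcodes differs
    in more than 2*m positions (assumed collision rule)."""
    if len(barcodes) < 2:
        return limit
    min_distance = min(hamming_distance(a, b)
                       for a, b in combinations(barcodes, 2))
    return max(0, min(limit, (min_distance - 1) // 2))


# The closest pair below differs in 4 positions, so one mismatch is
# safe (needs more than 2 differences) but two are not (needs more
# than 4)
well_separated = max_safe_mismatches(["ACGTACGT", "ACGTTGCA", "TTGCATGC"])
# These two barcodes differ in a single position, so no mismatches
# can be allowed
too_close = max_safe_mismatches(["ACGT", "ACGA"])
```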
- class auto_process_ngs.barcodes.pipeline.SetupBarcodeAnalysisDirs(_name, *args, **kws)
Set up the output directories
- init(barcode_analysis_dir, counts_dir, force=False)
Initialise the SetupBarcodeAnalysisDirs task
- Parameters:
barcode_analysis_dir (str) – final output directory
counts_dir (str) – directory to write counts files to
force (bool) – if True then remove existing counts files
- setup()
Set up commands to be performed by the task
Must be implemented by the subclass
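The behaviour described for `force` might look roughly like the sketch below: the directories are created if missing, and existing counts files are only removed when `force` is set. This is an illustration written with the standard library, not the task's own code; `setup_analysis_dirs` is hypothetical, as is the choice to place the counts directory inside the analysis directory.

```python
import os
import tempfile


def setup_analysis_dirs(barcode_analysis_dir, counts_dir, force=False):
    """Create the output directories; return names of removed files.

    Existing files in counts_dir are removed only if force is True.
    """
    os.makedirs(barcode_analysis_dir, exist_ok=True)
    os.makedirs(counts_dir, exist_ok=True)
    removed = []
    if force:
        for name in sorted(os.listdir(counts_dir)):
            path = os.path.join(counts_dir, name)
            if os.path.isfile(path):
                os.remove(path)
                removed.append(name)
    return removed


# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as top_dir:
    analysis_dir = os.path.join(top_dir, "barcode_analysis")
    counts_dir = os.path.join(analysis_dir, "counts")
    setup_analysis_dirs(analysis_dir, counts_dir)
    # Simulate a counts file left over from an earlier run
    open(os.path.join(counts_dir, "old.counts"), "w").close()
    kept = setup_analysis_dirs(analysis_dir, counts_dir)
    removed = setup_analysis_dirs(analysis_dir, counts_dir, force=True)
    remaining = os.listdir(counts_dir)
```

Without `force` the leftover file is kept for re-use; with `force=True` it is removed so that the counts are regenerated.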