auto_process_ngs.bcl2fastq.pipeline

Pipeline components for generating Fastqs from Bcl files.

Pipeline classes:

  • MakeFastqs

Pipeline task classes:

  • FetchPrimaryData

  • MakeSampleSheet

  • GetBcl2Fastq

  • GetBclConvert

  • RestoreBackupDirectory

  • RunBcl2Fastq

  • GetBasesMaskIcell8

  • GetBasesMaskIcell8Atac

  • Get10xPackage

  • DemultiplexIcell8Atac

  • MergeFastqs

  • MergeFastqDirs

  • GetBasesMask10xMultiome

  • Run10xMkfastq

  • FastqStatistics

  • ReportProcessingQC

Utility functions:

  • subset

class auto_process_ngs.bcl2fastq.pipeline.DemultiplexIcell8Atac(_name, *args, **kws)

Runs ‘demultiplex_icell8_atac.py’ to generate Fastqs

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(fastq_dir, out_dir, well_list, nprocessors=None, swap_i1_and_i2=False, reverse_complement=None, skip_demultiplex=False)

Initialise the DemultiplexIcell8Atac task

Parameters:
  • fastq_dir (str) – path to directory with Fastq files to demultiplex

  • out_dir (str) – path to output directory

  • well_list (str) – path to well list file to use for demultiplexing samples

  • swap_i1_and_i2 (bool) – if True then swap the I1 and I2 indexes when demultiplexing

  • reverse_complement (str) – whether to reverse complement I1, I2, or both, when demultiplexing

  • skip_demultiplex (bool) – if True then skip running the demultiplexing

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.FastqStatistics(_name, *args, **kws)

Generates statistics for Fastq files

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(bcl2fastq_dir, sample_sheet, out_dir, stats_file=None, stats_full_file=None, per_lane_stats_file=None, per_lane_sample_stats_file=None, add_data=False, force=False, nprocessors=None)

Initialise the FastqStatistics task

Parameters:
  • bcl2fastq_dir (str) – path to directory with Fastqs from bcl2fastq

  • sample_sheet (str) – path to sample sheet file

  • out_dir (str) – path to directory to write the output stats files to

  • stats_file (str) – path to statistics output file

  • stats_full_file (str) – path to full statistics output file

  • per_lane_stats_file (str) – path to per-lane statistics output file

  • per_lane_sample_stats_file (str) – path to per-lane per-sample statistics output file

  • add_data (bool) – if True then add stats to the existing stats files (default is to overwrite existing stats files)

  • force (bool) – if True then force update of the stats files even if they are newer than the Fastq files (by default stats are only updated if they are older than the Fastqs)

  • nprocessors (int) – number of cores to use when running ‘fastq_statistics.py’

Outputs:

  • stats_file: path to basic stats file

  • stats_full: path to full stats file

  • per_lane_stats: path to per-lane stats file

  • per_lane_sample_stats: path to per-lane sample stats file
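The effect of the force option can be pictured as a timestamp comparison between the stats file and the Fastqs. The following is a minimal sketch of that idea only, not the task's actual code: the helper name stats_need_update and the file names are hypothetical, and treating a missing stats file as stale is an assumption.

```python
import os
import tempfile

def stats_need_update(stats_file, fastqs, force=False):
    """Sketch: decide whether a stats file should be regenerated"""
    # Forced updates, or a missing stats file, always trigger a refresh
    # (the missing-file case is an assumption of this sketch)
    if force or not os.path.exists(stats_file):
        return True
    # Otherwise refresh only if some Fastq is newer than the stats file
    stats_mtime = os.path.getmtime(stats_file)
    return any(os.path.getmtime(fq) > stats_mtime for fq in fastqs)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names, used only for this demonstration
    fq = os.path.join(tmp, "S1_R1.fastq.gz")
    stats = os.path.join(tmp, "statistics.info")
    for path in (fq, stats):
        open(path, "w").close()
    # Set modification times explicitly: stats newer than the Fastq
    os.utime(fq, (1000, 1000))
    os.utime(stats, (2000, 2000))
    up_to_date = stats_need_update(stats, [fq])
    forced = stats_need_update(stats, [fq], force=True)
    # Now make the Fastq newer than the stats file
    os.utime(fq, (3000, 3000))
    stale = stats_need_update(stats, [fq])
```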

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.FetchPrimaryData(_name, *args, **kws)

Fetch the primary data for processing

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(data_dir, primary_data_dir, force_copy=False)

Initialise the FetchPrimaryData task

Parameters:
  • data_dir (str) – location of the source sequencing data

  • primary_data_dir (str) – directory to copy data to (if source is a remote location) or link data from (if source is on the local system)

  • force_copy (bool) – if True then force primary data to be copied even if it’s on the local system

Outputs:

run_dir: path to the local copy of the primary data
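The choice between copying and linking described in the parameters can be sketched as below. This is an illustration only: the helper name choose_fetch_action is hypothetical, and the rule used to recognise a remote location (a '[user@]host:' prefix before the first '/') is an assumption, not necessarily how the task detects remote sources.

```python
def choose_fetch_action(data_dir, force_copy=False):
    """Sketch: return 'copy' or 'link' for a source data location"""
    # Assumption: remote sources look like '[user@]host:/path', so a
    # colon before the first '/' marks the location as remote
    head = data_dir.split("/", 1)[0]
    is_remote = ":" in head
    # Remote data must be copied; local data is linked unless forced
    if is_remote or force_copy:
        return "copy"
    return "link"

remote_action = choose_fetch_action("user@host:/data/run")
local_action = choose_fetch_action("/data/run")
forced_action = choose_fetch_action("/data/run", force_copy=True)
```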

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.Get10xPackage(_name, *args, **kws)

Get information on 10xGenomics software package

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(require_package)

Initialise the Get10xPackage task

If no matching package is located then the outputs are all set to ‘None’.

Parameters:

require_package (str) – name of the 10xGenomics package executable that is required (e.g. ‘cellranger’, ‘cellranger-atac’)

Outputs:

  • package_name (str): name of the package

  • package_exe (str): path to the package executable

  • package_version (str): the package version

  • package_info (tuple): tuple consisting of (exe,package,version)

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.GetBasesMask10xMultiome(_name, *args, **kws)

Sets the bases mask string for 10x Genomics single cell multiome

init(run_dir, bases_mask, protocol)

Initialise the GetBasesMask10xMultiome task

Parameters:
  • run_dir (str) – path to the directory with data from the sequencer run

  • bases_mask (str) – input bases mask string (if set then it will be passed directly to the output)

  • protocol (str) – protocol being used

Outputs:
  • bases_mask (str): bases mask to use in CellRanger-ARC for processing these data

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.GetBasesMaskIcell8(_name, *args, **kws)

Set the bases mask for ICELL8 RNA-seq data

init(run_dir, sample_sheet)

Initialise the GetBasesMaskIcell8 task

Parameters:
  • run_dir (str) – path to the directory with data from the sequencer run

  • sample_sheet (str) – path to the sample sheet file to be used for processing these data

Outputs:
  • bases_mask (str): bases mask to use in bcl2fastq for processing these data

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.GetBasesMaskIcell8Atac(_name, *args, **kws)

Set the bases mask for ICELL8 ATAC-seq data

init(run_dir)

Initialise the GetBasesMaskIcell8Atac task

Parameters:
  • run_dir (str) – path to the directory with data from the sequencer run

  • sample_sheet (str) – path to the sample sheet file to be used for processing these data

Outputs:
  • bases_mask (str): bases mask to use in bcl2fastq for processing these data

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.GetBcl2Fastq(_name, *args, **kws)

Get information on the bcl2fastq executable

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(require_version=None)

Initialise the GetBcl2Fastq task

Parameters:

require_version (str) – if set then should be a string of the form ‘1.8.4’ or ‘>2.0’, explicitly specifying the version of bcl2fastq to use. If not set then no version check will be made

Outputs:

  • bcl2fastq_exe (str): path to the bcl2fastq executable

  • bcl2fastq_package (str): name of the bcl2fastq package

  • bcl2fastq_version (str): the bcl2fastq version

  • bcl2fastq_info (tuple): tuple consisting of (exe,package,version)
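A version requirement of the form ‘1.8.4’ or ‘>2.0’ can be checked against a detected version along the following lines. This is a hedged sketch rather than the task's implementation: the helper names are hypothetical, and the set of operators accepted here (>, >=, <, <=, ==, with a bare version meaning an exact match) is an assumption.

```python
import operator
import re

# Assumed set of comparison operators for this sketch
OPERATORS = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "==": operator.eq,
}

def parse_version(version):
    """Convert a dotted version string such as '2.20.0' to a tuple of ints"""
    return tuple(int(x) for x in version.split("."))

def version_matches(version, spec):
    """Sketch: check a version string against a requirement string"""
    m = re.match(r"^(>=|<=|==|>|<)?\s*([0-9]+(?:\.[0-9]+)*)$", spec.strip())
    if m is None:
        raise ValueError("Unrecognised version specification: '%s'" % spec)
    # A bare version (no operator) is treated as an exact match
    compare = OPERATORS[m.group(1) or "=="]
    actual = parse_version(version)
    required = parse_version(m.group(2))
    # Pad with zeros so that '2.0' and '2.0.0' compare as equal
    n = max(len(actual), len(required))
    actual += (0,) * (n - len(actual))
    required += (0,) * (n - len(required))
    return compare(actual, required)

exact = version_matches("1.8.4", "1.8.4")
newer = version_matches("2.20.0", ">2.0")
same_as_minimum = version_matches("2.0.0", ">2.0")
```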

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.GetBclConvert(_name, *args, **kws)

Get information on the bcl-convert executable

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(require_version=None)

Initialise the GetBclConvert task

Parameters:

require_version (str) – if set then should be a string of the form ‘1.8.4’ or ‘>2.0’, explicitly specifying the version of bcl-convert to use. If not set then no version check will be made

Outputs:

  • bclconvert_exe (str): path to the bcl-convert executable

  • bclconvert_package (str): name of the bcl-convert package

  • bclconvert_version (str): the bcl-convert version

  • bclconvert_info (tuple): tuple consisting of (exe,package,version)

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.IdentifyPlatform(_name, *args, **kws)

Identify the sequencer platform from the primary data

init(run_dir, platform=None)

Initialise the IdentifyPlatform task

Parameters:
  • run_dir (str) – path to the sequencer run data

  • platform (str) – optional, specify the platform

Outputs:

  • platform: sequencer platform

  • flow_cell_mode: flow cell mode, if defined

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.MakeFastqs(run_dir, sample_sheet, protocol='standard', bases_mask='auto', bcl_converter='bcl2fastq', platform=None, icell8_well_list=None, minimum_trimmed_read_length=None, mask_short_adapter_reads=None, adapter_sequence=None, adapter_sequence_read2=None, spaceranger_rc_i2_override=None, icell8_atac_swap_i1_and_i2=None, icell8_atac_reverse_complement=None, lanes=None, trim_adapters=True, fastq_statistics=True, analyse_barcodes=True, lane_subsets=None)

Run the Fastq generation pipeline on one or more lane subsets

Pipeline to run Fastq generation on multiple projects.

Example usage for processing a standard run:

>>> make_fastqs = MakeFastqs(run_dir,sample_sheet)
>>> make_fastqs.run()

Example for splitting a run to use different protocols for different lanes:

>>> make_fastqs = MakeFastqs(run_dir,sample_sheet,
...                          lane_subsets=(
...                             subset(lanes=[1,2,3,4,5,6],
...                                    protocol="standard"),
...                             subset(lanes=[7,8],
...                                    protocol="10x_chromium_sc")))
>>> make_fastqs.run()

In this case subsets of lanes are defined by calling the ‘subset’ function; each subset is processed separately using the protocol specified for that subset, before being merged into a single output directory.

Parameters defined in the lane subsets override those defined globally in the pipeline.
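The override rule can be illustrated with plain dictionaries. This is a sketch of the idea only: the helper resolve_params is hypothetical, and treating a value of None as "not set for this subset" is an assumption rather than documented behaviour.

```python
def resolve_params(global_params, subset_params):
    """Sketch: combine pipeline-wide settings with one subset's settings"""
    resolved = dict(global_params)
    for name, value in subset_params.items():
        # Assumption: None means "not set for this subset", so the
        # pipeline-wide value is kept in that case
        if value is not None:
            resolved[name] = value
    return resolved

global_params = {"protocol": "standard",
                 "bases_mask": "auto",
                 "trim_adapters": True}
lane_subset = {"lanes": [7, 8],
               "protocol": "10x_chromium_sc",
               "bases_mask": None}
resolved = resolve_params(global_params, lane_subset)
```

Here the subset's protocol replaces the pipeline-wide one, while bases_mask and trim_adapters fall back to the pipeline-wide values.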

On completion the pipeline makes the following outputs available:

  • platform: the platform assigned to the primary data

  • primary_data_dir: the directory containing the primary data

  • acquired_primary_data: boolean indicating if the primary data exists

  • bcl2fastq_info: tuple with information on the bcl2fastq software used

  • bclconvert_info: tuple with information on the BCL Convert software used

  • cellranger_info: tuple with information on the cellranger software used

  • stats_file: path to the statistics file

  • stats_full: path to the full statistics file

  • per_lane_stats: path to the per-lane statistics file

  • per_lane_sample_stats: path to the per-lane per-sample statistics file

  • missing_fastqs: list of Fastq files that bcl2fastq failed to generate

run(analysis_dir, out_dir=None, barcode_analysis_dir=None, primary_data_dir=None, force_copy_of_primary_data=False, no_lane_splitting=None, create_fastq_for_index_read=None, find_adapters_with_sliding_window=None, create_empty_fastqs=None, name=None, stats_file=None, stats_full=None, per_lane_stats=None, per_lane_sample_stats=None, nprocessors=None, cellranger_jobmode='local', cellranger_mempercore=None, cellranger_maxjobs=None, cellranger_jobinterval=None, cellranger_localcores=None, cellranger_localmem=None, working_dir=None, log_dir=None, log_file=None, batch_size=None, batch_limit=None, max_jobs=1, max_slots=None, poll_interval=5, runners=None, default_runner=None, envmodules=None, verbose=False)

Run the tasks in the pipeline

Parameters:
  • analysis_dir (str) – directory to perform the processing and analyses in

  • out_dir (str) – (sub)directory for output from Fastq generation (defaults to ‘bcl2fastq’)

  • barcode_analysis_dir (str) – (sub)directory for barcode analysis (defaults to ‘barcode_analysis’)

  • primary_data_dir (str) – top-level directory holding the primary data

  • force_copy_of_primary_data (bool) – if True then force primary data to be copied (rsync’ed) even if it’s on the local system (default is to link to primary data unless it’s on a remote filesystem)

  • no_lane_splitting (bool) – if True then don’t split output Fastqs across lanes (--no-lane-splitting)

  • create_fastq_for_index_read (bool) – if True then also output Fastqs for the index (I1 etc) reads (--create-fastq-for-index-read)

  • find_adapters_with_sliding_window (bool) – if True then use the sliding window algorithm to identify adapter sequences (--find-adapters-with-sliding-window)

  • create_empty_fastqs (bool) – if True then create empty “placeholder” Fastqs if not created by bcl2fastq

  • name (str) – optional identifier for output stats and report files

  • stats_file (str) – path to statistics output file

  • stats_full (str) – path to full statistics output file

  • per_lane_stats (str) – path to per-lane statistics output file

  • per_lane_sample_stats (str) – path to per-lane per-sample statistics output file

  • nprocessors (int) – number of threads to use for multithreaded applications (default is to take number of CPUs set in job runners)

  • cellranger_jobmode (str) – job mode to run cellranger in

  • cellranger_mempercore (int) – memory assumed per core

  • cellranger_maxjobs (int) – maximum number of concurrent jobs to run

  • cellranger_jobinterval (int) – how often jobs are submitted (in ms)

  • cellranger_localcores (int) – maximum number of cores cellranger can request in jobmode ‘local’

  • cellranger_localmem (int) – (optional) maximum memory cellranger can request in jobmode ‘local’

  • working_dir (str) – optional path to a working directory (defaults to temporary directory in the current directory)

  • log_dir (str) – path of directory where log files will be written to

  • batch_size (int) – if set then run commands in each task in batches, with each batch running this many commands at a time (default is to run one command per job)

  • batch_limit (int) – if set then run commands in each task in batches, with the batch size set dynamically so as not to exceed this limit (default is to use fixed batch sizes)

  • max_jobs (int) – optional maximum number of concurrent jobs in scheduler (defaults to 1)

  • max_slots (int) – optional maximum number of ‘slots’ (i.e. concurrent threads or maximum number of CPUs) available to the scheduler (defaults to no limit)

  • poll_interval (float) – optional polling interval (seconds) to set in scheduler (defaults to 5s)

  • runners (dict) – mapping of names to JobRunner instances; valid names are ‘rsync_runner’, ‘bcl2fastq_runner’, ‘bclconvert_runner’, ‘barcode_analysis_runner’, ‘merge_fastqs_runner’, ‘demultiplex_icell8_atac_runner’, ‘cellranger_runner’, ‘cellranger_atac_runner’, ‘cellranger_arc_runner’, ‘spaceranger_runner’, ‘default’

  • envmodules (mapping) – mapping of names to environment module file lists; valid names are ‘bcl2fastq’, ‘cellranger_mkfastq’, ‘cellranger_atac_mkfastq’

  • default_runner (JobRunner) – optional default job runner to use

  • verbose (bool) – if True then report additional information for diagnostics
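The difference between batch_size and batch_limit can be sketched as follows. This is an illustration under stated assumptions, not the scheduler's code: the helper make_batches is hypothetical, batch_limit is interpreted here as a cap on the number of batches, and batch_size is given precedence when both are supplied.

```python
import math

def make_batches(commands, batch_size=None, batch_limit=None):
    """Sketch: group commands into batches for submission as jobs"""
    if batch_size is None and batch_limit is not None:
        # Assumption: choose the smallest batch size that keeps the
        # number of batches at or below batch_limit
        batch_size = max(1, math.ceil(len(commands) / batch_limit))
    if batch_size is None:
        # Default: one command per job
        batch_size = 1
    return [commands[i:i + batch_size]
            for i in range(0, len(commands), batch_size)]

commands = ["cmd%d" % i for i in range(10)]
fixed = make_batches(commands, batch_size=4)
limited = make_batches(commands, batch_limit=3)
default = make_batches(commands)
```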

property subsets

Return list of lane subsets defined in pipeline

class auto_process_ngs.bcl2fastq.pipeline.MakeSampleSheet(_name, *args, **kws)

Creates a custom sample sheet

init(sample_sheet_file, lanes=(), adapter=None, adapter_read2=None)

Initialise the MakeSampleSheet task

Parameters:
  • sample_sheet_file (str) – name and path of the base sample file to generate the new file from

  • lanes (list) – (optional) list of lane numbers to keep in the output sample sheet; if empty then all lanes will be kept

  • adapter (str) – (optional) if set then write to the Adapter setting

  • adapter_read2 (str) – (optional) if set then write to the AdapterRead2 setting

Outputs:
  • custom_sample_sheet (PipelineParam): pipeline parameter instance that resolves to a string with the path to the output sample sheet file.
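The lane filtering described for the lanes parameter can be pictured with sample sheet rows held as dictionaries. This is a simplified sketch, not the task's implementation: the helper filter_lanes is hypothetical, and it assumes each row carries a ‘Lane’ entry.

```python
def filter_lanes(rows, lanes=()):
    """Sketch: keep only sample sheet rows for the requested lanes"""
    # An empty lane list means that all lanes are kept
    if not lanes:
        return list(rows)
    keep = set(int(lane) for lane in lanes)
    # Assumption: every row has a 'Lane' entry holding the lane number
    return [row for row in rows if int(row["Lane"]) in keep]

rows = [
    {"Lane": "1", "Sample_ID": "S1"},
    {"Lane": "2", "Sample_ID": "S2"},
    {"Lane": "7", "Sample_ID": "S3"},
]
subset_rows = filter_lanes(rows, lanes=[1, 7])
all_rows = filter_lanes(rows)
```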

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.MergeFastqDirs(_name, *args, **kws)

Merges directories with subsets of Fastqs

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(fastq_dirs, merged_fastq_dir)

Initialise the MergeFastqDirs task

Parameters:
  • fastq_dirs (list) – set of directories with Fastqs in bcl2fastq-like structure, to merge together

  • merged_fastq_dir (str) – path to the output directory where all the Fastqs will be put together

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.MergeFastqs(_name, *args, **kws)

Merges Fastqs across multiple lanes

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(fastq_dirs, out_dir, sample_sheet=None, no_lane_splitting=False, create_empty_fastqs=False, skip_merge=False)

Initialise the MergeFastqs task

Parameters:
  • fastq_dirs (list) – set of directories with Fastqs in bcl2fastq-like structure, to merge together

  • out_dir (str) – path to output directory

  • sample_sheet (str) – optional sample sheet file to verify the merged files against

  • no_lane_splitting (bool) – if True then merge Fastqs across lanes

  • create_empty_fastqs (bool) – if True then create empty placeholder Fastq files for any that are missing on successful completion of Fastq merging

  • skip_merge (bool) – if True then skip running the merging step within the task

Outputs:
  • missing_fastqs: list of Fastqs missing after Fastq merging

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.ReportProcessingQC(_name, *args, **kws)

Generate HTML report on the processing QC

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(name, analysis_dir, stats_file, per_lane_stats_file, per_lane_sample_stats_file, report_html)

Initialise the ReportProcessingQC task

Parameters:
  • name (str) – identifier for report title

  • analysis_dir (str) – directory with the statistics files

  • stats_file (str) – path to full statistics file

  • per_lane_stats_file (str) – path to the per-lane statistics file

  • per_lane_sample_stats_file (str) – path to the per-lane per-sample statistics file

  • report_html (str) – path to the output HTML QC report

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.RestoreBackupDirectory(_name, *args, **kws)

Check for and restore saved copy of directory

Looks for a backup version of a directory, and restores it by renaming it back to the original name if found.

The backup of a directory /path/to/dir will be called /path/to/save.dir.

init(dirn, skip_restore=False)

Initialise the RestoreBackupDirectory task

Parameters:
  • dirn (str) – path to the original directory to look for backup of

  • skip_restore (bool) – if True then check for the backup but don’t restore it if found
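The naming convention and restore step can be sketched as below. This is a minimal illustration rather than the task's code: the helper names backup_path and restore_backup are hypothetical, and the sketch does not handle the case where the original directory already exists alongside the backup.

```python
import os
import tempfile

def backup_path(dirn):
    """Return the backup name for dirn ('save.' prefixed to the last element)"""
    parent, name = os.path.split(os.path.normpath(dirn))
    return os.path.join(parent, "save.%s" % name)

def restore_backup(dirn, skip_restore=False):
    """Sketch: restore a backup if present; return True if one was found"""
    backup = backup_path(dirn)
    if not os.path.isdir(backup):
        return False
    if not skip_restore:
        # Rename the backup to the original directory name
        os.rename(backup, dirn)
    return True

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "bcl2fastq")
    # Create only the backup copy, then restore it
    os.mkdir(backup_path(original))
    found = restore_backup(original)
    restored = os.path.isdir(original)
```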

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.Run10xMkfastq(_name, *args, **kws)

Runs 10xGenomics ‘mkfastq’ to generate Fastqs

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(run_dir, out_dir, sample_sheet, bases_mask='auto', minimum_trimmed_read_length=None, mask_short_adapter_reads=None, filter_single_index=None, filter_dual_index=None, rc_i2_override=None, jobmode='local', maxjobs=None, mempercore=None, jobinterval=None, localcores=None, localmem=None, create_empty_fastqs=False, platform=None, pkg_exe=None, pkg_version=None, bcl2fastq_exe=None, bcl2fastq_version=None, skip_mkfastq=False)

Initialise the Run10xMkfastq task

Parameters:
  • run_dir (str) – path to the directory with data from the sequencer run

  • out_dir (str) – output directory for cellranger

  • sample_sheet (str) – path to input samplesheet file

  • bases_mask (str) – if set then use this as an alternative bases mask setting

  • minimum_trimmed_read_length (int) – if set then supply to cellranger via --minimum-trimmed-read-length

  • mask_short_adapter_reads (int) – if set then supply to cellranger via --mask-short-adapter-reads

  • filter_single_index (bool) – for cellranger[-arc], only demultiplex samples identified by an i7-only sample index, ignoring dual-indexed samples (which will not be demultiplexed) (i.e. use --filter-single-index option)

  • filter_dual_index (bool) – for cellranger[-arc], only demultiplex samples identified by i7/i5 dual-indices (e.g., SI-TT-A6), ignoring single-index samples (which will not be demultiplexed) (i.e. use --filter-dual-index option)

  • rc_i2_override (bool) – for spaceranger, set the value of the --rc-i2-override option (default is not to pass this option to spaceranger)

  • jobmode (str) – jobmode to use for running cellranger

  • maxjobs (int) – maximum number of concurrent jobs for 10xGenomics mkfastq to run

  • mempercore (int) – amount of memory available per core (for jobmode other than ‘local’)

  • jobinterval (int) – time to pause between starting 10xGenomics mkfastq jobs

  • localcores (int) – number of cores available to 10xGenomics mkfastq in jobmode ‘local’

  • localmem (int) – amount of memory available to 10xGenomics mkfastq in jobmode ‘local’

  • create_empty_fastqs (bool) – if True then create empty placeholder Fastq files for any that are missing on successful completion of 10xGenomics mkfastq

  • platform (str) – optional, sequencing platform that generated the data

  • pkg_exe (str) – the path to the 10xGenomics software package to use (e.g. ‘cellranger’, ‘cellranger-atac’, ‘spaceranger’)

  • pkg_version (str) – the version string for the 10xGenomics package

  • bcl2fastq_exe (str) – the path to the bcl2fastq executable to use

  • bcl2fastq_version (str) – the version string for the bcl2fastq package

  • skip_mkfastq (bool) – if True then skip running the ‘mkfastq’ step within the task

Outputs:
  • missing_fastqs: list of Fastqs missing after Fastq generation

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.RunBcl2Fastq(_name, *args, **kws)

Run bcl2fastq to generate Fastqs from sequencing data

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(run_dir, out_dir, sample_sheet, bases_mask='auto', ignore_missing_bcl=False, no_lane_splitting=False, minimum_trimmed_read_length=None, mask_short_adapter_reads=None, create_fastq_for_index_read=False, find_adapters_with_sliding_window=False, nprocessors=None, create_empty_fastqs=False, platform=None, bcl2fastq_exe=None, bcl2fastq_version=None, skip_bcl2fastq=False)

Initialise the RunBcl2Fastq task

Parameters:
  • run_dir (str) – path to the source sequencing data

  • out_dir (str) – output directory for bcl2fastq

  • sample_sheet (str) – path to input samplesheet file

  • bases_mask (str) – if set then use this as an alternative bases mask setting

  • ignore_missing_bcl (bool) – if True then run bcl2fastq with --ignore-missing-bcl

  • no_lane_splitting (bool) – if True then run bcl2fastq with --no-lane-splitting

  • minimum_trimmed_read_length (int) – if set then supply to bcl2fastq via --minimum-trimmed-read-length

  • mask_short_adapter_reads (int) – if set then supply to bcl2fastq via --mask-short-adapter-reads

  • create_fastq_for_index_read (boolean) – if True then also create Fastq files for index reads (default, don’t create index read Fastqs)

  • find_adapters_with_sliding_window (bool) – if True then use the sliding window algorithm for identifying adapter sequences (default is to use the string matching algorithm)

  • nprocessors (int) – number of processors to use (taken from job runner by default)

  • create_empty_fastqs (bool) – if True then create empty placeholder Fastq files for any that are missing on successful completion of bcl2fastq

  • platform (str) – optional, sequencing platform that generated the data

  • bcl2fastq_exe (str) – the path to the bcl2fastq executable to use

  • bcl2fastq_version (str) – the version string for the bcl2fastq package

  • skip_bcl2fastq (bool) – if True then sets the output parameters but finishes before actually running bcl2fastq

Outputs:

  • bases_mask: actual bases mask used

  • mismatches: number of mismatches allowed

  • missing_fastqs: list of Fastqs missing after Fastq generation

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.RunBclConvert(_name, *args, **kws)

Run BCL Convert to generate Fastqs from sequencing data

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(run_dir, out_dir, sample_sheet, lane=None, bases_mask='auto', ignore_missing_bcl=False, no_lane_splitting=False, minimum_trimmed_read_length=None, mask_short_adapter_reads=None, create_fastq_for_index_read=False, nprocessors=None, create_empty_fastqs=False, ignore_missing_fastqs=False, platform=None, bclconvert_exe=None, bclconvert_version=None, skip_bclconvert=False)

Initialise the RunBclConvert task

Parameters:
  • run_dir (str) – path to the source sequencing data

  • out_dir (str) – output directory for bcl-convert

  • sample_sheet (str) – path to input samplesheet file

  • lane (int) – optional, run bcl-convert on a single lane with --bcl-only-lane

  • bases_mask (str) – if set then use this as an alternative bases mask setting

  • no_lane_splitting (bool) – if True then run bcl-convert with –no-lane-splitting

  • minimum_trimmed_read_length (int) – if set then supply to bcl-convert via sample sheet settings

  • mask_short_adapter_reads (int) – if set then supply to bcl-convert via sample sheet settings

  • create_fastq_for_index_read (boolean) – if True then also create Fastq files for index reads (default, don’t create index read Fastqs)

  • nprocessors (int) – number of processors to use (taken from job runner by default)

  • create_empty_fastqs (bool) – if True then create empty placeholder Fastq files for any that are missing on successful completion of bcl-convert

  • ignore_missing_fastqs (bool) – if True then ignore missing Fastqs on successful completion of bcl-convert

  • platform (str) – optional, sequencing platform that generated the data

  • bclconvert_exe (str) – the path to the bcl-convert executable to use

  • bclconvert_version (str) – the version string for the bcl-convert package

  • skip_bclconvert (bool) – if True then sets the output parameters but finishes before actually running bcl-convert

Outputs:

  • bases_mask: actual bases mask used

  • mismatches: number of mismatches allowed

  • missing_fastqs: list of Fastqs missing after Fastq generation

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

auto_process_ngs.bcl2fastq.pipeline.create_placeholder_fastqs(fastqs, base_dir=None)

Create empty ‘placeholder’ Fastq files

Parameters:
  • fastqs (list) – paths to Fastq file names to create

  • base_dir (str) – if supplied then used as the base directory; Fastqs will be created relative to this dir
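Creating empty placeholder files relative to a base directory can be sketched as follows. This is an illustration, not the library function: the helper make_placeholder_files is hypothetical, and creating missing parent directories is an assumption of this sketch. Note that opening a file in write mode would truncate a file that already exists.

```python
import os
import tempfile

def make_placeholder_files(fastqs, base_dir=None):
    """Sketch: create empty files, optionally relative to base_dir"""
    created = []
    for fastq in fastqs:
        # Resolve the path against base_dir when one is given
        path = os.path.join(base_dir, fastq) if base_dir else fastq
        # Assumption: create any missing parent directories
        parent = os.path.dirname(path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        # Opening for writing and closing leaves a zero-length file
        with open(path, "w"):
            pass
        created.append(path)
    return created

with tempfile.TemporaryDirectory() as tmp:
    paths = make_placeholder_files(
        ["ProjA/S1_S1_L001_R1_001.fastq.gz"], base_dir=tmp)
    sizes = [os.path.getsize(p) for p in paths]
```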

auto_process_ngs.bcl2fastq.pipeline.subset(lanes, **kws)

Create a dictionary representing a set of lanes

Returns a dictionary which holds information about a set of lanes grouped together for processing, along with values of parameters that should be used for this set of lanes.

Keys must be one of the parameter names listed in the LANE_SET_ATTRIBUTES constant; specifying an unrecognised key will result in a KeyError exception.

Parameters:
  • lanes (list) – lanes that comprise the set

  • kws (mapping) – set of key-value pairs assigning values to parameters for the group of lanes

Raises:

KeyError – if a supplied key is not a valid attribute.
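The key validation described above can be sketched as follows. This is a simplified stand-in rather than the library function: the name make_subset is hypothetical, the attribute names listed in LANE_SET_ATTRIBUTES here are placeholders for illustration (the real constant is defined in the module), and initialising unset attributes to None is an assumption.

```python
# Placeholder attribute names for illustration only; the real
# LANE_SET_ATTRIBUTES constant is defined in the module itself
LANE_SET_ATTRIBUTES = ("protocol", "bases_mask", "trim_adapters")

def make_subset(lanes, **kws):
    """Sketch of a 'subset'-like function (not the library code)"""
    # Reject any keyword that isn't a recognised attribute
    for key in kws:
        if key not in LANE_SET_ATTRIBUTES:
            raise KeyError("Unsupported subset attribute: '%s'" % key)
    # Assumption: start with every attribute unset, then apply the
    # values that were supplied
    lane_set = {attr: None for attr in LANE_SET_ATTRIBUTES}
    lane_set.update(kws)
    lane_set["lanes"] = list(lanes)
    return lane_set

s = make_subset([7, 8], protocol="10x_chromium_sc")
```

Passing a keyword that is not in the attribute list, for example make_subset([1], not_a_parameter=True), raises KeyError in this sketch.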

auto_process_ngs.bcl2fastq.pipeline.verify_run(fastq_dir, sample_sheet)

Verify Fastq dir contents against sample sheet

Check the contents of a Bcl-to-Fastq output directory against a sample sheet, and return a list of missing Fastqs (or an empty list if all expected Fastqs are present).

Parameters:
  • fastq_dir (str) – path to Bcl-to-Fastq output directory

  • sample_sheet (str) – path to sample sheet file

Returns:

list of missing Fastqs, or an empty list if all expected Fastqs are present.

Return type:

List
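The check itself amounts to comparing a set of expected Fastq paths with what is present on disk. The sketch below illustrates only that comparison: the helper find_missing is hypothetical, and it takes the expected file names directly, whereas the real function derives them from the sample sheet.

```python
import os
import tempfile

def find_missing(fastq_dir, expected_fastqs):
    """Sketch: return expected files that are not found under fastq_dir"""
    missing = []
    for rel_path in expected_fastqs:
        if not os.path.exists(os.path.join(fastq_dir, rel_path)):
            missing.append(rel_path)
    return missing

with tempfile.TemporaryDirectory() as tmp:
    # Create one of the two expected files (names are illustrative)
    open(os.path.join(tmp, "S1_S1_L001_R1_001.fastq.gz"), "w").close()
    missing = find_missing(tmp, ["S1_S1_L001_R1_001.fastq.gz",
                                 "S2_S2_L001_R1_001.fastq.gz"])
    none_missing = find_missing(tmp, ["S1_S1_L001_R1_001.fastq.gz"])
```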