`auto_process_ngs.bcl2fastq.pipeline`

Pipeline components for generating Fastqs from Bcl files.

Pipeline classes:

MakeFastqs

Pipeline task classes:

FetchPrimaryData
MakeSampleSheet
GetBcl2Fastq
GetBclConvert
RestoreBackupDirectory
RunBcl2Fastq
GetBasesMaskIcell8
GetBasesMaskIcell8Atac
Get10xPackage
DemultiplexIcell8Atac
MergeFastqs
MergeFastqDirs
GetBasesMask10xMultiome
Run10xMkfastq
FastqStatistics
ReportProcessingQC

Utility functions:

subset

class auto_process_ngs.bcl2fastq.pipeline.DemultiplexIcell8Atac(_name, *args, **kws)

Runs ‘demultiplex_icell8_atac.py’ to generate Fastqs

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(fastq_dir, out_dir, well_list, nprocessors=None, swap_i1_and_i2=False, reverse_complement=None, skip_demultiplex=False)

Initialise the DemultiplexIcell8Atac task

Parameters:

fastq_dir (str) – path to directory with Fastq files to demultiplex
out_dir (str) – path to output directory
well_list (str) – path to well list file to use for demultiplexing samples
swap_i1_and_i2 (bool) – if True then swap the I1 and I2 indexes when demultiplexing
reverse_complement (str) – whether to reverse complement I1, I2, or both, when demultiplexing
skip_demultiplex (bool) – if True then skip running the demultiplexing
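
To make the `swap_i1_and_i2` and `reverse_complement` options concrete, the following standalone sketch applies them to a single pair of index sequences. The helper names, the accepted values "i1", "i2" and "both", and the order of operations (swap first, then reverse complement) are assumptions made for illustration; they are not taken from 'demultiplex_icell8_atac.py'.

```python
# Hypothetical helpers for illustration; not part of auto_process_ngs
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement_seq(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def adjust_indexes(i1, i2, swap_i1_and_i2=False, reverse_complement=None):
    """Apply the swap and reverse-complement options to an index pair."""
    if swap_i1_and_i2:
        i1, i2 = i2, i1
    # Assumed values: "i1", "i2" or "both" (None leaves both unchanged)
    if reverse_complement in ("i1", "both"):
        i1 = reverse_complement_seq(i1)
    if reverse_complement in ("i2", "both"):
        i2 = reverse_complement_seq(i2)
    return i1, i2
```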

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.FastqStatistics(_name, *args, **kws)

Generates statistics for Fastq files

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(bcl2fastq_dir, sample_sheet, out_dir, stats_file=None, stats_full_file=None, per_lane_stats_file=None, per_lane_sample_stats_file=None, add_data=False, force=False, nprocessors=None)

Initialise the FastqStatistics task

Parameters:

bcl2fastq_dir (str) – path to directory with Fastqs from bcl2fastq
sample_sheet (str) – path to sample sheet file
out_dir (str) – path to directory to write the output stats files to
stats_file (str) – path to statistics output file
stats_full_file (str) – path to full statistics output file
per_lane_stats_file (str) – path to per-lane statistics output file
per_lane_sample_stats_file (str) – path to per-lane per-sample statistics output file
add_data (bool) – if True then add stats to the existing stats files (default is to overwrite existing stats files)
force (bool) – if True then force update of the stats files even if they are newer than the Fastq files (by default stats are only updated if they are older than the Fastqs)
nprocessors (int) – number of cores to use when running ‘fastq_statistics.py’
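
The `force` behaviour described above amounts to a timestamp comparison between the stats file and the Fastqs. A minimal sketch of such a check follows; `stats_need_update` is a hypothetical helper, and the use of file modification times is an assumption about how "newer" and "older" are decided.

```python
import os


def stats_need_update(stats_file, fastqs, force=False):
    """Decide whether a stats file should be regenerated.

    Hypothetical helper: returns True if 'force' is set, if the
    stats file is missing, or if any Fastq has a later modification
    time than the stats file.
    """
    if force or not os.path.exists(stats_file):
        return True
    stats_mtime = os.path.getmtime(stats_file)
    return any(os.path.getmtime(fq) > stats_mtime for fq in fastqs)
```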

Outputs:

stats_file: path to basic stats file
stats_full: path to full stats file
per_lane_stats: path to per-lane stats file
per_lane_sample_stats: path to per-lane sample stats file

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.FetchPrimaryData(_name, *args, **kws)

Fetch the primary data for processing

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(data_dir, primary_data_dir, force_copy=False)

Initialise the FetchPrimaryData task

Parameters:

data_dir (str) – location of the source sequencing data
primary_data_dir (str) – directory to copy data to (if source is a remote location) or link data from (if source is on the local system)
force_copy (bool) – if True then force primary data to be copied even if it’s on the local system
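
The copy-or-link rule in the parameter descriptions can be sketched as a small decision function. This is an illustration only: `fetch_action` is a hypothetical name, and recognising a remote location by the "[user@]host:/path" form is an assumption, not something stated in this documentation.

```python
import re

# Assumption: remote locations are written as "[user@]host:/path"
_REMOTE = re.compile(r"^(?:[^@/:]+@)?[^/:]+:.+")


def fetch_action(data_dir, force_copy=False):
    """Return "copy" or "link" for a source data location."""
    if force_copy or _REMOTE.match(data_dir):
        return "copy"
    return "link"
```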

Outputs:

run_dir: path to the local copy of the primary data

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.Get10xPackage(_name, *args, **kws)

Get information on 10xGenomics software package

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(require_package)

Initialise the Get10xPackage task

If no matching package is located then the outputs are all set to ‘None’.

Parameters:

require_package (str) – name of the 10xGenomics package executable that is required (e.g. ‘cellranger’, ‘cellranger-atac’)

Outputs:

package_name (str): name of the package
package_exe (str): path to the package executable
package_version (str): the package version
package_info (tuple): tuple consisting of (exe,package,version)

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.GetBasesMask10xMultiome(_name, *args, **kws)

Sets the bases mask string for 10x Genomics single cell multiome

init(run_dir, bases_mask, protocol)

Initialise the GetBasesMask10xMultiome task

Parameters:

run_dir (str) – path to the directory with data from the sequencer run
bases_mask (str) – input bases mask string (if set then it will be passed directly to the output)
protocol (str) – protocol being used

Outputs:

bases_mask (str): bases mask to use in CellRanger-ARC for processing these data

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.GetBasesMaskIcell8(_name, *args, **kws)

Set the bases mask for ICELL8 RNA-seq data

init(run_dir, sample_sheet)

Initialise the GetBasesMaskIcell8 task

Parameters:

run_dir (str) – path to the directory with data from the sequencer run
sample_sheet (str) – path to the sample sheet file to be used for processing these data

Outputs:

bases_mask (str): bases mask to use in bcl2fastq for processing these data

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.GetBasesMaskIcell8Atac(_name, *args, **kws)

Set the bases mask for ICELL8 ATAC-seq data

init(run_dir)

Initialise the GetBasesMaskIcell8Atac task

Parameters:

run_dir (str) – path to the directory with data from the sequencer run
sample_sheet (str) – path to the sample sheet file to be used for processing these data

Outputs:

bases_mask (str): bases mask to use in bcl2fastq for processing these data

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.GetBcl2Fastq(_name, *args, **kws)

Get information on the bcl2fastq executable

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(require_version=None)

Initialise the GetBcl2Fastq task

Parameters:

require_version (str) – if set then should be a string of the form ‘1.8.4’ or ‘>2.0’, explicitly specifying the version of bcl2fastq to use. If not set then no version check will be made

Outputs:

bcl2fastq_exe (str): path to the bcl2fastq executable
bcl2fastq_package (str): name of the bcl2fastq package
bcl2fastq_version (str): the bcl2fastq version
bcl2fastq_info (tuple): tuple consisting of (exe,package,version)
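
The `require_version` strings described above (‘1.8.4’ for an exact match, ‘>2.0’ for a comparison) could be checked along the following lines. This is a sketch under stated assumptions: `version_matches` and `parse_version` are hypothetical helpers, versions are assumed to be dot-separated integers, and the operators other than ‘>’ are included only for completeness and are not confirmed by this documentation.

```python
import operator

# Longer symbols first so ">=" is not mistaken for ">"
_OPERATORS = (
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
)


def parse_version(version):
    """Convert a string such as '2.20.0' into a tuple of integers."""
    return tuple(int(part) for part in version.strip().split("."))


def version_matches(version, require_version=None):
    """Check a version string against an optional requirement."""
    if require_version is None:
        # No requirement means no version check
        return True
    spec = require_version.strip()
    for symbol, compare in _OPERATORS:
        if spec.startswith(symbol):
            return compare(parse_version(version),
                           parse_version(spec[len(symbol):]))
    # A bare version is treated as an exact match
    return parse_version(version) == parse_version(spec)
```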

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.GetBclConvert(_name, *args, **kws)

Get information on the bcl-convert executable

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(require_version=None)

Initialise the GetBclConvert task

Parameters:

require_version (str) – if set then should be a string of the form ‘1.8.4’ or ‘>2.0’, explicitly specifying the version of bcl-convert to use. If not set then no version check will be made

Outputs:

bclconvert_exe (str): path to the bcl-convert executable
bclconvert_package (str): name of the bcl-convert package
bclconvert_version (str): the bcl-convert version
bclconvert_info (tuple): tuple consisting of (exe,package,version)

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.IdentifyPlatform(_name, *args, **kws)

Identify the sequencer platform from the primary data

init(run_dir, platform=None)

Initialise the IdentifyPlatform task

Parameters:

run_dir (str) – path to the sequencer run data
platform (str) – optional, specify the platform

Outputs:

platform: sequencer platform
flow_cell_mode: flow cell mode, if defined

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.MakeFastqs(run_dir, sample_sheet, protocol='standard', bases_mask='auto', bcl_converter='bcl2fastq', platform=None, icell8_well_list=None, minimum_trimmed_read_length=None, mask_short_adapter_reads=None, adapter_sequence=None, adapter_sequence_read2=None, spaceranger_rc_i2_override=None, icell8_atac_swap_i1_and_i2=None, icell8_atac_reverse_complement=None, r1_length=None, r2_length=None, lanes=None, trim_adapters=True, fastq_statistics=True, analyse_barcodes=True, lane_subsets=None)

Run the Fastq generation pipeline on one or more lane subsets

Pipeline to run Fastq generation on multiple projects.

Example usage for processing a standard run:

>>> make_fastqs = MakeFastqs(run_dir,sample_sheet)
>>> make_fastqs.run(analysis_dir)

Example for splitting a run to use different protocols for different lanes:

>>> make_fastqs = MakeFastqs(run_dir,sample_sheet,
...                          lane_subsets=(
...                             subset(lanes=[1,2,3,4,5,6],
...                                    protocol="standard"),
...                             subset(lanes=[7,8],
...                                    protocol="10x_chromium_sc")))
>>> make_fastqs.run(analysis_dir)

In this case subsets of lanes are defined by calling the ‘subset’ function; each subset is processed separately using the protocol specified for that subset, before being merged into a single output directory.

Parameters defined in the lane subsets override those defined globally in the pipeline.
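
The override rule can be pictured as a dictionary merge in which the subset's values win. The sketch below is illustrative only: `resolve_parameters` is a hypothetical helper, and it assumes that every key present in a subset (other than 'lanes') replaces the global value.

```python
def resolve_parameters(global_params, lane_subset):
    """Merge global pipeline parameters with those of one lane subset.

    Hypothetical helper: values set in the subset take precedence
    over the global values; 'lanes' is not treated as a parameter.
    """
    resolved = dict(global_params)
    for key, value in lane_subset.items():
        if key != "lanes":
            resolved[key] = value
    return resolved


global_params = {"protocol": "standard", "bases_mask": "auto"}
lane_subset = {"lanes": [7, 8], "protocol": "10x_chromium_sc"}
resolved = resolve_parameters(global_params, lane_subset)
```

Here `resolved` keeps the global `bases_mask` but takes `protocol` from the subset.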

On completion the pipeline makes the following outputs available:

platform: the platform assigned to the primary data
primary_data_dir: the directory containing the primary data
acquired_primary_data: boolean indicating if the primary data exists
bcl2fastq_info: tuple with information on the bcl2fastq software used
bclconvert_info: tuple with information on the BCL Convert software used
cellranger_info: tuple with information on the cellranger software used
stats_file: path to the statistics file
stats_full: path to the full statistics file
per_lane_stats: path to the per-lane statistics file
per_lane_sample_stats: path to the per-lane per-sample statistics file
missing_fastqs: list of Fastq files that bcl2fastq failed to generate

run(analysis_dir, out_dir=None, barcode_analysis_dir=None, primary_data_dir=None, force_copy_of_primary_data=False, no_lane_splitting=None, create_fastq_for_index_read=None, find_adapters_with_sliding_window=None, create_empty_fastqs=None, name=None, stats_file=None, stats_full=None, per_lane_stats=None, per_lane_sample_stats=None, nprocessors=None, cellranger_jobmode='local', cellranger_mempercore=None, cellranger_maxjobs=None, cellranger_jobinterval=None, cellranger_localcores=None, cellranger_localmem=None, working_dir=None, log_dir=None, log_file=None, batch_size=None, batch_limit=None, max_jobs=1, max_slots=None, poll_interval=5, runners=None, default_runner=None, envmodules=None, verbose=False)

Run the tasks in the pipeline

Parameters:

analysis_dir (str) – directory to perform the processing and analyses in
out_dir (str) – (sub)directory for output from Fastq generation (defaults to ‘bcl2fastq’)
barcode_analysis_dir (str) – (sub)directory for barcode analysis (defaults to ‘barcode_analysis’)
primary_data_dir (str) – top-level directory holding the primary data
force_copy_of_primary_data (bool) – if True then force primary data to be copied (rsync’ed) even if it’s on the local system (default is to link to primary data unless it’s on a remote filesystem)
no_lane_splitting (bool) – if True then don’t split output Fastqs across lanes (--no-lane-splitting)
create_fastq_for_index_read (bool) – if True then also output Fastqs for the index (I1 etc) reads (--create-fastq-for-index-read)
find_adapters_with_sliding_window (bool) – if True then use sliding window algorithm to identify adapter sequences (--find-adapters-with-sliding-window)
create_empty_fastqs (bool) – if True then create empty “placeholder” Fastqs if not created by bcl2fastq
name (str) – optional identifier for output stats and report files
stats_file (str) – path to statistics output file
stats_full (str) – path to full statistics output file
per_lane_stats (str) – path to per-lane statistics output file
per_lane_sample_stats (str) – path to per-lane per-sample statistics output file
nprocessors (int) – number of threads to use for multithreaded applications (default is to take number of CPUs set in job runners)
cellranger_jobmode (str) – job mode to run cellranger in
cellranger_mempercore (int) – memory assumed per core
cellranger_maxjobs (int) – maximum number of concurrent jobs to run
cellranger_jobinterval (int) – how often jobs are submitted (in ms)
cellranger_localcores (int) – maximum number of cores cellranger can request in jobmode ‘local’
cellranger_localmem (int) – (optional) maximum memory cellranger can request in jobmode ‘local’
working_dir (str) – optional path to a working directory (defaults to temporary directory in the current directory)
log_dir (str) – path of directory where log files will be written to
batch_size (int) – if set then run commands in each task in batches, with each batch running this many commands at a time (default is to run one command per job)
batch_limit (int) – if set then run commands in each task in batches, with the batch size set dynamically so as not to exceed this limit (default is to use fixed batch sizes)
max_jobs (int) – optional maximum number of concurrent jobs in scheduler (defaults to 1)
max_slots (int) – optional maximum number of ‘slots’ (i.e. concurrent threads or maximum number of CPUs) available to the scheduler (defaults to no limit)
poll_interval (float) – optional polling interval (seconds) to set in scheduler (defaults to 5s)
runners (dict) – mapping of names to JobRunner instances; valid names are ‘rsync_runner’, ‘bcl2fastq_runner’, ‘bclconvert_runner’, ‘barcode_analysis_runner’, ‘merge_fastqs_runner’, ‘demultiplex_icell8_atac_runner’, ‘cellranger_runner’, ‘cellranger_atac_runner’, ‘cellranger_arc_runner’, ‘spaceranger_runner’, ‘default’
envmodules (mapping) – mapping of names to environment module file lists; valid names are ‘bcl2fastq’, ‘cellranger_mkfastq’, ‘cellranger_atac_mkfastq’
default_runner (JobRunner) – optional default job runner to use
verbose (bool) – if True then report additional information for diagnostics
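
The `batch_size` and `batch_limit` parameters both control how a task's commands are grouped into jobs. The sketch below shows one plausible reading and should not be taken as the pipeline's actual scheduling code: `make_batches` is a hypothetical helper, and it assumes that `batch_limit` caps the number of batches, with the batch size derived from it.

```python
import math


def make_batches(commands, batch_size=None, batch_limit=None):
    """Group commands into batches (hypothetical helper).

    With neither option set, each command forms its own batch.
    """
    commands = list(commands)
    if batch_limit and commands:
        # Assumption: the limit is the maximum number of batches, so
        # the batch size is chosen to stay within it
        batch_size = math.ceil(len(commands) / batch_limit)
    if not batch_size:
        batch_size = 1
    return [commands[i:i + batch_size]
            for i in range(0, len(commands), batch_size)]
```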

property subsets: Return list of lane subsets defined in pipeline

class auto_process_ngs.bcl2fastq.pipeline.MakeSampleSheet(_name, *args, **kws)

Creates a custom sample sheet

init(sample_sheet_file, lanes=(), adapter=None, adapter_read2=None)

Initialise the MakeSampleSheet task

Parameters:

sample_sheet_file (str) – name and path of the base sample file to generate the new file from
lanes (list) – (optional) list of lane numbers to keep in the output sample sheet; if empty then all lanes will be kept
adapter (str) – (optional) if set then write to the Adapter setting
adapter_read2 (str) – (optional) if set then write to the AdapterRead2 setting

Outputs:

custom_sample_sheet (PipelineParam): pipeline parameter instance that resolves to a string with the path to the output sample sheet file.

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.MergeFastqDirs(_name, *args, **kws)

Merges directories with subsets of Fastqs

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(fastq_dirs, merged_fastq_dir)

Initialise the MergeFastqDirs task

Parameters:

fastq_dirs (list) – set of directories with Fastqs in bcl2fastq-like structure, to merge together
merged_fastq_dir (str) – path to the output directory where all the Fastqs will be put together

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.MergeFastqs(_name, *args, **kws)

Merges Fastqs across multiple lanes

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(fastq_dirs, out_dir, sample_sheet=None, no_lane_splitting=False, create_empty_fastqs=False, skip_merge=False)

Initialise the MergeFastqs task

Parameters:

fastq_dirs (list) – set of directories with Fastqs in bcl2fastq-like structure, to merge together
out_dir (str) – path to output directory
sample_sheet (str) – optional sample sheet file to verify the merged files against
no_lane_splitting (bool) – if True then merge Fastqs across lanes
create_empty_fastqs (bool) – if True then create empty placeholder Fastq files for any that are missing on successful completion of Fastq merging
skip_merge (bool) – if True then skip running the merging step within the task

Outputs:

missing_fastqs: list of Fastqs missing after Fastq merging
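
Merging across lanes requires deciding which per-lane Fastqs belong together. The sketch below groups file names by removing the lane element, assuming Illumina-style names of the form 'NAME_S1_L001_R1_001.fastq.gz'. `group_by_merged_name` is a hypothetical helper and does not reproduce the logic of the MergeFastqs task itself.

```python
import re
from collections import defaultdict

# Assumption: the lane appears as "_L" plus three digits, e.g. "_L001"
_LANE = re.compile(r"_L\d{3}(?=_)")


def group_by_merged_name(fastqs):
    """Map each lane-free Fastq name to its per-lane input files."""
    groups = defaultdict(list)
    for fastq in sorted(fastqs):
        merged_name = _LANE.sub("", fastq, count=1)
        groups[merged_name].append(fastq)
    return dict(groups)
```

Each list of inputs would then be concatenated, in lane order, into the file named by the corresponding key.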

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.ReportProcessingQC(_name, *args, **kws)

Generate HTML report on the processing QC

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(name, analysis_dir, stats_file, per_lane_stats_file, per_lane_sample_stats_file, report_html)

Initialise the ReportProcessingQC task

Parameters:

name (str) – identifier for report title
analysis_dir (str) – directory with the statistics files
stats_file (str) – path to full statistics file
per_lane_stats_file (str) – path to the per-lane statistics file
per_lane_sample_stats_file (str) – path to the per-lane per-sample statistics file
report_html (str) – path to the output HTML QC report

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.RestoreBackupDirectory(_name, *args, **kws)

Check for and restore saved copy of directory

Looks for a backup version of a directory, and restores it by renaming it back to the original name if found.

The backup of directory /path/to/dir is expected to be named /path/to/save.dir.

init(dirn, skip_restore=False)

Initialise the RestoreBackupDirectory task

Parameters:

dirn (str) – path to the original directory to look for backup of
skip_restore (bool) – if True then check for the backup but don’t restore it if found
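
The naming convention and the effect of `skip_restore` can be sketched as follows. Both functions are hypothetical stand-ins written for illustration; the sketch assumes POSIX-style paths and that the original directory does not exist when the backup is renamed.

```python
import os


def backup_path(dirn):
    """Return the expected backup location for a directory."""
    dirn = os.path.normpath(dirn)
    return os.path.join(os.path.dirname(dirn),
                        "save." + os.path.basename(dirn))


def restore_backup(dirn, skip_restore=False):
    """Look for a backup of 'dirn' and optionally restore it.

    Returns True if a backup was found, False otherwise.
    """
    backup = backup_path(dirn)
    if not os.path.isdir(backup):
        return False
    if not skip_restore:
        # Restore by renaming the backup to the original name
        os.rename(backup, dirn)
    return True
```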

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.Run10xMkfastq(_name, *args, **kws)

Runs 10xGenomics ‘mkfastq’ to generate Fastqs

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(run_dir, out_dir, sample_sheet, bases_mask='auto', minimum_trimmed_read_length=None, mask_short_adapter_reads=None, filter_single_index=None, filter_dual_index=None, rc_i2_override=None, jobmode='local', maxjobs=None, mempercore=None, jobinterval=None, localcores=None, localmem=None, create_empty_fastqs=False, platform=None, pkg_exe=None, pkg_version=None, bcl2fastq_exe=None, bcl2fastq_version=None, skip_mkfastq=False)

Initialise the Run10xMkfastq task

Parameters:

run_dir (str) – path to the directory with data from the sequencer run
out_dir (str) – output directory for cellranger
sample_sheet (str) – path to input samplesheet file
bases_mask (str) – if set then use this as an alternative bases mask setting
minimum_trimmed_read_length (int) – if set then supply to cellranger via --minimum-trimmed-read-length
mask_short_adapter_reads (int) – if set then supply to cellranger via --mask-short-adapter-reads
filter_single_index (bool) – for cellranger[-arc], only demultiplex samples identified by an i7-only sample index, ignoring dual-indexed samples (which will not be demultiplexed) (i.e. use --filter-single-index option)
filter_dual_index (bool) – for cellranger[-arc], only demultiplex samples identified by i7/i5 dual-indices (e.g., SI-TT-A6), ignoring single-index samples (which will not be demultiplexed) (i.e. use --filter-dual-index option)
rc_i2_override (bool) – for spaceranger, set the value of the --rc-i2-override option (default is not to pass this option to spaceranger)
jobmode (str) – jobmode to use for running cellranger
maxjobs (int) – maximum number of concurrent jobs for 10xGenomics mkfastq to run
mempercore (int) – amount of memory available per core (for jobmode other than ‘local’)
jobinterval (int) – time to pause between starting 10xGenomics mkfastq jobs
localcores (int) – number of cores available to 10xGenomics mkfastq in jobmode ‘local’
localmem (int) – amount of memory available to 10xGenomics mkfastq in jobmode ‘local’
create_empty_fastqs (bool) – if True then create empty placeholder Fastq files for any that are missing on successful completion of 10xGenomics mkfastq
platform (str) – optional, sequencing platform that generated the data
pkg_exe (str) – the path to the 10xGenomics software package to use (e.g. ‘cellranger’, ‘cellranger-atac’, ‘spaceranger’)
pkg_version (str) – the version string for the 10xGenomics package
bcl2fastq_exe (str) – the path to the bcl2fastq executable to use
bcl2fastq_version (str) – the version string for the bcl2fastq package
skip_mkfastq (bool) – if True then skip running the ‘mkfastq’ step within the task

Outputs:

missing_fastqs: list of Fastqs missing after Fastq generation

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.RunBcl2Fastq(_name, *args, **kws)

Run bcl2fastq to generate Fastqs from sequencing data

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(run_dir, out_dir, sample_sheet, bases_mask='auto', r1_length=None, r2_length=None, ignore_missing_bcl=False, no_lane_splitting=False, minimum_trimmed_read_length=None, mask_short_adapter_reads=None, create_fastq_for_index_read=False, find_adapters_with_sliding_window=False, nprocessors=None, create_empty_fastqs=False, platform=None, bcl2fastq_exe=None, bcl2fastq_version=None, skip_bcl2fastq=False)

Initialise the RunBcl2Fastq task

Parameters:

run_dir (str) – path to the source sequencing data
out_dir (str) – output directory for bcl2fastq
sample_sheet (str) – path to input samplesheet file
bases_mask (str) – if set then use this as an alternative bases mask setting
r1_length (int) – if set then truncate R1 reads in bases mask to this length (NB ignored if bases mask is already set)
r2_length (int) – if set then truncate R2 reads in bases mask to this length (NB ignored if bases mask is already set)
ignore_missing_bcl (bool) – if True then run bcl2fastq with --ignore-missing-bcl
no_lane_splitting (bool) – if True then run bcl2fastq with --no-lane-splitting
minimum_trimmed_read_length (int) – if set then supply to bcl2fastq via --minimum-trimmed-read-length
mask_short_adapter_reads (int) – if set then supply to bcl2fastq via --mask-short-adapter-reads
create_fastq_for_index_read (bool) – if True then also create Fastq files for index reads (default, don’t create index read Fastqs)
find_adapters_with_sliding_window (bool) – if True then use sliding window algorithm for identifying adapter sequences (default is to use string matching algorithm)
nprocessors (int) – number of processors to use (taken from job runner by default)
create_empty_fastqs (bool) – if True then create empty placeholder Fastq files for any that are missing on successful completion of bcl2fastq
platform (str) – optional, sequencing platform that generated the data
bcl2fastq_exe (str) – the path to the bcl2fastq executable to use
bcl2fastq_version (str) – the version string for the bcl2fastq package
skip_bcl2fastq (bool) – if True then sets the output parameters but finishes before actually running bcl2fastq
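
The effect of `r1_length` and `r2_length` on a bases mask can be illustrated with a small sketch. It assumes masks written in the bcl2fastq style, such as 'y101,I8,I8,y101', where 'y' marks bases to keep, 'n' marks bases to ignore and 'I' marks index bases. `truncate_reads` is a hypothetical helper, not the function used by the task, and it handles only components of the form 'yN' or 'yNnM'.

```python
import re

# Matches data read components such as "y101" or "y76n25"
_READ = re.compile(r"^y(\d+)(?:n(\d+))?$")


def truncate_reads(bases_mask, r1_length=None, r2_length=None):
    """Truncate the first two data reads in a bases mask."""
    limits = iter((r1_length, r2_length))
    truncated = []
    for component in bases_mask.split(","):
        match = _READ.match(component)
        if match:
            limit = next(limits, None)
            used = int(match.group(1))
            masked = int(match.group(2) or 0)
            if limit is not None and limit < used:
                # Bases beyond the limit are masked rather than dropped
                masked += used - limit
                used = limit
            component = "y%d" % used
            if masked:
                component += "n%d" % masked
        truncated.append(component)
    return ",".join(truncated)
```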

Outputs:

bases_mask: actual bases mask used
mismatches: number of mismatches allowed
missing_fastqs: list of Fastqs missing after Fastq generation

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.bcl2fastq.pipeline.RunBclConvert(_name, *args, **kws)

Run BCL Convert to generate Fastqs from sequencing data

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(run_dir, out_dir, sample_sheet, lane=None, bases_mask='auto', r1_length=None, r2_length=None, ignore_missing_bcl=False, no_lane_splitting=False, minimum_trimmed_read_length=None, mask_short_adapter_reads=None, create_fastq_for_index_read=False, nprocessors=None, create_empty_fastqs=False, ignore_missing_fastqs=False, platform=None, bclconvert_exe=None, bclconvert_version=None, skip_bclconvert=False)

Initialise the RunBclConvert task

Parameters:

run_dir (str) – path to the source sequencing data
out_dir (str) – output directory for bcl-convert
sample_sheet (str) – path to input samplesheet file
lane (int) – optional, run bcl-convert on a single lane with --bcl-only-lane
bases_mask (str) – if set then use this as an alternative bases mask setting
r1_length (int) – if set then truncate R1 reads in bases mask to this length (NB ignored if bases mask is already set)
r2_length (int) – if set then truncate R2 reads in bases mask to this length (NB ignored if bases mask is already set)
no_lane_splitting (bool) – if True then run bcl-convert with --no-lane-splitting
minimum_trimmed_read_length (int) – if set then supply to bcl-convert via sample sheet settings
mask_short_adapter_reads (int) – if set then supply to bcl-convert via sample sheet settings
create_fastq_for_index_read (boolean) – if True then also create Fastq files for index reads (default, don’t create index read Fastqs)
nprocessors (int) – number of processors to use (taken from job runner by default)
create_empty_fastqs (bool) – if True then create empty placeholder Fastq files for any that are missing on successful completion of bcl-convert
ignore_missing_fastqs (bool) – if True then ignore missing Fastqs on successful completion of bcl-convert
platform (str) – optional, sequencing platform that generated the data
bclconvert_exe (str) – the path to the bcl-convert executable to use
bclconvert_version (str) – the version string for the bcl-convert package
skip_bclconvert (bool) – if True then sets the output parameters but finishes before actually running bcl-convert

Outputs:

bases_mask: actual bases mask used
mismatches: number of mismatches allowed
missing_fastqs: list of Fastqs missing after Fastq generation

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

auto_process_ngs.bcl2fastq.pipeline.create_placeholder_fastqs(fastqs, base_dir=None)

Create empty ‘placeholder’ Fastq files

Parameters:

fastqs (list) – paths to Fastq file names to create
base_dir (str) – if supplied then used as the base directory; Fastqs will be created relative to this dir
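
A minimal sketch of the behaviour described above is given below. `create_placeholders` is a hypothetical stand-in for the documented function: it assumes that placeholders are zero-byte files, that missing parent directories should be created, and that existing files are left untouched, none of which is stated in this documentation.

```python
import os


def create_placeholders(fastqs, base_dir=None):
    """Create empty files for the supplied Fastq names.

    Returns the list of paths that now exist.
    """
    paths = []
    for fastq in fastqs:
        path = os.path.join(base_dir, fastq) if base_dir else fastq
        parent = os.path.dirname(path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        if not os.path.exists(path):
            # Assumption: a zero-byte file is an acceptable placeholder
            open(path, "wb").close()
        paths.append(path)
    return paths
```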

auto_process_ngs.bcl2fastq.pipeline.subset(lanes, **kws)

Create a dictionary representing a set of lanes

Returns a dictionary which holds information about a set of lanes grouped together for processing, along with values of parameters that should be used for this set of lanes.

Keys must be one of the parameter names listed in the LANE_SET_ATTRIBUTES constant; specifying an unrecognised key will result in a KeyError exception.

Parameters:

lanes (list) – lanes that comprise the set
kws (mapping) – set of key-value pairs assigning values to parameters for the group of lanes

Raises:

KeyError – if a supplied key is not a valid attribute.
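
The key validation described above can be sketched as follows. The contents of `LANE_SET_ATTRIBUTES` shown here are a made-up subset chosen for illustration (the real constant is defined in the module), and `subset_sketch` is a hypothetical stand-in for `subset`.

```python
# Illustrative stand-in; the real constant is defined by the module
LANE_SET_ATTRIBUTES = ("protocol", "bases_mask", "trim_adapters")


def subset_sketch(lanes, **kws):
    """Build a dictionary describing a set of lanes."""
    lane_set = {"lanes": list(lanes)}
    for key, value in kws.items():
        if key not in LANE_SET_ATTRIBUTES:
            raise KeyError("Unrecognised lane set attribute: %r" % key)
        lane_set[key] = value
    return lane_set


lane_set = subset_sketch([7, 8], protocol="10x_chromium_sc")
```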

auto_process_ngs.bcl2fastq.pipeline.verify_run(fastq_dir, sample_sheet)

Verify Fastq dir contents against sample sheet

Check the contents of a Bcl-to-Fastq output directory against a sample sheet, and return a list of missing Fastqs (or an empty list if all expected Fastqs are present).

Parameters:

fastq_dir (str) – path to Bcl-to-Fastq output directory
sample_sheet (str) – path to sample sheet file

Returns:

list of missing Fastqs, or an empty list if all expected Fastqs are present.

Return type:

List
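
The core of such a check is a comparison between the Fastqs that are expected and those present on disk. The sketch below covers only that comparison: `find_missing_fastqs` is a hypothetical helper that takes the expected relative paths directly, whereas `verify_run` works them out from the sample sheet.

```python
import os


def find_missing_fastqs(fastq_dir, expected_fastqs):
    """Return the expected Fastqs that are absent from 'fastq_dir'.

    'expected_fastqs' holds paths relative to 'fastq_dir'; an empty
    list is returned if all of them are present.
    """
    return [fastq for fastq in expected_fastqs
            if not os.path.isfile(os.path.join(fastq_dir, fastq))]
```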
