`auto_process_ngs.commands.make_fastqs_cmd`

auto_process_ngs.commands.make_fastqs_cmd.make_fastqs(ap, protocol='standard', platform=None, unaligned_dir=None, sample_sheet=None, name=None, lanes=None, lane_subsets=None, icell8_well_list=None, nprocessors=None, bcl_converter=None, bases_mask=None, no_lane_splitting=None, minimum_trimmed_read_length=None, mask_short_adapter_reads=None, trim_adapters=True, adapter_sequence=None, adapter_sequence_read2=None, create_fastq_for_index_read=None, find_adapters_with_sliding_window=None, generate_stats=True, stats_file=None, per_lane_stats_file=None, analyse_barcodes=True, barcode_analysis_dir=None, force_copy_of_primary_data=False, create_empty_fastqs=False, runner=None, icell8_swap_i1_and_i2=False, icell8_reverse_complement=None, cellranger_jobmode=None, cellranger_mempercore=None, cellranger_maxjobs=None, cellranger_jobinterval=None, cellranger_localcores=None, cellranger_localmem=None, cellranger_ignore_dual_index=False, spaceranger_rc_i2_override=None, max_jobs=None, max_cores=None, batch_limit=None, verbose=False, working_dir=None)

Create and summarise FASTQ files

Wrapper for operations related to FASTQ file generation and analysis. The operations are typically:

get primary data (BCL files)
run bcl-to-fastq conversion
generate statistics
analyse barcodes

If the number of processors and the job runner are not explicitly specified then these are taken from the settings for the bcl2fastq and the statistics generation steps, which may differ from each other. However, if either value is set explicitly then the same values will be used for both steps.
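The fallback rule described above can be sketched as follows. This is only an illustration, not the library's implementation: the helper `resolve_step_resources`, the `STEP_DEFAULTS` table and its values are all hypothetical stand-ins for whatever the settings actually contain, and the sketch assumes each of the two values is resolved independently.

```python
# Hypothetical per-step defaults standing in for values read from the
# settings; the numbers and runner labels are placeholders, not real config.
STEP_DEFAULTS = {
    "bcl2fastq": {"nprocessors": 8, "runner": "runner-A"},
    "stats": {"nprocessors": 4, "runner": "runner-B"},
}


def resolve_step_resources(nprocessors=None, runner=None, defaults=STEP_DEFAULTS):
    """Return the processors and runner to use for each step.

    A value passed explicitly is applied to every step; a value left as
    None falls back to that step's own default (assumption: the two
    values are resolved independently of each other).
    """
    resolved = {}
    for step, step_defaults in defaults.items():
        resolved[step] = {
            "nprocessors": (nprocessors if nprocessors is not None
                            else step_defaults["nprocessors"]),
            "runner": (runner if runner is not None
                       else step_defaults["runner"]),
        }
    return resolved
```

With nothing set, each step keeps its own defaults; passing `nprocessors=2` gives both steps two processors while leaving each step's default runner in place.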

Parameters:

ap (AutoProcessor) – autoprocessor pointing to the analysis directory to create Fastqs for
protocol (str) – if set then specifies the protocol to use for fastq generation, otherwise use the ‘standard’ bcl2fastq protocol
platform (str) – if set then specifies the sequencing platform (otherwise platform will be determined from the primary data)
unaligned_dir (str) – if set then use this as the output directory for bcl-to-fastq conversion. Default is ‘bcl2fastq’ (unless an alternative is already specified in the config file)
sample_sheet (str) – if set then use this as the input samplesheet
name (str) – (optional) identifier for outputs that are not set explicitly
lanes (list) – (optional) specify a list of lane numbers to use in the processing; lanes not in the list will be excluded (default is to include all lanes)
lane_subsets (list) – (optional) specify a list of lane subsets to process separately before merging at the end; each subset is a dictionary which should be generated using the ‘subset’ function, and can include custom values for processing parameters (e.g. protocol, trimming and masking options etc) to override the defaults for this lane. Lanes not in a subset will still be processed unless excluded via the ‘lanes’ keyword
icell8_well_list (str) – well list file for ICELL8 platforms (required for ICELL8 processing protocols)
nprocessors (int) – number of processors to use
generate_stats (bool) – if True then (re)generate statistics file for fastqs
analyse_barcodes (bool) – if True then (re)analyse barcodes for fastqs
bcl_converter (str) – default BCL-to-Fastq conversion software to use; optionally can include a version specification (e.g. “bcl2fastq>2.0” or “bcl-convert=3.7.5”). Defaults to “bcl2fastq”
bases_mask (str) – if set then use this as an alternative bases mask setting
no_lane_splitting (bool) – if True then run bcl2fastq with --no-lane-splitting
minimum_trimmed_read_length (int) – if set then specify minimum length for reads after adapter trimming (shorter reads will be padded with Ns to make them long enough)
mask_short_adapter_reads (int) – if set then specify the minimum length of ACGT bases that must be present in a read after adapter trimming for it not to be masked completely with Ns.
trim_adapters (boolean) – if True (the default) then pass adapter sequence(s) to bcl2fastq to perform adapter trimming; otherwise remove the adapter sequences so that no adapter trimming is performed
adapter_sequence (str) – if not None then specifies adapter sequence to use instead of any sequences already set in the samplesheet (nb will be ignored if ‘trim_adapters’ is False)
adapter_sequence_read2 (str) – if not None then specifies adapter sequence to use for read2 instead of any sequences already set in the samplesheet (nb will be ignored if ‘trim_adapters’ is False)
create_fastq_for_index_read (boolean) – if True then also create Fastq files for index reads (default is not to create index read Fastqs)
find_adapters_with_sliding_window (boolean) – if True then use sliding window algorithm to identify adapter sequences for trimming
stats_file (str) – if set then use this as the name of the output per-fastq stats file.
per_lane_stats_file (str) – if set then use this as the name of the output per-lane stats file.
barcode_analysis_dir (str) – if set then specifies path to the output directory for barcode analysis
force_copy_of_primary_data (bool) – if True then force primary data to be copied (rsync’ed) even if it’s on the local system (default is to link to primary data unless it’s on a remote filesystem).
create_empty_fastqs (bool) – if True then create empty ‘placeholder’ fastq files for any fastqs that are missing after bcl2fastq has run (bcl2fastq must have completed with zero exit status)
runner (JobRunner) – (optional) specify a non-default job runner to use for fastq generation
icell8_swap_i1_and_i2 (bool) – if True then swap I1 and I2 reads when matching to barcodes in the ICELL8 well list (ICELL8 ATAC data only)
icell8_reverse_complement (str) – one of ‘i1’, ‘i2’, ‘both’, or None; if set then the specified index reads will be reverse complemented when matching to barcodes in the ICELL8 well list (ICELL8 ATAC data only)
cellranger_jobmode (str) – (optional) job mode to run cellranger in (10xGenomics Chromium SC data only)
cellranger_mempercore (int) – (optional) memory assumed per core (in GB) (10xGenomics Chromium SC data only)
cellranger_maxjobs (int) – (optional) maximum number of concurrent jobs to run (10xGenomics Chromium SC data only)
cellranger_jobinterval (int) – (optional) how often jobs are submitted (in ms) (10xGenomics Chromium SC data only)
cellranger_localcores (int) – (optional) maximum number of cores cellranger can request in jobmode ‘local’ (10xGenomics Chromium SC data only)
cellranger_localmem (int) – (optional) maximum memory cellranger can request in jobmode ‘local’ (10xGenomics Chromium SC data only)
cellranger_ignore_dual_index (bool) – (optional) on a dual-indexed flowcell where the second index was not used for the 10x sample, ignore it (10xGenomics Chromium SC data only)
spaceranger_rc_i2_override (bool) – (optional) if set then value is passed to Spaceranger’s ‘--rc-i2-override’ option (True for reverse complement workflow B, False for forward complement workflow A). If not set then Spaceranger will be left to determine the workflow automatically
max_jobs (int) – maximum number of concurrent jobs allowed
max_cores (int) – maximum number of cores available
batch_limit (int) – if set then run commands in each task in batches, with the batch size set dynamically so as not to exceed this limit
working_dir (str) – path to a working directory (defaults to temporary directory in the current directory)
verbose (bool) – if True then report additional information for pipeline diagnostics
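The bcl_converter parameter accepts either a bare package name or a name with a version specification (e.g. “bcl2fastq>2.0” or “bcl-convert=3.7.5”). A minimal sketch of how such a string might be split into its parts is shown below. The helper `parse_converter_spec` is hypothetical and is not part of auto_process_ngs; the set of operators it accepts beyond the “>” and “=” shown in the examples is an assumption.

```python
import re

# Name, then optionally an operator and a version string.
# Assumption: only '>' and '=' appear in the documented examples; the other
# operators here are included for illustration only.
_SPEC_RE = re.compile(
    r"^\s*([A-Za-z0-9_-]+)\s*(?:(>=|<=|==|=|>|<)\s*([0-9][0-9A-Za-z.]*))?\s*$"
)


def parse_converter_spec(spec="bcl2fastq"):
    """Split a converter specification into (name, operator, version).

    The operator and version are None when the specification is just a
    package name. Raises ValueError if the string cannot be parsed.
    """
    match = _SPEC_RE.match(spec)
    if match is None:
        raise ValueError("Unrecognised converter specification: %r" % spec)
    name, operator, version = match.groups()
    return (name, operator, version)
```

For example, `parse_converter_spec("bcl-convert=3.7.5")` yields the name `bcl-convert`, the operator `=` and the version `3.7.5`, while a bare `"bcl2fastq"` yields the name with no operator or version.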
