auto_process_ngs.commands.make_fastqs_cmd

auto_process_ngs.commands.make_fastqs_cmd.make_fastqs(ap, protocol='standard', platform=None, unaligned_dir=None, sample_sheet=None, name=None, lanes=None, lane_subsets=None, icell8_well_list=None, nprocessors=None, bcl_converter=None, bases_mask=None, no_lane_splitting=None, minimum_trimmed_read_length=None, mask_short_adapter_reads=None, trim_adapters=True, adapter_sequence=None, adapter_sequence_read2=None, create_fastq_for_index_read=None, find_adapters_with_sliding_window=None, generate_stats=True, stats_file=None, per_lane_stats_file=None, analyse_barcodes=True, barcode_analysis_dir=None, force_copy_of_primary_data=False, create_empty_fastqs=False, runner=None, icell8_swap_i1_and_i2=False, icell8_reverse_complement=None, cellranger_jobmode=None, cellranger_mempercore=None, cellranger_maxjobs=None, cellranger_jobinterval=None, cellranger_localcores=None, cellranger_localmem=None, cellranger_ignore_dual_index=False, spaceranger_rc_i2_override=None, max_jobs=None, max_cores=None, batch_limit=None, verbose=False, working_dir=None)

Create and summarise FASTQ files

Wrapper for operations related to FASTQ file generation and analysis. The operations are typically:

  • get primary data (BCL files)

  • run bcl-to-fastq conversion

  • generate statistics

  • analyse barcodes
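The sketch below illustrates the sequence above. It runs the steps in order and skips the statistics and barcode analysis steps when 'generate_stats' or 'analyse_barcodes' is False. The step callables and the name run_make_fastqs_steps are hypothetical placeholders, not part of the auto_process_ngs API. The real wrapper does considerably more work at each step.

```python
from typing import Callable, List


def run_make_fastqs_steps(
    fetch_primary_data: Callable[[], None],
    convert_bcl: Callable[[], None],
    generate_statistics: Callable[[], None],
    analyse_barcodes_step: Callable[[], None],
    generate_stats: bool = True,
    analyse_barcodes: bool = True,
) -> List[str]:
    """Run the FASTQ generation steps in order and return the names of
    the steps that were run.

    Hypothetical sketch: the step callables stand in for the real work.
    """
    steps = [
        ("get primary data", fetch_primary_data, True),
        ("bcl-to-fastq conversion", convert_bcl, True),
        ("generate statistics", generate_statistics, generate_stats),
        ("analyse barcodes", analyse_barcodes_step, analyse_barcodes),
    ]
    completed = []
    for name, step, enabled in steps:
        # Optional steps are skipped when their flag is False
        if enabled:
            step()
            completed.append(name)
    return completed
```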

If the number of processors ('nprocessors') and the job runner ('runner') are not specified explicitly, then they are taken from the settings for the bcl2fastq and the statistics generation steps, which may differ from each other. However, if either of these values is set explicitly, then the explicitly set value will be used for both steps.
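The following is a minimal sketch of how the processor count and runner might be resolved for each step. It assumes that each explicitly set value overrides the configured value for both steps, independently of the other value. StepSettings, resolve_step_settings and the step names 'bcl2fastq' and 'stats' are hypothetical, and plain strings stand in for JobRunner objects.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StepSettings:
    # Hypothetical container; 'runner' is a string standing in for a JobRunner
    nprocessors: Optional[int]
    runner: Optional[str]


def resolve_step_settings(
    defaults: Dict[str, StepSettings],
    nprocessors: Optional[int] = None,
    runner: Optional[str] = None,
) -> Dict[str, StepSettings]:
    """Return the settings to use for each step.

    'defaults' maps a step name to the values taken from the configuration.
    An explicit 'nprocessors' or 'runner' overrides that value for every step.
    """
    resolved = {}
    for step, config in defaults.items():
        resolved[step] = StepSettings(
            nprocessors=nprocessors if nprocessors is not None else config.nprocessors,
            runner=runner if runner is not None else config.runner,
        )
    return resolved
```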

Parameters:
  • ap (AutoProcessor) – autoprocessor pointing to the analysis directory to create Fastqs for

  • protocol (str) – if set then specifies the protocol to use for fastq generation, otherwise use the ‘standard’ bcl2fastq protocol

  • platform (str) – if set then specifies the sequencing platform (otherwise platform will be determined from the primary data)

  • unaligned_dir (str) – if set then use this as the output directory for bcl-to-fastq conversion. Default is ‘bcl2fastq’ (unless an alternative is already specified in the config file)

  • sample_sheet (str) – if set then use this as the input samplesheet

  • name (str) – (optional) identifier for outputs that are not set explicitly

  • lanes (list) – (optional) specify a list of lane numbers to use in the processing; lanes not in the list will be excluded (default is to include all lanes)

  • lane_subsets (list) – (optional) specify a list of lane subsets to process separately before merging at the end; each subset is a dictionary which should be generated using the ‘subset’ function, and can include custom values for processing parameters (e.g. protocol, trimming and masking options) to override the defaults for the lanes in that subset. Lanes not in a subset will still be processed unless excluded via the ‘lanes’ keyword.

  • icell8_well_list (str) – well list file for ICELL8 platforms (required for ICELL8 processing protocols)

  • nprocessors (int) – number of processors to use

  • generate_stats (bool) – if True then (re)generate statistics file for fastqs

  • analyse_barcodes (bool) – if True then (re)analyse barcodes for fastqs

  • bcl_converter (str) – default BCL-to-Fastq conversion software to use; optionally can include a version specification (e.g. “bcl2fastq>2.0” or “bcl-convert=3.7.5”). Defaults to “bcl2fastq”

  • bases_mask (str) – if set then use this as an alternative bases mask setting

  • no_lane_splitting (bool) – if True then run bcl2fastq with --no-lane-splitting

  • minimum_trimmed_read_length (int) – if set then specify minimum length for reads after adapter trimming (shorter reads will be padded with Ns to make them long enough)

  • mask_short_adapter_reads (int) – if set then specify the minimum length of ACGT bases that must be present in a read after adapter trimming for it not to be masked completely with Ns.

  • trim_adapters (bool) – if True (the default) then pass adapter sequence(s) to bcl2fastq to perform adapter trimming; otherwise remove the adapter sequences so that no trimming is performed

  • adapter_sequence (str) – if not None then specifies adapter sequence to use instead of any sequences already set in the samplesheet (nb will be ignored if ‘trim_adapters’ is False)

  • adapter_sequence_read2 (str) – if not None then specifies adapter sequence to use for read2 instead of any sequences already set in the samplesheet (nb will be ignored if ‘trim_adapters’ is False)

  • create_fastq_for_index_read (bool) – if True then also create Fastq files for index reads (the default is not to create index read Fastqs)

  • find_adapters_with_sliding_window (boolean) – if True then use sliding window algorithm to identify adapter sequences for trimming

  • stats_file (str) – if set then use this as the name of the output per-fastq stats file.

  • per_lane_stats_file (str) – if set then use this as the name of the output per-lane stats file.

  • barcode_analysis_dir (str) – if set then specifies path to the output directory for barcode analysis

  • force_copy_of_primary_data (bool) – if True then force primary data to be copied (rsync’ed) even if it’s on the local system (default is to link to primary data unless it’s on a remote filesystem).

  • create_empty_fastqs (bool) – if True then create empty ‘placeholder’ fastq files for any fastqs that are missing after bcl2fastq has run (bcl2fastq must have completed with zero exit status)

  • runner (JobRunner) – (optional) specify a non-default job runner to use for fastq generation

  • icell8_swap_i1_and_i2 (bool) – if True then swap I1 and I2 reads when matching to barcodes in the ICELL8 well list (ICELL8 ATAC data only)

  • icell8_reverse_complement (str) – one of ‘i1’, ‘i2’, ‘both’, or None; if set then the specified index reads will be reverse complemented when matching to barcodes in the ICELL8 well list (ICELL8 ATAC data only)

  • cellranger_jobmode (str) – (optional) job mode to run cellranger in (10xGenomics Chromium SC data only)

  • cellranger_mempercore (int) – (optional) memory assumed per core (in GB) (10xGenomics Chromium SC data only)

  • cellranger_maxjobs (int) – (optional) maximum number of concurrent jobs to run (10xGenomics Chromium SC data only)

  • cellranger_jobinterval (int) – (optional) how often jobs are submitted (in ms) (10xGenomics Chromium SC data only)

  • cellranger_localcores (int) – (optional) maximum number of cores cellranger can request in jobmode ‘local’ (10xGenomics Chromium SC data only)

  • cellranger_localmem (int) – (optional) maximum memory cellranger can request in jobmode ‘local’ (10xGenomics Chromium SC data only)

  • cellranger_ignore_dual_index (bool) – (optional) on a dual-indexed flowcell where the second index was not used for the 10x sample, ignore it (10xGenomics Chromium SC data only)

  • spaceranger_rc_i2_override (bool) – (optional) if set then the value is passed to Spaceranger’s ‘--rc-i2-override’ option (True for the reverse complement workflow B, False for the forward strand workflow A). If not set then Spaceranger will be left to determine the workflow automatically

  • max_jobs (int) – maximum number of concurrent jobs allowed

  • max_cores (int) – maximum number of cores available

  • batch_limit (int) – if set then run commands in each task in batches, with the batch size set dynamically so as not to exceed this limit

  • working_dir (str) – path to a working directory (defaults to temporary directory in the current directory)

  • verbose (bool) – if True then report additional information for pipeline diagnostics
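The 'lanes' and 'lane_subsets' parameters described above interact, and the sketch below shows one way to group lanes for processing. It assumes each subset dictionary carries a "lanes" key, that the 'lanes' filter is applied before subsets are formed, and that unclaimed lanes are gathered into a single final batch. All three are assumptions, and partition_lanes is a hypothetical helper.

```python
from typing import Dict, List, Optional, Sequence


def partition_lanes(
    all_lanes: Sequence[int],
    lanes: Optional[Sequence[int]] = None,
    subsets: Sequence[Dict] = (),
) -> List[List[int]]:
    """Group lanes into the batches that would be processed separately.

    Hypothetical sketch: each subset is assumed to be a dict with a
    "lanes" key.
    """
    # Lanes excluded by the 'lanes' filter are never processed
    selected = [lane for lane in all_lanes if lanes is None or lane in lanes]
    batches = []
    claimed = set()
    for subset in subsets:
        batch = [lane for lane in subset["lanes"] if lane in selected]
        if batch:
            batches.append(batch)
            claimed.update(batch)
    # Lanes not in any subset are still processed, together
    remainder = [lane for lane in selected if lane not in claimed]
    if remainder:
        batches.append(remainder)
    return batches
```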
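The 'bcl_converter' parameter accepts a package name with an optional version specification, such as "bcl2fastq>2.0" or "bcl-convert=3.7.5". The sketch below parses such a string and checks an installed version against it. The accepted operators (>=, <=, >, <, =) and the numeric dotted-version comparison are assumptions, and parse_converter_spec and satisfies are hypothetical helpers, not the library's own parser.

```python
import operator
import re
from typing import Optional, Tuple

# Hypothetical grammar: NAME[OP VERSION], with ">=" and "<=" tried before ">" and "<"
_SPEC = re.compile(r"^([A-Za-z0-9_.-]+)(?:(>=|<=|>|<|=)([0-9][0-9.]*))?$")
_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
        "<": operator.lt, "=": operator.eq}


def parse_converter_spec(spec: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split e.g. 'bcl2fastq>2.0' into ('bcl2fastq', '>', '2.0')."""
    match = _SPEC.match(spec.strip())
    if match is None:
        raise ValueError(f"unrecognised converter specification: {spec!r}")
    name, op, version = match.groups()
    return name, op, version


def _version_tuple(version: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in version.split(".") if part)


def satisfies(installed: str, spec: str) -> bool:
    """Return True if an installed version meets the specification."""
    _, op, required = parse_converter_spec(spec)
    if op is None:
        return True  # no version constraint given
    a, b = _version_tuple(installed), _version_tuple(required)
    # Pad with zeros so that e.g. 2.0 and 2.0.0 compare as equal
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return _OPS[op](a, b)
```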
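The sketch below applies the trimming and masking behaviour described for 'minimum_trimmed_read_length' and 'mask_short_adapter_reads' to a single read. It is not bcl2fastq's implementation. It assumes the adapter start position has already been located, so adapter detection (including the sliding window option) is out of scope. trim_and_mask is a hypothetical name.

```python
from typing import Optional


def trim_and_mask(
    seq: str,
    adapter_start: Optional[int],
    minimum_trimmed_read_length: int,
    mask_short_adapter_reads: int,
) -> str:
    """Trim an adapter from a read, pad to the minimum length with Ns,
    then mask the whole read if too few ACGT bases remain.

    Hypothetical sketch; 'adapter_start' is assumed to be known already.
    """
    if adapter_start is None:
        return seq  # no adapter found, read is left unchanged
    kept = seq[:adapter_start]
    # Never go below the minimum length (or the original length if shorter)
    target = max(len(kept), min(minimum_trimmed_read_length, len(seq)))
    trimmed = kept + "N" * (target - len(kept))
    # Mask the read completely if it has too few real bases
    n_acgt = sum(1 for base in trimmed if base in "ACGT")
    if n_acgt < mask_short_adapter_reads:
        trimmed = "N" * len(trimmed)
    return trimmed
```

For example, trimming "ACGTACGTAAAAAGATCGG" at position 8 with a minimum length of 10 keeps eight bases and pads the read to "ACGTACGTNN". With a masking threshold of 9, the same read is instead masked to ten Ns.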
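The adapter options 'trim_adapters', 'adapter_sequence' and 'adapter_sequence_read2' determine which adapter sequences reach the converter. The sketch below models how those options combine. It assumes that an empty string means "no adapter", and resolve_adapters is a hypothetical helper. The sequences in the test are placeholders, not real adapters.

```python
from typing import Optional, Tuple


def resolve_adapters(
    samplesheet_adapter: str,
    samplesheet_adapter_read2: str,
    trim_adapters: bool = True,
    adapter_sequence: Optional[str] = None,
    adapter_sequence_read2: Optional[str] = None,
) -> Tuple[str, str]:
    """Return the (read 1, read 2) adapter sequences to use.

    Hypothetical sketch: an empty string stands for 'no adapter'.
    """
    if not trim_adapters:
        # No trimming: adapters are removed and any overrides are ignored
        return "", ""
    # Explicit overrides take precedence over the sample sheet values
    read1 = adapter_sequence if adapter_sequence is not None else samplesheet_adapter
    read2 = (adapter_sequence_read2 if adapter_sequence_read2 is not None
             else samplesheet_adapter_read2)
    return read1, read2
```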
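The 'create_empty_fastqs' option fills in fastqs that are missing after conversion. The sketch below assumes a placeholder is a gzip-compressed file containing no reads and that the list of expected output paths is already known. create_placeholder_fastqs is a hypothetical helper.

```python
import gzip
import os
from typing import Iterable, List


def create_placeholder_fastqs(expected: Iterable[str]) -> List[str]:
    """Create empty gzipped fastqs for any expected paths that are missing.

    Hypothetical sketch; returns the paths that were created.
    """
    created = []
    for path in expected:
        if os.path.exists(path):
            continue  # real output exists, leave it alone
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        # Write a valid gzip stream that contains no records
        with gzip.open(path, "wb"):
            pass
        created.append(path)
    return created
```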
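For the ICELL8 ATAC options 'icell8_reverse_complement' and 'icell8_swap_i1_and_i2', the sketch below shows how a pair of index reads might be prepared before matching against the well list. Applying the reverse complement to the reads as named before swapping them is an assumption, and prepare_index_pair is a hypothetical helper.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def prepare_index_pair(
    i1: str,
    i2: str,
    swap_i1_and_i2: bool = False,
    reverse_complement_reads: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (i1, i2) ready for matching against well list barcodes.

    Hypothetical sketch: reverse complementing is assumed to happen
    before the optional swap.
    """
    if reverse_complement_reads not in (None, "i1", "i2", "both"):
        raise ValueError(f"invalid value: {reverse_complement_reads!r}")
    if reverse_complement_reads in ("i1", "both"):
        i1 = reverse_complement(i1)
    if reverse_complement_reads in ("i2", "both"):
        i2 = reverse_complement(i2)
    if swap_i1_and_i2:
        i1, i2 = i2, i1
    return i1, i2
```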
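The sketch below turns the 10xGenomics options above into a list of command-line arguments. The flag spellings are assumed to match the cellranger and spaceranger options of the same names and should be checked against the help text of the installed version. tenx_mkfastq_options is a hypothetical helper, not the wrapper's own code.

```python
from typing import List, Optional


def tenx_mkfastq_options(
    jobmode: Optional[str] = None,
    mempercore: Optional[int] = None,
    maxjobs: Optional[int] = None,
    jobinterval: Optional[int] = None,
    localcores: Optional[int] = None,
    localmem: Optional[int] = None,
    ignore_dual_index: bool = False,
    rc_i2_override: Optional[bool] = None,
) -> List[str]:
    """Build cellranger/spaceranger arguments from the wrapper options.

    Hypothetical sketch; flag names are assumed, not taken from the wrapper.
    """
    valued = [
        ("--jobmode", jobmode),
        ("--mempercore", mempercore),
        ("--maxjobs", maxjobs),
        ("--jobinterval", jobinterval),
        ("--localcores", localcores),
        ("--localmem", localmem),
    ]
    # Only pass options that were explicitly set
    options = [f"{flag}={value}" for flag, value in valued if value is not None]
    if ignore_dual_index:
        options.append("--ignore-dual-index")  # cellranger option
    if rc_i2_override is not None:
        # Spaceranger option; value written as lowercase true/false
        options.append(f"--rc-i2-override={str(rc_i2_override).lower()}")
    return options
```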
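The sketch below is one reading of 'batch_limit'. It assumes that the limit caps the number of batches per task, so the batch size is chosen as the smallest size that keeps the batch count within the limit. This interpretation is an assumption, and batch_commands is a hypothetical helper.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def batch_commands(commands: Sequence[T], batch_limit: int) -> List[List[T]]:
    """Split commands into at most 'batch_limit' batches.

    Hypothetical sketch: batch size is ceil(n / batch_limit), which
    guarantees the number of batches never exceeds the limit.
    """
    if batch_limit < 1:
        raise ValueError("batch_limit must be at least 1")
    if not commands:
        return []
    batch_size = math.ceil(len(commands) / batch_limit)
    return [list(commands[i:i + batch_size])
            for i in range(0, len(commands), batch_size)]
```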