auto_process_ngs.qc.modules.sequence_lengths

Implements the ‘sequence_lengths’ QC module:

  • SequenceLengths: core QCModule class

  • GetSeqLengthStats: pipeline task to generate sequence length data

class auto_process_ngs.qc.modules.sequence_lengths.GetSeqLengthStats(_name, *args, **kws)

Get data on sequence lengths, masking and padding for Fastqs in a project, and write the data to JSON files.
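As a rough illustration of the kind of data this task gathers, the sketch below summarises read lengths from Fastq records and serialises the result as JSON. It is not the task's own implementation. The function name `seq_length_stats`, the output keys, and the definitions of "masked" (a read consisting entirely of `N`) and "padded" (a read ending in one or more `N` but not entirely `N`) are all assumptions made for the example.

```python
import io
import json
from collections import Counter


def seq_length_stats(fastq_lines):
    """Summarise read lengths from an iterable of Fastq lines.

    Hypothetical helper: key names and the masked/padded
    definitions are assumptions, not the library's format.
    """
    lengths = Counter()
    nreads = masked = padded = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:
            continue  # the sequence is the 2nd line of each 4-line record
        seq = line.strip().upper()
        nreads += 1
        lengths[len(seq)] += 1
        if seq and set(seq) == {"N"}:
            masked += 1  # assumed: every base is N
        elif seq.endswith("N"):
            padded += 1  # assumed: trailing N(s) only
    return {
        "nreads": nreads,
        "nreads_masked": masked,
        "nreads_padded": padded,
        "min_length": min(lengths) if lengths else 0,
        "max_length": max(lengths) if lengths else 0,
        "seq_lengths_dist": {str(k): lengths[k] for k in sorted(lengths)},
    }


# Three made-up records: one normal, one masked, one padded
example = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nNNNN\n+\nIIII\n"
    "@r3\nACGNN\n+\nIIIII\n"
)
stats = seq_length_stats(example)
print(json.dumps(stats, indent=2))
```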

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(project, qc_dir, read_numbers=None, fastqs=None, fastq_attrs=None)

Initialise the GetSeqLengthStats task

Parameters:
  • project (AnalysisProject) – project with Fastqs to get the sequence length data from

  • qc_dir (str) – directory for QC outputs (defaults to subdirectory ‘qc’ of project directory)

  • read_numbers (sequence) – list of read numbers to include (or None to include all reads)

  • fastqs (list) – optional, list of Fastq files (overrides Fastqs in project)

  • fastq_attrs (BaseFastqAttrs) – class to use for extracting data from Fastq names
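The `read_numbers` convention (a list of read numbers, or None for all reads) can be sketched as follows. The helper `select_reads` and the mapping of read numbers to Fastq names are hypothetical and exist only to show the None-means-everything behaviour; the task's actual selection logic may differ.

```python
def select_reads(fastqs_by_read, read_numbers=None):
    """Return the subset of a {read_number: [fastq, ...]} mapping.

    Hypothetical helper: None selects every read, otherwise only
    the listed read numbers are kept.
    """
    if read_numbers is None:
        return dict(fastqs_by_read)
    wanted = set(read_numbers)
    return {n: fqs for n, fqs in fastqs_by_read.items() if n in wanted}


# Made-up Fastq names keyed by read number
fastqs_by_read = {
    1: ["S1_R1.fastq.gz"],
    2: ["S1_R2.fastq.gz"],
    3: ["S1_R3.fastq.gz"],
}

all_reads = select_reads(fastqs_by_read)
subset = select_reads(fastqs_by_read, read_numbers=[1, 3])
print(sorted(all_reads), sorted(subset))
```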

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.qc.modules.sequence_lengths.SequenceLengths

Class for handling the ‘sequence_lengths’ QC module

classmethod add_to_pipeline(p, project_name, project, qc_dir, read_numbers, fastqs, require_tasks=[], compute_runner=None)

Adds tasks for ‘sequence_lengths’ module to pipeline

Parameters:
  • p (Pipeline) – pipeline to extend

  • project_name (str) – name of project

  • project (AnalysisProject) – project to run module on

  • qc_dir (str) – path to QC directory

  • read_numbers (list) – read numbers to include

  • fastqs (list) – Fastqs to run the module on

  • require_tasks (list) – list of tasks that the module needs to wait for

  • compute_runner (JobRunner) – runner to use for computation

classmethod collect_qc_outputs(qc_dir)

Collect information on sequence length outputs

Returns an AttributeDictionary with the following attributes:

  • name: set to ‘sequence_lengths’

  • software: dictionary of software and versions

  • max_seqs: maximum number of sequences found in a single Fastq

  • min_seq_length: dictionary with minimum sequence lengths for each read

  • max_seq_length: dictionary with maximum sequence lengths for each read

  • reads: list of read IDs (e.g. ‘r1’, ‘i2’)

  • fastqs: list of associated Fastq names

  • output_files: list of associated output files

  • tags: list of associated output classes

Parameters:
  • qc_dir (QCDir) – QC directory to examine
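To make the shape of the returned object concrete, the sketch below builds a stand-in with the attributes listed above and reads the per-read length ranges from it. `types.SimpleNamespace` stands in for AttributeDictionary, and every value (counts, lengths, file names) is invented for illustration; `read_length_ranges` is a hypothetical helper, not part of the module.

```python
from types import SimpleNamespace

# Stand-in for the AttributeDictionary returned by collect_qc_outputs;
# all values here are made up for the example
outputs = SimpleNamespace(
    name="sequence_lengths",
    software={},
    max_seqs=1000,
    min_seq_length={"r1": 26, "r2": 76},
    max_seq_length={"r1": 28, "r2": 91},
    reads=["r1", "r2"],
    fastqs=["S1_R1.fastq.gz", "S1_R2.fastq.gz"],
    output_files=[],
    tags=[],
)


def read_length_ranges(qc_outputs):
    """Map each read ID to its (min, max) sequence length."""
    return {
        read: (qc_outputs.min_seq_length[read],
               qc_outputs.max_seq_length[read])
        for read in qc_outputs.reads
    }


ranges = read_length_ranges(outputs)
for read, (lo, hi) in ranges.items():
    print(f"{read}: {lo}-{hi} bases")
```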

classmethod verify(params, qc_outputs)

Verify ‘sequence_lengths’ QC module against outputs

Returns one of three values:

  • True: outputs verified ok

  • False: outputs failed to verify

  • None: verification not possible

Parameters:
  • params (AttributeDictionary) – values of parameters used as inputs

  • qc_outputs (AttributeDictionary) – QC outputs returned from the ‘collect_qc_outputs’ method
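Because the result has three states, callers should test for None explicitly rather than relying on truthiness, since both False and None are falsy. A minimal sketch, where `describe_verification` is a hypothetical helper and the status strings are arbitrary:

```python
def describe_verification(status):
    """Translate a three-valued verification result into a label.

    Hypothetical helper: 'status' is expected to be True, False
    or None, as returned by a verify-style method.
    """
    if status is None:
        return "not verifiable"  # checked first: None is falsy too
    return "verified" if status else "failed"


labels = [describe_verification(s) for s in (True, False, None)]
print(labels)
```

Writing `if not status:` instead would treat "verification not possible" the same as "failed to verify", which is usually not the intent.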