auto_process_ngs.qc.modules.sequence_lengths
Implements the ‘sequence_lengths’ QC module:
SequenceLengths: core QCModule class
GetSeqLengthStats: pipeline task to generate sequence length data
- class auto_process_ngs.qc.modules.sequence_lengths.GetSeqLengthStats(_name, *args, **kws)
Get data on sequence lengths, masking and padding for Fastqs in a project, and write the data to JSON files.
- finish()
Perform actions on task completion
Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.
Must be implemented by the subclass
- init(project, qc_dir, read_numbers=None, fastqs=None, fastq_attrs=None)
Initialise the GetSeqLengthStats task
- Parameters:
project (AnalysisProject) – project with Fastqs to get the sequence length data from
qc_dir (str) – directory for QC outputs (defaults to subdirectory ‘qc’ of project directory)
read_numbers (sequence) – list of read numbers to include (or None to include all reads)
fastqs (list) – optional list of Fastq files (overrides the Fastqs in the project)
fastq_attrs (BaseFastqAttrs) – class to use for extracting data from Fastq names
- setup()
Set up commands to be performed by the task
Must be implemented by the subclass
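As an illustration of the kind of data the GetSeqLengthStats task gathers, the following is a minimal sketch of collecting length, masking and padding counts for a single Fastq and serialising them to JSON. It is not the task's implementation: the function name `seq_length_stats`, the JSON keys, and the working definitions (a "masked" read consists entirely of `N`, a "padded" read ends in one or more `N` but contains other bases) are assumptions made for this example.

```python
import io
import json


def seq_length_stats(fastq_handle):
    """Return length, masking and padding counts for one Fastq stream.

    Hypothetical helper: the key names and the masked/padded
    definitions are assumptions, not the module's actual schema.
    """
    lengths = []
    nmasked = 0  # reads made up entirely of 'N' (assumed definition)
    npadded = 0  # reads ending in 'N' but with other bases (assumed)
    for i, line in enumerate(fastq_handle):
        if i % 4 != 1:
            continue  # the sequence is the second line of each 4-line record
        seq = line.strip()
        lengths.append(len(seq))
        if seq and set(seq) == {"N"}:
            nmasked += 1
        elif seq.endswith("N"):
            npadded += 1
    return {
        "nreads": len(lengths),
        "min_length": min(lengths) if lengths else 0,
        "max_length": max(lengths) if lengths else 0,
        "mean_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "nreads_masked": nmasked,
        "nreads_padded": npadded,
    }


# Small in-memory Fastq with four records (made-up data)
example = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nNNNN\n+\nIIII\n"
    "@r3\nACNN\n+\nIIII\n"
    "@r4\nACGTAC\n+\nIIIIII\n"
)
stats = seq_length_stats(example)
stats_json = json.dumps(stats, sort_keys=True, indent=2)
```

In practice one such JSON document per Fastq could be written into the QC directory, which keeps the per-file results independent and cheap to re-read.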
- class auto_process_ngs.qc.modules.sequence_lengths.SequenceLengths
Class for handling the ‘sequence_lengths’ QC module
- classmethod add_to_pipeline(p, project_name, project, qc_dir, read_numbers, fastqs, require_tasks=[], compute_runner=None)
Adds tasks for ‘sequence_lengths’ module to pipeline
- Parameters:
p (Pipeline) – pipeline to extend
project_name (str) – name of project
project (AnalysisProject) – project to run module on
qc_dir (str) – path to QC directory
read_numbers (list) – read numbers to include
fastqs (list) – Fastqs to run the module on
require_tasks (list) – list of tasks that the module needs to wait for
compute_runner (JobRunner) – runner to use for computation
- classmethod collect_qc_outputs(qc_dir)
Collect information on sequence length outputs
Returns an AttributeDictionary with the following attributes:
name: set to ‘sequence_lengths’
software: dictionary of software and versions
max_seqs: maximum number of sequences found in a single Fastq
min_seq_length: dictionary with minimum sequence lengths for each read
max_seq_length: dictionary with maximum sequence lengths for each read
reads: list of read IDs (e.g. ‘r1’, ‘i2’)
fastqs: list of associated Fastq names
output_files: list of associated output files
tags: list of associated output classes
- Parameters:
qc_dir (QCDir) – QC directory to examine
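The summary attributes listed above (max_seqs, min_seq_length, max_seq_length, reads, fastqs) can be pictured as an aggregation over per-Fastq statistics. The sketch below shows one way such an aggregation might look; the function `summarise_seq_lengths`, the input dictionary layout and the Fastq names are hypothetical and do not reflect the module's actual file format or code.

```python
def summarise_seq_lengths(per_fastq):
    """Aggregate per-Fastq statistics into module-level summary values.

    'per_fastq' maps a Fastq name to a dict with the (assumed) keys
    'read', 'nreads', 'min_length' and 'max_length'.
    """
    summary = {
        "max_seqs": None,
        "min_seq_length": {},
        "max_seq_length": {},
        "reads": [],
        "fastqs": [],
    }
    for fastq, data in sorted(per_fastq.items()):
        read = data["read"]
        summary["fastqs"].append(fastq)
        if read not in summary["reads"]:
            summary["reads"].append(read)
        # Largest number of sequences seen in any single Fastq
        if summary["max_seqs"] is None or data["nreads"] > summary["max_seqs"]:
            summary["max_seqs"] = data["nreads"]
        # Track the extremes separately for each read (r1, r2, ...)
        mins = summary["min_seq_length"]
        maxs = summary["max_seq_length"]
        mins[read] = min(mins.get(read, data["min_length"]), data["min_length"])
        maxs[read] = max(maxs.get(read, data["max_length"]), data["max_length"])
    return summary


# Made-up per-Fastq data for two samples with paired reads
per_fastq = {
    "PJB1_S1_R1_001": {"read": "r1", "nreads": 100, "min_length": 26, "max_length": 28},
    "PJB1_S1_R2_001": {"read": "r2", "nreads": 100, "min_length": 50, "max_length": 98},
    "PJB2_S2_R1_001": {"read": "r1", "nreads": 250, "min_length": 28, "max_length": 28},
    "PJB2_S2_R2_001": {"read": "r2", "nreads": 250, "min_length": 35, "max_length": 98},
}
summary = summarise_seq_lengths(per_fastq)
```

Keeping the minimum and maximum lengths keyed by read ID means that a short barcode read and a long biological read are never compared against each other.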
- classmethod verify(params, qc_outputs)
Verify ‘sequence_lengths’ QC module against outputs
Returns one of three values:
True: outputs verified OK
False: outputs failed to verify
None: verification not possible
- Parameters:
params (AttributeDictionary) – values of parameters used as inputs
qc_outputs (AttributeDictionary) – QC outputs returned from the ‘collect_qc_outputs’ method
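Because verification has three possible results, callers need to distinguish None from False explicitly. The sketch below illustrates the pattern with a hypothetical stand-in function, `verify_outputs`; the criteria it uses (every expected Fastq must appear among the collected outputs, and verification is impossible when no Fastqs are expected) are assumptions for illustration, not the documented behaviour of verify.

```python
def verify_outputs(expected_fastqs, found_fastqs):
    """Three-valued check: True, False, or None if nothing can be checked.

    Hypothetical stand-in for a module 'verify' method; the criteria
    used here are assumptions.
    """
    if not expected_fastqs:
        return None  # nothing to verify against
    missing = set(expected_fastqs) - set(found_fastqs)
    return not missing  # True only if every expected Fastq was found


def describe(result):
    """Turn a three-valued result into a status string."""
    # Test for None first: 'if not result' would treat None and False alike
    if result is None:
        return "not verifiable"
    return "ok" if result else "failed"


status_none = describe(verify_outputs([], ["a"]))
status_fail = describe(verify_outputs(["a", "b"], ["a"]))
status_ok = describe(verify_outputs(["a"], ["a", "b"]))
```

Testing `result is None` before the truth value avoids reporting an unverifiable module as a failure.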