auto_process_ngs.qc.modules.fastq_screen
Implements the ‘fastq_screen’ QC module:
FastqScreen: core QCModule class
CheckFastqScreenOutputs: pipeline task to check outputs
RunFastqScreen: pipeline task to run FastqScreen
check_fastq_screen_outputs: helper function for checking outputs
- class auto_process_ngs.qc.modules.fastq_screen.CheckFastqScreenOutputs(_name, *args, **kws)
Check the outputs from FastqScreen
- finish()
Perform actions on task completion
Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.
Must be implemented by the subclass
- init(project, qc_dir, screens, fastqs=None, read_numbers=None, include_samples=None, fastq_attrs=None, legacy=False, verbose=False)
Initialise the CheckFastqScreenOutputs task.
- Parameters:
project (AnalysisProject) – project to run QC for
qc_dir (str) – directory for QC outputs (defaults to subdirectory ‘qc’ of project directory)
screens (mapping) – mapping of screen names to FastqScreen conf files
fastqs (list) – explicit list of Fastq files to check against (default is to use Fastqs from supplied analysis project)
read_numbers (list) – read numbers to include
include_samples (list) – optional, list of sample names to include
fastq_attrs (BaseFastqAttrs) – class to use for extracting data from Fastq names
legacy (bool) – if True then use ‘legacy’ naming convention for output files (default is to use new format)
verbose (bool) – if True then print additional information from the task
- Outputs:
- fastqs (list): list of Fastqs that have missing FastqScreen outputs under the specified QC protocol
- setup()
Set up commands to be performed by the task
Must be implemented by the subclass
- class auto_process_ngs.qc.modules.fastq_screen.FastqScreen
Class for handling the ‘fastq_screen’ QC module
- classmethod add_to_pipeline(p, project_name, project, qc_dir, screens, fastqs, read_numbers, include_samples=None, nthreads=None, fastq_subset=None, legacy=False, requires_tasks=[], verify_runner=None, compute_runner=None, envmodules=None, verbose=False)
Adds tasks for ‘fastq_screen’ module to pipeline
- Parameters:
p (Pipeline) – pipeline to extend
project_name (str) – name of project
project (AnalysisProject) – project to run module on
qc_dir (str) – path to QC directory
screens (list) – list of screen names
fastqs (list) – Fastqs to run the module on
read_numbers (list) – read numbers to include
include_samples (list) – subset of sample names to include
nthreads (int) – number of threads (if not set then will be taken from the runner)
fastq_subset (int) – subset of reads to use for FastqScreen
legacy (bool) – whether to use legacy naming for output files
verbose (bool) – enable verbose output
requires_tasks (list) – list of tasks that the module needs to wait for
verify_runner (JobRunner) – runner to use for checks
compute_runner (JobRunner) – runner to use for computation
- classmethod collect_qc_outputs(qc_dir)
Collect information on FastqScreen outputs
Returns an AttributeDictionary with the following attributes:
name: set to ‘fastq_screen’
software: dictionary of software and versions
screen_names: list of associated panel names
fastqs: list of associated Fastq names
fastqs_for_screen: dictionary of panel names and lists of Fastq names associated with each panel
output_files: list of associated output files
tags: list of associated output classes
- Parameters:
qc_dir (QCDir) – QC directory to examine
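The attributes listed above can be pictured with a small stand-in object. This is only a sketch: it uses `types.SimpleNamespace` in place of the real AttributeDictionary, every value (screen names, Fastq names, version string, tag) is a made-up placeholder, and the helper `fastqs_missing_from_screens` is hypothetical rather than part of the module. It shows one way `fastqs_for_screen` might be compared against `fastqs` to spot panels with incomplete coverage.

```python
from types import SimpleNamespace

# Stand-in for the object returned by collect_qc_outputs.
# All values are hypothetical placeholders, not real pipeline output.
outputs = SimpleNamespace(
    name="fastq_screen",
    software={"fastq_screen": "0.0.0"},
    screen_names=["model_organisms", "rRNA"],
    fastqs=["PJB1_S1_R1_001", "PJB2_S2_R1_001"],
    fastqs_for_screen={
        "model_organisms": ["PJB1_S1_R1_001", "PJB2_S2_R1_001"],
        "rRNA": ["PJB1_S1_R1_001"],
    },
    output_files=[],
    tags=["example_tag"],
)


def fastqs_missing_from_screens(outputs):
    """Map each screen name to the Fastqs that have no outputs for it.

    Hypothetical helper for illustration only.
    """
    missing = {}
    for screen in outputs.screen_names:
        seen = set(outputs.fastqs_for_screen.get(screen, []))
        missing[screen] = [fq for fq in outputs.fastqs if fq not in seen]
    return missing


incomplete = fastqs_missing_from_screens(outputs)
```

In this made-up data the 'rRNA' panel has outputs for only one of the two Fastqs, so the second Fastq is reported against that panel.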
- classmethod verify(params, qc_outputs)
Verify ‘fastq_screen’ QC module against outputs
Returns one of 3 values:
True: outputs verified ok
False: outputs failed to verify
None: verification not possible
- Parameters:
params (AttributeDictionary) – values of parameters used as inputs
qc_outputs (AttributeDictionary) – QC outputs returned from the ‘collect_qc_outputs’ method
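Because both False and None are falsy in Python, code that consumes the result of verify probably needs to test for None explicitly rather than rely on truthiness. The sketch below illustrates this with a hypothetical helper, `describe_verification`, which is not part of the module; the status strings are likewise arbitrary.

```python
def describe_verification(result):
    """Translate a three-valued verification result into a status string.

    Hypothetical helper: 'result' is expected to be True, False or None,
    matching the three values documented for verify.
    """
    # Check for None first: 'not result' alone would conflate
    # "failed to verify" with "verification not possible"
    if result is None:
        return "unverifiable"
    return "ok" if result else "failed"


statuses = [describe_verification(r) for r in (True, False, None)]
```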
- class auto_process_ngs.qc.modules.fastq_screen.RunFastqScreen(_name, *args, **kws)
Run FastqScreen
- init(fastqs, qc_dir, screens, subset=None, nthreads=None, read_numbers=None, fastq_attrs=None, legacy=False)
Initialise the RunFastqScreen task.
- Parameters:
fastqs (list) – list of paths to Fastq files to run Fastq Screen on (it is expected that this list will come from the CheckIlluminaQCOutputs task)
qc_dir (str) – directory for QC outputs (defaults to subdirectory ‘qc’ of project directory)
screens (mapping) – mapping of screen names to FastqScreen conf files
subset (int) – subset size (number of reads) to use when running Fastq Screen
nthreads (int) – number of threads/processors to use (defaults to number of slots set in runner)
read_numbers (list) – list of read numbers to include when running Fastq Screen
fastq_attrs (BaseFastqAttrs) – class to use for extracting data from Fastq names
legacy (bool) – if True then use ‘legacy’ naming convention for output files (default is to use new format)
- setup()
Set up commands to be performed by the task
Must be implemented by the subclass
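The nthreads parameter of RunFastqScreen is documented as defaulting to the number of slots set in the runner. A minimal sketch of that kind of fallback is shown below; the function name `resolve_nthreads` and the argument `runner_nslots` are assumptions made for illustration, and the real task may resolve the value differently.

```python
def resolve_nthreads(nthreads, runner_nslots):
    """Pick the thread count: explicit value if set, else the runner's slots.

    Hypothetical helper; 'runner_nslots' stands in for whatever slot
    count the job runner reports.
    """
    if nthreads:
        return int(nthreads)
    return int(runner_nslots)


explicit = resolve_nthreads(4, 8)     # explicit value wins
fallback = resolve_nthreads(None, 8)  # falls back to the runner's slots
```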
- auto_process_ngs.qc.modules.fastq_screen.check_fastq_screen_outputs(project, qc_dir, screen, fastqs=None, read_numbers=None, legacy=False)
Return Fastqs missing QC outputs from FastqScreen
Returns a list of the Fastqs from a project for which one or more associated outputs from FastqScreen don’t exist in the specified QC directory.
- Parameters:
project (AnalysisProject) – project to check the QC outputs for
qc_dir (str) – path to the QC directory (relative path is assumed to be a subdirectory of the project)
screen (str) – screen name to check
fastqs (list) – optional list of Fastqs to check against (defaults to Fastqs from the project)
read_numbers (list) – read numbers to define Fastqs to predict outputs for; if not set then all non-index reads will be included
legacy (bool) – if True then check for ‘legacy’-style names (default: False)
- Returns:
list of Fastq files with missing outputs.
- Return type:
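The idea behind check_fastq_screen_outputs (predict the output files for each Fastq, then report the Fastqs for which any are absent) can be sketched without the library. Everything here is an assumption made for illustration: the helper names, the '.png' and '.txt' output pair, and both file naming patterns are hypothetical, and the real function also takes the project and read numbers into account.

```python
import os
import tempfile


def expected_screen_outputs(fastq, screen, legacy=False):
    """Predict output file names for one Fastq and one screen.

    The naming patterns below are assumed for illustration only.
    """
    base = os.path.basename(fastq)
    for ext in (".fastq.gz", ".fastq", ".fq.gz", ".fq"):
        if base.endswith(ext):
            base = base[:-len(ext)]
            break
    if legacy:
        stem = f"{base}_{screen}_screen"
    else:
        stem = f"{base}_screen_{screen}"
    return [stem + ".png", stem + ".txt"]


def fastqs_missing_outputs(fastqs, qc_dir, screen, legacy=False):
    """Return the Fastqs for which one or more predicted outputs are absent."""
    missing = []
    for fq in fastqs:
        expected = expected_screen_outputs(fq, screen, legacy=legacy)
        if not all(os.path.exists(os.path.join(qc_dir, f)) for f in expected):
            missing.append(fq)
    return missing


# Demonstration in a temporary QC directory with made-up Fastq names:
# outputs are created for the first Fastq only
with tempfile.TemporaryDirectory() as qc_dir:
    for name in expected_screen_outputs("PJB1_S1_R1_001.fastq.gz", "rRNA"):
        open(os.path.join(qc_dir, name), "w").close()
    missing = fastqs_missing_outputs(
        ["PJB1_S1_R1_001.fastq.gz", "PJB2_S2_R1_001.fastq.gz"],
        qc_dir,
        "rRNA",
    )
```

A Fastq is reported as soon as any one of its predicted files is absent, which mirrors the "one or more associated outputs" wording above.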