auto_process_ngs.qc.modules.fastqc
Implements the ‘fastqc’ QC module:
Fastqc: core QCModule class
CheckFastQCOutputs: pipeline task to check outputs
RunFastQC: pipeline task to run FastQC
check_fastqc_outputs: helper function for checking outputs
- class auto_process_ngs.qc.modules.fastqc.CheckFastQCOutputs(_name, *args, **kws)
Check the outputs from FastQC
- finish()
Perform actions on task completion
Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.
Must be implemented by the subclass
- init(project, qc_dir, read_numbers, fastqs=None, verbose=False)
Initialise the CheckFastQCOutputs task.
- Parameters:
project (AnalysisProject) – project to run QC for
qc_dir (str) – directory for QC outputs (defaults to subdirectory ‘qc’ of project directory)
read_numbers (list) – list of read numbers to include
fastqs (list) – optional, list of Fastq files (overrides Fastqs in project)
verbose (bool) – if True then print additional information from the task
- Outputs:
fastqs (list) – list of Fastqs that have missing FastQC outputs under the specified QC protocol
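The idea of "Fastqs that have missing FastQC outputs" can be illustrated with a minimal standalone sketch. It is not the module's implementation: the helper names `fastqc_output_names` and `fastqs_missing_outputs` are hypothetical, and the sketch assumes FastQC's usual naming, where `NAME.fastq.gz` produces `NAME_fastqc.html` and `NAME_fastqc.zip`. The set of outputs this task actually checks may differ.

```python
from pathlib import Path


def fastqc_output_names(fastq):
    """Predict FastQC report names for a Fastq file (assumed convention)."""
    name = Path(fastq).name
    # Strip compression and Fastq extensions: 'S1_R1.fastq.gz' -> 'S1_R1'
    for ext in (".gz", ".fastq", ".fq"):
        if name.endswith(ext):
            name = name[: -len(ext)]
    return [f"{name}_fastqc.html", f"{name}_fastqc.zip"]


def fastqs_missing_outputs(fastqs, qc_dir):
    """Return the Fastqs for which at least one predicted output is absent."""
    qc_dir = Path(qc_dir)
    return [
        fq
        for fq in fastqs
        if not all((qc_dir / out).exists() for out in fastqc_output_names(fq))
    ]
```

A Fastq is reported as soon as any one of its predicted outputs is absent, which matches the "one or more associated outputs" wording used for check_fastqc_outputs.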
- setup()
Set up commands to be performed by the task
Must be implemented by the subclass
- class auto_process_ngs.qc.modules.fastqc.Fastqc
Class for handling the ‘fastqc’ QC module
- classmethod add_to_pipeline(p, project_name, project, qc_dir, read_numbers, fastqs, nthreads=None, require_tasks=[], verify_runner=None, compute_runner=None, envmodules=None, verbose=True)
Adds tasks for ‘fastqc’ module to pipeline
- Parameters:
p (Pipeline) – pipeline to extend
project_name (str) – name of project
project (AnalysisProject) – project to run module on
qc_dir (str) – path to QC directory
read_numbers (list) – read numbers to include
fastqs (list) – Fastqs to run the module on
nthreads (int) – number of threads (if not set then will be taken from the runner)
require_tasks (list) – list of tasks that the module needs to wait for
verify_runner (JobRunner) – runner to use for checks
compute_runner (JobRunner) – runner to use for computation
envmodules (list) – environment module names to load for running FastQC
verbose (bool) – enable verbose output
- classmethod collect_qc_outputs(qc_dir)
Collect information on FastQC outputs
Returns an AttributeDictionary with the following attributes:
name: set to ‘fastqc’
software: dictionary of software and versions
fastqs: list of associated Fastq names
output_files: list of associated output files
tags: list of associated output classes
- Parameters:
qc_dir (QCDir) – QC directory to examine
- classmethod verify(params, qc_outputs)
Verify ‘fastqc’ QC module against outputs
Returns one of 3 values:
True: outputs verified ok
False: outputs failed to verify
None: verification not possible
- Parameters:
params (AttributeDictionary) – values of parameters used as inputs
qc_outputs (AttributeDictionary) – QC outputs returned from the ‘collect_qc_outputs’ method
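Because verify returns three distinct values, callers need to distinguish None from False explicitly; a plain truthiness test would treat "verification not possible" as a failure. The following is a minimal sketch of that handling. The function `describe_verification` is hypothetical and not part of the module.

```python
def describe_verification(status):
    """Map a three-valued verification result to a label."""
    # Test for None first: 'not status' is true for both False and None
    if status is None:
        return "not verified"
    return "ok" if status else "failed"
```

Comparing with `is None` before any boolean test keeps the three outcomes separate.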
- class auto_process_ngs.qc.modules.fastqc.RunFastQC(_name, *args, **kws)
Run FastQC
- init(fastqs, qc_dir, nthreads=None)
Initialise the RunFastQC task.
- Parameters:
fastqs (list) – list of paths to Fastq files to run FastQC on (it is expected that this list will come from the CheckFastQCOutputs task)
qc_dir (str) – directory for QC outputs (defaults to subdirectory ‘qc’ of project directory)
nthreads (int) – number of threads/processors to use (defaults to number of slots set in runner)
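As a rough illustration of how these parameters might map onto a FastQC command line, the sketch below builds an argument list using FastQC's `--outdir` and `--threads` options. The function `build_fastqc_command` and its `runner_slots` argument are hypothetical stand-ins for the runner's slot count; the command the task actually constructs may include other options.

```python
def build_fastqc_command(fastq, qc_dir, nthreads=None, runner_slots=1):
    """Build a FastQC argument list (illustrative only)."""
    # Fall back to the runner's slot count when nthreads is not set
    threads = nthreads if nthreads is not None else runner_slots
    return [
        "fastqc",
        "--outdir", str(qc_dir),
        "--threads", str(threads),
        str(fastq),
    ]
```

Testing `nthreads is not None` rather than relying on truthiness keeps an explicit value distinct from the unset default.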
- setup()
Set up commands to be performed by the task
Must be implemented by the subclass
- auto_process_ngs.qc.modules.fastqc.check_fastqc_outputs(project, qc_dir, fastqs=None, read_numbers=None)
Return Fastqs missing QC outputs from FastQC
Returns a list of the Fastqs from a project for which one or more associated outputs from FastQC don’t exist in the specified QC directory.
- Parameters:
project (AnalysisProject) – project to check the QC outputs for
qc_dir (str) – path to the QC directory (relative path is assumed to be a subdirectory of the project)
fastqs (list) – optional list of Fastqs to check against (defaults to Fastqs from the project)
read_numbers (list) – read numbers to predict outputs for
- Returns:
list of Fastq files with missing outputs.
- Return type: