auto_process_ngs.qc.modules.strandedness
Implements the ‘strandedness’ QC module:
Strandedness: core QCModule class
SetupFastqStrandConf: pipeline task to set up conf file
CheckFastqStrandOutputs: pipeline task to check outputs
RunFastqStrand: pipeline task to run ‘fastq_strand’
check_fastq_strand_outputs: helper function for checking output files
- class auto_process_ngs.qc.modules.strandedness.CheckFastqStrandOutputs(_name, *args, **kws)
Check the outputs from the fastq_strand.py utility
- finish()
Perform actions on task completion
Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.
Must be implemented by the subclass
- init(project, qc_dir, fastq_strand_conf, fastqs=None, read_numbers=None, include_samples=None, verbose=False)
Initialise the CheckFastqStrandOutputs task.
- Parameters:
project (AnalysisProject) – project to run QC for
qc_dir (str) – directory for QC outputs (defaults to subdirectory ‘qc’ of project directory)
fastq_strand_conf (str) – path to the fastq_strand config file
fastqs (list) – explicit list of Fastq files to check against (default is to use Fastqs from supplied analysis project)
read_numbers (list) – list of read numbers to include when checking outputs
include_samples (list) – optional, list of sample names to include
verbose (bool) – if True then print additional information from the task
- Outputs:
- fastq_pairs (list): list of tuples with Fastq “pairs” that have missing outputs from fastq_strand.py under the specified QC protocol. A “pair” may be an (R1,R2) tuple, or a single Fastq (e.g. (fq,)).
- setup()
Set up commands to be performed by the task
Must be implemented by the subclass
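The shape of the fastq_pairs output can be illustrated with a small standalone sketch. It does not use auto_process_ngs itself: the `group_into_pairs` helper and the `_R<n>_` file naming convention are assumptions made purely for illustration, and the real task may group Fastqs differently.

```python
import re
from collections import defaultdict

def group_into_pairs(fastqs, read_numbers=(1, 2)):
    """Group Fastq names into tuples of reads belonging together.

    Hypothetical helper: assumes names containing '_R<n>_', e.g.
    'S1_S1_R1_001.fastq.gz'. Names without a read number are skipped.
    """
    groups = defaultdict(dict)
    for fq in fastqs:
        match = re.search(r"_R(\d)_", fq)
        if match is None:
            continue
        read = int(match.group(1))
        if read not in read_numbers:
            continue
        # Key on the name with the read number masked out
        key = fq[:match.start()] + "_R*_" + fq[match.end():]
        groups[key][read] = fq
    # One tuple per group, with reads in ascending order
    return [tuple(reads[r] for r in sorted(reads))
            for _, reads in sorted(groups.items())]

pairs = group_into_pairs(["S1_S1_R1_001.fastq.gz",
                          "S1_S1_R2_001.fastq.gz",
                          "S2_S2_R1_001.fastq.gz"])
```

Here the first sample yields an (R1,R2) tuple while the second, having only an R1 file, yields a one-element tuple.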
- class auto_process_ngs.qc.modules.strandedness.RunFastqStrand(_name, *args, **kws)
Run the fastq_strand.py utility
- init(fastq_pairs, qc_dir, fastq_strand_conf, fastq_strand_subset=None, nthreads=None)
Initialise the RunFastqStrand task.
- Parameters:
fastq_pairs (list) – list of tuples with “pairs” of Fastq files to run fastq_strand.py on (it is expected that this list will come from the CheckFastqStrandOutputs task)
qc_dir (str) – directory for QC outputs (defaults to subdirectory ‘qc’ of project directory)
fastq_strand_conf (str) – path to the fastq_strand config file to use
fastq_strand_subset (int) – explicitly specify the subset size for running fastq_strand
nthreads (int) – number of threads/processors to use (defaults to number of slots set in job runner)
- setup()
Set up commands to be performed by the task
Must be implemented by the subclass
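As a rough sketch of how the parameters above might translate into one fastq_strand.py invocation per Fastq pair, the following standalone function assembles an argument list. The function name `build_fastq_strand_cmd` is hypothetical, and the command line flags (`-n`, `--conf`, `--outdir`, `--subset`) are assumptions that should be checked against `fastq_strand.py --help`; this is not the task's actual implementation.

```python
import os

def build_fastq_strand_cmd(fastq_pair, qc_dir, fastq_strand_conf,
                           fastq_strand_subset=None, nthreads=1):
    """Return an argument list for one fastq_strand.py run (illustrative).

    The flag names used here are assumed, not taken from the documentation.
    """
    cmd = ["fastq_strand.py",
           "-n", str(nthreads),
           "--conf", fastq_strand_conf,
           "--outdir", os.path.abspath(qc_dir)]
    # Only pass a subset size if one was explicitly requested
    if fastq_strand_subset is not None:
        cmd.extend(["--subset", str(fastq_strand_subset)])
    # The Fastq(s) in the pair go last: one or two files
    cmd.extend(fastq_pair)
    return cmd

cmd = build_fastq_strand_cmd(("S1_R1.fastq.gz", "S1_R2.fastq.gz"),
                             "qc", "fastq_strand.conf",
                             fastq_strand_subset=10000, nthreads=4)
```

A list built this way could be passed to `subprocess.run`; in the real pipeline the command is instead handed to the configured job runner.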
- class auto_process_ngs.qc.modules.strandedness.SetupFastqStrandConf(_name, *args, **kws)
Set up a fastq_strand.conf file
- finish()
Perform actions on task completion
Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.
Must be implemented by the subclass
- init(project, qc_dir=None, organism=None, star_indexes=None)
Initialise the SetupFastqStrandConf task.
- Parameters:
project (AnalysisProject) – project to run QC for
qc_dir (str) – if supplied then points to directory for QC outputs (defaults to subdirectory ‘qc’ of project directory)
organism (str) – if supplied then must be a string with the names of one or more organisms, with multiple organisms separated by spaces (defaults to the organisms associated with the project)
star_indexes (dict) – dictionary mapping normalised organism names to STAR indexes
- Outputs:
- fastq_strand_conf (PipelineParam): pipeline parameter instance that resolves to a string with the path to the generated config file.
- setup()
Set up commands to be performed by the task
Must be implemented by the subclass
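The relationship between the organism string, the star_indexes mapping and the generated config file can be sketched as below. This is a minimal illustration under stated assumptions: the `write_fastq_strand_conf` helper is hypothetical, organism names are assumed to be normalised by lower-casing, the config file is assumed to hold one tab-separated "name, STAR index" line per organism, and the index path is made up.

```python
import os
import tempfile

def write_fastq_strand_conf(conf_path, organism, star_indexes):
    """Write one 'name<TAB>index' line per organism with a known STAR index.

    Returns the path written, or None if no organism had an index
    (returning None is this sketch's own choice).
    """
    lines = []
    for name in organism.split():
        key = name.lower()  # assumed normalisation; the real rules may differ
        if key in star_indexes:
            lines.append("%s\t%s" % (key, star_indexes[key]))
    if not lines:
        return None
    with open(conf_path, "w") as fp:
        fp.write("\n".join(lines) + "\n")
    return conf_path

# Demo with a made-up index path; 'Mouse' is skipped as it has no index
with tempfile.TemporaryDirectory() as tmp:
    conf = write_fastq_strand_conf(os.path.join(tmp, "fastq_strand.conf"),
                                   "Human Mouse",
                                   {"human": "/data/star/hg38"})
    with open(conf) as fp:
        conf_text = fp.read()
```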
- class auto_process_ngs.qc.modules.strandedness.Strandedness
Class for handling the ‘strandedness’ QC module
- classmethod add_to_pipeline(p, project_name, project, qc_dir, organism, read_numbers, fastqs, star_indexes, include_samples=None, nthreads=None, fastq_subset=None, require_tasks=[], verify_runner=None, compute_runner=None, envmodules=None, verbose=False)
Adds tasks for ‘strandedness’ module to pipeline
- Parameters:
p (Pipeline) – pipeline to extend
project_name (str) – name of project
project (AnalysisProject) – project to run module on
qc_dir (str) – path to QC directory
organism (str) – name of organism(s)
read_numbers (list) – read numbers to include
fastqs (list) – Fastqs to run the module on
star_indexes (mapping) – associated STAR indexes
include_samples (list) – subset of sample names to include
fastq_subset (int) – subset of reads to use for fastq_strand
nthreads (int) – number of threads (if not set then will be taken from the runner)
require_tasks (list) – list of tasks that the module needs to wait for
verify_runner (JobRunner) – runner to use for checks
compute_runner (JobRunner) – runner to use for computation
verbose (bool) – enable verbose output
- classmethod collect_qc_outputs(qc_dir)
Collect information on strandedness outputs
Returns an AttributeDictionary with the following attributes:
name: set to ‘strandedness’
software: dictionary of software and versions
fastqs: list of associated Fastq names
config_files: list of associated config files (‘fastq_strand.conf’)
output_files: list of associated output files
tags: list of associated output classes
- Parameters:
qc_dir (QCDir) – QC directory to examine
- classmethod verify(params, qc_outputs)
Verify ‘strandedness’ QC module against outputs
Returns one of 3 values:
True: outputs verified ok
False: outputs failed to verify
None: verification not possible
- Parameters:
params (AttributeDictionary) – values of parameters used as inputs
qc_outputs (AttributeDictionary) – QC outputs returned from the ‘collect_qc_outputs’ method
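Because the verification result has three states, callers need to distinguish None from False explicitly, since both are falsy in Python. A minimal sketch of one way to do this (the `describe_verification` helper is hypothetical and not part of the module):

```python
def describe_verification(status):
    """Map the three possible 'verify' results onto a short message."""
    # Test for None first: 'if not status' would conflate None with False
    if status is None:
        return "verification not possible"
    return "verified ok" if status else "failed to verify"

messages = [describe_verification(s) for s in (True, False, None)]
```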
- auto_process_ngs.qc.modules.strandedness.check_fastq_strand_outputs(project, qc_dir, fastq_strand_conf, fastqs=None, read_numbers=None)
Return Fastqs missing QC outputs from fastq_strand.py
Returns a list of the Fastqs from a project for which one or more associated outputs from fastq_strand.py don’t exist in the specified QC directory.
- Parameters:
project (AnalysisProject) – project to check the QC outputs for
qc_dir (str) – path to the QC directory (relative path is assumed to be a subdirectory of the project)
fastq_strand_conf (str) – path to a fastq_strand config file; strandedness QC outputs will be included unless the path is None or the config file doesn’t exist. A relative path is interpreted relative to the project directory
fastqs (list) – optional list of Fastqs to check against (defaults to Fastqs from the project)
read_numbers (list) – read numbers to predict outputs for
- Returns:
- list of Fastq file “pairs” with missing outputs; pairs are (R1,R2) tuples, with ‘R2’ missing if only one Fastq is used for the strandedness determination.
- Return type:
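The idea of reporting only those pairs whose outputs are missing can be sketched with a standalone function. This does not call auto_process_ngs: the `missing_strand_outputs` helper is hypothetical, and the output file name pattern (`<fastq basename>_fastq_strand.txt`, keyed on the first Fastq in each pair) is an assumption about how fastq_strand.py names its outputs.

```python
import os
import tempfile

def missing_strand_outputs(fastq_pairs, qc_dir):
    """Return the pairs with no strandedness output file in qc_dir.

    Assumes the output is named after the first Fastq in the pair.
    """
    missing = []
    for pair in fastq_pairs:
        base = os.path.basename(pair[0])
        # Strip a recognised Fastq extension, if present
        for ext in (".fastq.gz", ".fq.gz", ".fastq", ".fq"):
            if base.endswith(ext):
                base = base[:-len(ext)]
                break
        expected = os.path.join(qc_dir, base + "_fastq_strand.txt")
        if not os.path.exists(expected):
            missing.append(pair)
    return missing

# Demo: only the first pair already has an output file
with tempfile.TemporaryDirectory() as qc_dir:
    open(os.path.join(qc_dir, "S1_R1_fastq_strand.txt"), "w").close()
    pairs = [("S1_R1.fastq.gz", "S1_R2.fastq.gz"), ("S2_R1.fastq.gz",)]
    still_to_run = missing_strand_outputs(pairs, qc_dir)
```

In this demo only the single-Fastq “pair” for the second sample is reported as missing.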