auto_process_ngs.qc.modules.strandedness

Implements the ‘strandedness’ QC module:

  • Strandedness: core QCModule class

  • SetupFastqStrandConf: pipeline task to set up conf file

  • CheckFastqStrandOutputs: pipeline task to check outputs

  • RunFastqStrand: pipeline task to run ‘fastq_strand’

  • check_fastq_strand_outputs: helper function for checking output files

class auto_process_ngs.qc.modules.strandedness.CheckFastqStrandOutputs(_name, *args, **kws)

Check the outputs from the fastq_strand.py utility

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(project, qc_dir, fastq_strand_conf, fastqs=None, read_numbers=None, include_samples=None, verbose=False)

Initialise the CheckFastqStrandOutputs task.

Parameters:
  • project (AnalysisProject) – project to run QC for

  • qc_dir (str) – directory for QC outputs (defaults to subdirectory ‘qc’ of project directory)

  • fastq_strand_conf (str) – path to the fastq_strand config file

  • fastqs (list) – explicit list of Fastq files to check against (default is to use Fastqs from supplied analysis project)

  • read_numbers (list) – list of read numbers to include when checking outputs

  • include_samples (list) – optional, list of sample names to include

  • verbose (bool) – if True then print additional information from the task

Outputs:
fastq_pairs (list): list of tuples with Fastq “pairs” that have missing outputs from fastq_strand.py under the specified QC protocol. A “pair” may be an (R1,R2) tuple, or a one-element tuple holding a single Fastq (e.g. (fq,)).
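The shape of the fastq_pairs output can be illustrated with a short, self-contained sketch. The file names are made up for illustration, and describe_pair is a hypothetical helper that is not part of auto_process_ngs; it only shows how code consuming the list might tell a two-element (R1,R2) tuple from a one-element (fq,) tuple.

```python
# Illustrative value only: these Fastq names are invented for the example
fastq_pairs = [
    ("PJB1_S1_R1_001.fastq.gz", "PJB1_S1_R2_001.fastq.gz"),  # (R1,R2) pair
    ("PJB2_S2_R1_001.fastq.gz",),                            # single Fastq
]

def describe_pair(pair):
    """Return a short description of a Fastq "pair" tuple (hypothetical helper)."""
    if len(pair) == 2:
        r1, r2 = pair
        return f"paired: {r1} + {r2}"
    if len(pair) == 1:
        (fq,) = pair
        return f"single: {fq}"
    raise ValueError(f"unexpected number of Fastqs in pair: {pair!r}")

descriptions = [describe_pair(pair) for pair in fastq_pairs]
for line in descriptions:
    print(line)
```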

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.qc.modules.strandedness.RunFastqStrand(_name, *args, **kws)

Run the fastq_strand.py utility

init(fastq_pairs, qc_dir, fastq_strand_conf, fastq_strand_subset=None, nthreads=None)

Initialise the RunFastqStrand task.

Parameters:
  • fastq_pairs (list) – list of tuples with “pairs” of Fastq files to run fastq_strand.py on (it is expected that this list will come from the CheckFastqStrandOutputs task)

  • qc_dir (str) – directory for QC outputs (defaults to subdirectory ‘qc’ of project directory)

  • fastq_strand_conf (str) – path to the fastq_strand config file to use

  • fastq_strand_subset (int) – explicitly specify the subset size for running fastq_strand

  • nthreads (int) – number of threads/processors to use (defaults to number of slots set in job runner)

setup()

Set up commands to be performed by the task

Must be implemented by the subclass
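As a rough illustration of how the three tasks relate, the sketch below mimics the data flow with plain functions: a conf file path is produced, Fastq pairs with missing outputs are selected, and one job per remaining pair is planned. All three functions are hypothetical stand-ins written for this example, not the library's tasks; the location of the conf file inside the QC directory and the runner_slots argument are assumptions.

```python
import os

def setup_conf(qc_dir):
    """Stand-in for SetupFastqStrandConf: return the conf file path."""
    # Assumption: the conf file is written into the QC directory
    return os.path.join(qc_dir, "fastq_strand.conf")

def check_outputs(all_pairs, pairs_with_outputs):
    """Stand-in for CheckFastqStrandOutputs: keep pairs lacking outputs."""
    return [pair for pair in all_pairs if pair not in pairs_with_outputs]

def plan_runs(fastq_pairs, fastq_strand_conf, nthreads=None, runner_slots=1):
    """Stand-in for RunFastqStrand: describe one job per Fastq pair."""
    if nthreads is None:
        # Mirrors the documented default: fall back to the runner's slots
        nthreads = runner_slots
    return [
        {"fastqs": pair, "conf": fastq_strand_conf, "nthreads": nthreads}
        for pair in fastq_pairs
    ]

# Made-up Fastq names; the first pair is treated as already processed
all_pairs = [("S1_R1.fastq.gz", "S1_R2.fastq.gz"), ("S2_R1.fastq.gz",)]
conf = setup_conf("qc")
todo = check_outputs(all_pairs, pairs_with_outputs=[all_pairs[0]])
jobs = plan_runs(todo, conf, runner_slots=4)
print(jobs)
```

In the real pipeline these values are passed between tasks as pipeline parameters rather than plain Python objects.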

class auto_process_ngs.qc.modules.strandedness.SetupFastqStrandConf(_name, *args, **kws)

Set up a fastq_strand.conf file

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(project, qc_dir=None, organism=None, star_indexes=None)

Initialise the SetupFastqStrandConf task.

Parameters:
  • project (AnalysisProject) – project to run QC for

  • qc_dir (str) – if supplied then points to directory for QC outputs (defaults to subdirectory ‘qc’ of project directory)

  • organism (str) – if supplied then must be a string with the names of one or more organisms, with multiple organisms separated by spaces (defaults to the organisms associated with the project)

  • star_indexes (dict) – dictionary mapping normalised organism names to STAR indexes

Outputs:
fastq_strand_conf (PipelineParam): pipeline parameter instance that resolves to a string with the path to the generated config file.
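The relationship between the organism string and the star_indexes dictionary can be sketched as follows. This is a minimal illustration, not the library's implementation: normalise_name and lookup_star_indexes are hypothetical helpers, the normalisation rule (lower-casing) is an assumption, and the index paths are invented.

```python
def normalise_name(name):
    """Hypothetical normalisation; the library's rule may differ."""
    # Assumption: normalising means stripping whitespace and lower-casing
    return name.strip().lower()

def lookup_star_indexes(organism, star_indexes):
    """Map each organism in a space-separated string to a STAR index.

    Returns a dictionary of normalised name -> index path, with None
    for organisms that have no entry in 'star_indexes'.
    """
    found = {}
    for name in organism.split():
        key = normalise_name(name)
        found[key] = star_indexes.get(key)
    return found

# Invented index locations, for illustration only
star_indexes = {"human": "/data/star/hg38", "mouse": "/data/star/mm10"}
found = lookup_star_indexes("Human mouse zebrafish", star_indexes)
print(found)
```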

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.qc.modules.strandedness.Strandedness

Class for handling the ‘strandedness’ QC module

classmethod add_to_pipeline(p, project_name, project, qc_dir, organism, read_numbers, fastqs, star_indexes, include_samples=None, nthreads=None, fastq_subset=None, require_tasks=[], verify_runner=None, compute_runner=None, envmodules=None, verbose=False)

Adds tasks for ‘strandedness’ module to pipeline

Parameters:
  • p (Pipeline) – pipeline to extend

  • project_name (str) – name of project

  • project (AnalysisProject) – project to run module on

  • qc_dir (str) – path to QC directory

  • organism (str) – name of organism(s)

  • read_numbers (list) – read numbers to include

  • fastqs (list) – Fastqs to run the module on

  • star_indexes (mapping) – associated STAR indexes

  • include_samples (list) – subset of sample names to include

  • fastq_subset (int) – subset of reads to use for fastq_strand.py

  • nthreads (int) – number of threads (if not set then will be taken from the runner)

  • require_tasks (list) – list of tasks that the module needs to wait for

  • verify_runner (JobRunner) – runner to use for checks

  • compute_runner (JobRunner) – runner to use for computation

  • verbose (bool) – enable verbose output

classmethod collect_qc_outputs(qc_dir)

Collect information on strandedness outputs

Returns an AttributeDictionary with the following attributes:

  • name: set to ‘strandedness’

  • software: dictionary of software and versions

  • fastqs: list of associated Fastq names

  • config_files: list of associated config files (‘fastq_strand.conf’)

  • output_files: list of associated output files

  • tags: list of associated output classes

Parameters:
  • qc_dir (QCDir) – QC directory to examine
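To show how the returned attributes might be consumed, the sketch below uses types.SimpleNamespace as a stand-in for AttributeDictionary. Every value is a placeholder: the layout of the software dictionary, the Fastq names, the output file names and the tags are assumptions, and summarise is a hypothetical helper.

```python
from types import SimpleNamespace

# Stand-in for the AttributeDictionary returned by collect_qc_outputs;
# all values below are placeholders, not real output from the module
qc_outputs = SimpleNamespace(
    name="strandedness",
    software={"fastq_strand": ["<version>"]},
    fastqs=["S1_R1", "S1_R2"],
    config_files=["fastq_strand.conf"],
    output_files=["<output file for S1>"],
    tags=["<tag>"],
)

def summarise(outputs):
    """Build a one-line summary from the documented attributes."""
    configs = ", ".join(outputs.config_files) or "none"
    return (
        f"{outputs.name}: {len(outputs.fastqs)} Fastq(s), "
        f"{len(outputs.output_files)} output file(s), config: {configs}"
    )

summary = summarise(qc_outputs)
print(summary)
```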

classmethod verify(params, qc_outputs)

Verify ‘strandedness’ QC module against outputs

Returns one of 3 values:

  • True: outputs verified ok

  • False: outputs failed to verify

  • None: verification not possible

Parameters:
  • params (AttributeDictionary) – values of parameters used as inputs

  • qc_outputs (AttributeDictionary) – QC outputs returned from the ‘collect_qc_outputs’ method
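Because verify can return None as well as True or False, callers should avoid a plain truthiness test, which would treat "verification not possible" the same as "failed". A minimal sketch of handling the three cases is shown below; describe_verification is a hypothetical helper, not part of the module.

```python
def describe_verification(status):
    """Translate a tri-state verification result into a message."""
    # Identity checks keep None ("not possible") distinct from False
    if status is True:
        return "outputs verified ok"
    if status is False:
        return "outputs failed to verify"
    if status is None:
        return "verification not possible"
    raise ValueError(f"unexpected verification status: {status!r}")

messages = [describe_verification(s) for s in (True, False, None)]
for message in messages:
    print(message)
```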

auto_process_ngs.qc.modules.strandedness.check_fastq_strand_outputs(project, qc_dir, fastq_strand_conf, fastqs=None, read_numbers=None)

Return Fastqs missing QC outputs from fastq_strand.py

Returns a list of the Fastqs from a project for which one or more associated outputs from fastq_strand.py don’t exist in the specified QC directory.

Parameters:
  • project (AnalysisProject) – project to check the QC outputs for

  • qc_dir (str) – path to the QC directory (relative path is assumed to be a subdirectory of the project)

  • fastq_strand_conf (str) – path to a fastq_strand config file; strandedness QC outputs will be included unless the path is None or the config file doesn’t exist. Relative path is assumed to be a subdirectory of the project

  • fastqs (list) – optional list of Fastqs to check against (defaults to Fastqs from the project)

  • read_numbers (list) – read numbers to predict outputs for

Returns:

list of Fastq file “pairs” with missing outputs; pairs are (R1,R2) tuples, with ‘R2’ missing if only one Fastq is used for the strandedness determination.

Return type:

List
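The behaviour described above can be approximated with a self-contained sketch. It is not the library's implementation: resolve_in_project, expected_output and missing_pairs are hypothetical helpers, the output file naming scheme is an assumption, and returning an empty list when the conf file is None or absent is one reading of the fastq_strand_conf description.

```python
import os
import tempfile

def resolve_in_project(project_dir, path):
    """Treat a relative path as a subdirectory (or file) of the project."""
    return path if os.path.isabs(path) else os.path.join(project_dir, path)

def expected_output(qc_dir, fastq):
    """Predict an output file for a Fastq (naming scheme is hypothetical)."""
    base = os.path.basename(fastq)
    for ext in (".fastq.gz", ".fastq", ".fq.gz", ".fq"):
        if base.endswith(ext):
            base = base[:-len(ext)]
            break
    return os.path.join(qc_dir, base + "_fastq_strand.txt")

def missing_pairs(project_dir, qc_dir, fastq_strand_conf, fastq_pairs):
    """Return the Fastq pairs whose predicted output file doesn't exist."""
    if fastq_strand_conf is None:
        return []  # assumption: no conf file means nothing to check
    conf = resolve_in_project(project_dir, fastq_strand_conf)
    if not os.path.exists(conf):
        return []
    qc_dir = resolve_in_project(project_dir, qc_dir)
    return [
        pair for pair in fastq_pairs
        if not os.path.exists(expected_output(qc_dir, pair[0]))
    ]

# Build a throwaway "project" with outputs for only one of two samples
with tempfile.TemporaryDirectory() as project_dir:
    os.mkdir(os.path.join(project_dir, "qc"))
    for name in ("fastq_strand.conf", "S1_R1_fastq_strand.txt"):
        open(os.path.join(project_dir, "qc", name), "w").close()
    pairs = [("S1_R1.fastq.gz", "S1_R2.fastq.gz"), ("S2_R1.fastq.gz",)]
    missing = missing_pairs(project_dir, "qc", "qc/fastq_strand.conf", pairs)
    no_conf = missing_pairs(project_dir, "qc", None, pairs)

print(missing)
```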