auto_process_ngs.qc.modules.fastq_screen

Implements the ‘fastq_screen’ QC module:

  • FastqScreen: core QCModule class

  • CheckFastqScreenOutputs: pipeline task to check outputs

  • RunFastqScreen: pipeline task to run FastqScreen

  • check_fastq_screen_outputs: helper function for checking outputs

class auto_process_ngs.qc.modules.fastq_screen.CheckFastqScreenOutputs(_name, *args, **kws)

Check the outputs from FastqScreen

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(project, qc_dir, screens, fastqs=None, read_numbers=None, include_samples=None, fastq_attrs=None, legacy=False, verbose=False)

Initialise the CheckFastqScreenOutputs task.

Parameters:
  • project (AnalysisProject) – project to run QC for

  • qc_dir (str) – directory for QC outputs (defaults to subdirectory ‘qc’ of project directory)

  • screens (mapping) – mapping of screen names to FastqScreen conf files

  • fastqs (list) – explicit list of Fastq files to check against (default is to use Fastqs from supplied analysis project)

  • read_numbers (list) – read numbers to include

  • include_samples (list) – optional, list of sample names to include

  • fastq_attrs (BaseFastqAttrs) – class to use for extracting data from Fastq names

  • legacy (bool) – if True then use ‘legacy’ naming convention for output files (default is to use new format)

  • verbose (bool) – if True then print additional information from the task

Outputs:
  • fastqs (list) – list of Fastqs that have missing FastqScreen outputs under the specified QC protocol

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.qc.modules.fastq_screen.FastqScreen

Class for handling the ‘fastq_screen’ QC module

classmethod add_to_pipeline(p, project_name, project, qc_dir, screens, fastqs, read_numbers, include_samples=None, nthreads=None, fastq_subset=None, legacy=False, requires_tasks=[], verify_runner=None, compute_runner=None, envmodules=None, verbose=False)

Adds tasks for ‘fastq_screen’ module to pipeline

Parameters:
  • p (Pipeline) – pipeline to extend

  • project_name (str) – name of project

  • project (AnalysisProject) – project to run module on

  • qc_dir (str) – path to QC directory

  • screens (list) – list of screen names

  • fastqs (list) – Fastqs to run the module on

  • read_numbers (list) – read numbers to include

  • include_samples (list) – subset of sample names to include

  • nthreads (int) – number of threads (if not set then will be taken from the runner)

  • fastq_subset (int) – subset of reads to use for FastqScreen

  • legacy (bool) – whether to use legacy naming for output files

  • verbose (bool) – enable verbose output

  • requires_tasks (list) – list of tasks that the module needs to wait for

  • verify_runner (JobRunner) – runner to use for checks

  • compute_runner (JobRunner) – runner to use for computation
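The module presumably adds a check task followed by a run task, with the run task consuming the list of Fastqs produced by the check and both waiting on `requires_tasks`. The sketch below models that wiring with a toy pipeline; `ToyTask`, `ToyPipeline` and `add_fastq_screen` are hypothetical stand-ins and do not reproduce the real `Pipeline` API, runners or environment modules.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)  # compare tasks by identity, not by field values
class ToyTask:
    """Hypothetical stand-in for a pipeline task (not the real API)."""
    name: str
    func: Callable[[], object]
    requires: List["ToyTask"] = field(default_factory=list)
    output: object = None


class ToyPipeline:
    """Hypothetical minimal pipeline: runs each task after its dependencies."""

    def __init__(self) -> None:
        self.tasks: List[ToyTask] = []

    def add_task(self, task: ToyTask) -> ToyTask:
        self.tasks.append(task)
        return task

    def run(self) -> List[str]:
        done: List[str] = []
        pending = list(self.tasks)
        while pending:
            ready = [t for t in pending
                     if all(r.name in done for r in t.requires)]
            if not ready:
                raise RuntimeError("circular or unsatisfiable dependencies")
            for t in ready:
                t.output = t.func()
                done.append(t.name)
                pending.remove(t)
        return done


def add_fastq_screen(p: ToyPipeline, project_name: str, fastqs: List[str],
                     requires_tasks: Optional[List[ToyTask]] = None) -> ToyTask:
    """Sketch of check -> run wiring (assumed, not the module's code)."""
    check = p.add_task(ToyTask(
        f"{project_name}: check FastqScreen outputs",
        lambda: list(fastqs),  # pretend every Fastq is missing outputs
        requires=list(requires_tasks or [])))
    run = p.add_task(ToyTask(
        f"{project_name}: run FastqScreen",
        # The run task reads the check task's output when it executes
        lambda: [f"screened {fq}" for fq in check.output],
        requires=[check]))
    return run
```

The sketch uses `None` as the default for `requires_tasks` to avoid sharing a mutable default list between calls.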

classmethod collect_qc_outputs(qc_dir)

Collect information on FastqScreen outputs

Returns an AttributeDictionary with the following attributes:

  • name: set to ‘fastq_screen’

  • software: dictionary of software and versions

  • screen_names: list of associated panel names

  • fastqs: list of associated Fastq names

  • fastqs_for_screen: dictionary of panel names and lists of Fastq names associated with each panel

  • output_files: list of associated output files

  • tags: list of associated output classes

Parameters:
  • qc_dir (QCDir) – QC directory to examine
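A rough sketch of how such a summary could be assembled by scanning a directory is shown below. It takes a plain directory path rather than a `QCDir`, returns a `SimpleNamespace` in place of an `AttributeDictionary`, and omits `tags`. The function name `collect_screen_outputs`, the `<fastq>_screen_<screen>.txt/.png` naming scheme and the shape of the `software` value are assumptions. It also assumes that the first line of each `.txt` file begins with `#Fastq_screen version:`.

```python
import os
from types import SimpleNamespace


def collect_screen_outputs(qc_dir: str) -> SimpleNamespace:
    """Summarise FastqScreen outputs in qc_dir (hypothetical naming scheme)."""
    screen_names, fastqs, versions = set(), set(), set()
    fastqs_for_screen = {}
    output_files = []
    for fname in sorted(os.listdir(qc_dir)):
        stem, ext = os.path.splitext(fname)
        if ext not in (".txt", ".png") or "_screen_" not in stem:
            continue  # not a (new-style) FastqScreen output
        fastq, screen = stem.rsplit("_screen_", 1)
        output_files.append(os.path.join(qc_dir, fname))
        if ext != ".txt":
            continue  # only the .txt files define Fastqs, screens and versions
        screen_names.add(screen)
        fastqs.add(fastq)
        fastqs_for_screen.setdefault(screen, []).append(fastq)
        # Assumed header form: "#Fastq_screen version: <x>\t..."
        with open(os.path.join(qc_dir, fname)) as fp:
            header = fp.readline()
        if header.startswith("#Fastq_screen version:"):
            versions.add(header.split("\t")[0].split(":", 1)[1].strip())
    return SimpleNamespace(
        name="fastq_screen",
        software={"fastq_screen": sorted(versions)},
        screen_names=sorted(screen_names),
        fastqs=sorted(fastqs),
        fastqs_for_screen={s: sorted(f) for s, f in fastqs_for_screen.items()},
        output_files=output_files,
    )
```

Splitting on the last `_screen_` tolerates Fastq names that contain that string, but would misparse a screen name that contains it.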

classmethod verify(params, qc_outputs)

Verify ‘fastq_screen’ QC module against outputs

Returns one of 3 values:

  • True: outputs verified ok

  • False: outputs failed to verify

  • None: verification not possible

Parameters:
  • params (AttributeDictionary) – values of parameters used as inputs

  • qc_outputs (AttributeDictionary) – QC outputs returned from the ‘collect_qc_outputs’ method
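The three-valued result can be illustrated with the sketch below. The rules it applies are assumptions, not the module's actual logic: verification is "not possible" (`None`) when no screens are specified; it fails (`False`) if any expected Fastq lacks outputs for any screen; otherwise it passes (`True`). The parameter attributes `screens` and `fastqs` and the function name `verify_fastq_screen` are hypothetical.

```python
from types import SimpleNamespace
from typing import Optional


def verify_fastq_screen(params: SimpleNamespace,
                        qc_outputs: SimpleNamespace) -> Optional[bool]:
    """Sketch of three-valued verification (assumed rules).

    params.screens: screen names expected (hypothetical attribute)
    params.fastqs:  Fastq names expected to have outputs (hypothetical)
    qc_outputs:     object with a 'fastqs_for_screen' dict, as described
                    for collect_qc_outputs
    """
    if not params.screens:
        return None  # nothing to verify against
    for screen in params.screens:
        found = set(qc_outputs.fastqs_for_screen.get(screen, []))
        if not set(params.fastqs).issubset(found):
            return False  # at least one expected output is missing
    return True
```

Callers should test the result with `is True` / `is False` / `is None` rather than truthiness, since `None` and `False` are both falsy.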

class auto_process_ngs.qc.modules.fastq_screen.RunFastqScreen(_name, *args, **kws)

Run FastqScreen

init(fastqs, qc_dir, screens, subset=None, nthreads=None, read_numbers=None, fastq_attrs=None, legacy=False)

Initialise the RunFastqScreen task.

Parameters:
  • fastqs (list) – list of paths to Fastq files to run FastqScreen on (it is expected that this list will come from the CheckFastqScreenOutputs task)

  • qc_dir (str) – directory for QC outputs (defaults to subdirectory ‘qc’ of project directory)

  • screens (mapping) – mapping of screen names to FastqScreen conf files

  • subset (int) – explicitly specify the subset size for running FastqScreen

  • nthreads (int) – number of threads/processors to use (defaults to number of slots set in runner)

  • read_numbers (list) – list of read numbers to include when running FastqScreen

  • fastq_attrs (BaseFastqAttrs) – class to use for extracting data from Fastq names

  • legacy (bool) – if True then use ‘legacy’ naming convention for output files (default is to use new format)
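To show how these parameters could map onto `fastq_screen` invocations, the sketch below builds one command line per (Fastq, screen) pair. It uses the tool's `--conf`, `--threads`, `--outdir` and `--subset` options. Several details are assumptions: reading the read number from an Illumina-style `_R<n>_001` suffix, using the hypothetical `runner_nslots` argument as the thread fallback, and the function name `fastq_screen_commands`. This is not the task's actual implementation.

```python
import re
from typing import Dict, List, Optional


def fastq_screen_commands(fastqs: List[str], qc_dir: str,
                          screens: Dict[str, str],
                          subset: Optional[int] = None,
                          nthreads: Optional[int] = None,
                          read_numbers: Optional[List[int]] = None,
                          runner_nslots: int = 1) -> List[List[str]]:
    """Build one fastq_screen command per (Fastq, screen) pair (sketch)."""
    # Fall back to the (hypothetical) runner slot count if nthreads unset
    threads = nthreads if nthreads else runner_nslots
    cmds = []
    for fastq in fastqs:
        # Simplified Illumina-style read number extraction
        m = re.search(r"_R(\d)_\d{3}\.f(?:ast)?q(?:\.gz)?$", fastq)
        if read_numbers and (m is None or int(m.group(1)) not in read_numbers):
            continue
        for screen, conf in screens.items():
            cmd = ["fastq_screen", "--conf", conf,
                   "--threads", str(threads), "--outdir", qc_dir]
            if subset is not None:
                cmd += ["--subset", str(subset)]
            cmds.append(cmd + [fastq])
    return cmds
```

Note that `fastq_screen` itself names its outputs after the input Fastq (for example `<name>_screen.txt` and `<name>_screen.png`). Running several screens into one directory therefore needs an extra renaming step, which this sketch omits.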

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

auto_process_ngs.qc.modules.fastq_screen.check_fastq_screen_outputs(project, qc_dir, screen, fastqs=None, read_numbers=None, legacy=False)

Return Fastqs missing QC outputs from FastqScreen

Returns a list of the Fastqs from a project for which one or more associated outputs from FastqScreen don’t exist in the specified QC directory.

Parameters:
  • project (AnalysisProject) – project to check the QC outputs for

  • qc_dir (str) – path to the QC directory (relative path is assumed to be a subdirectory of the project)

  • screen (str) – screen name to check

  • fastqs (list) – optional list of Fastqs to check against (defaults to Fastqs from the project)

  • read_numbers (list) – read numbers to define Fastqs to predict outputs for; if not set then all non-index reads will be included

  • legacy (bool) – if True then check for ‘legacy’-style names (default: False)

Returns:

list of Fastq files with missing outputs.

Return type:

List
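The behaviour described above can be sketched for a single screen as follows. In this sketch, `project_dir` and `project_fastqs` stand in for the `AnalysisProject`, and the function name `missing_screen_outputs` is hypothetical. The output naming scheme (`<base>_screen_<screen>`, or `<base>_<screen>_screen` when `legacy=True`) and the Illumina-style `_R<n>_001`/`_I<n>_001` parsing are also assumptions. The sketch shows two points from the description: a relative `qc_dir` is resolved against the project, and index reads are skipped when `read_numbers` is not set.

```python
import os
import re
from typing import List, Optional


def missing_screen_outputs(project_dir: str, project_fastqs: List[str],
                           qc_dir: str, screen: str,
                           fastqs: Optional[List[str]] = None,
                           read_numbers: Optional[List[int]] = None,
                           legacy: bool = False) -> List[str]:
    """Sketch: Fastqs lacking FastqScreen outputs for one screen."""
    if not os.path.isabs(qc_dir):
        qc_dir = os.path.join(project_dir, qc_dir)  # relative to project
    missing = []
    for fastq in (fastqs if fastqs is not None else project_fastqs):
        m = re.search(r"_([RI])(\d)_\d{3}\.f(?:ast)?q(?:\.gz)?$", fastq)
        if m is None:
            continue  # names not matching the simplified pattern are ignored
        read_type, read_number = m.group(1), int(m.group(2))
        if not read_numbers:
            if read_type == "I":
                continue  # default: all non-index reads
        elif read_type != "R" or read_number not in read_numbers:
            continue
        base = re.sub(r"\.f(?:ast)?q(?:\.gz)?$", "", os.path.basename(fastq))
        stem = f"{base}_{screen}_screen" if legacy else f"{base}_screen_{screen}"
        if not all(os.path.exists(os.path.join(qc_dir, stem + ext))
                   for ext in (".txt", ".png")):
            missing.append(fastq)
    return missing
```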