auto_process_ngs.qc.modules.fastqc

Implements the ‘fastqc’ QC module:

  • Fastqc: core QCModule class

  • CheckFastQCOutputs: pipeline task to check outputs

  • RunFastQC: pipeline task to run FastQC

  • check_fastqc_outputs: helper function for checking outputs

class auto_process_ngs.qc.modules.fastqc.CheckFastQCOutputs(_name, *args, **kws)

Check the outputs from FastQC

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(project, qc_dir, read_numbers, fastqs=None, verbose=False)

Initialise the CheckFastQCOutputs task.

Parameters:
  • project (AnalysisProject) – project to run QC for

  • qc_dir (str) – directory for QC outputs (defaults to subdirectory ‘qc’ of project directory)

  • read_numbers (list) – list of read numbers to include

  • fastqs (list) – optional, list of Fastq files (overrides Fastqs in project)

  • verbose (bool) – if True then print additional information from the task

Outputs:
  • fastqs (list) – list of Fastqs that have missing FastQC outputs under the specified QC protocol

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.qc.modules.fastqc.Fastqc

Class for handling the ‘fastqc’ QC module

classmethod add_to_pipeline(p, project_name, project, qc_dir, read_numbers, fastqs, nthreads=None, require_tasks=[], verify_runner=None, compute_runner=None, envmodules=None, verbose=True)

Adds tasks for ‘fastqc’ module to pipeline

Parameters:
  • p (Pipeline) – pipeline to extend

  • project_name (str) – name of project

  • project (AnalysisProject) – project to run module on

  • qc_dir (str) – path to QC directory

  • read_numbers (list) – read numbers to include

  • fastqs (list) – Fastqs to run the module on

  • nthreads (int) – number of threads (if not set then will be taken from the runner)

  • require_tasks (list) – list of tasks that the module needs to wait for

  • verify_runner (JobRunner) – runner to use for checks

  • compute_runner (JobRunner) – runner to use for computation

  • envmodules (list) – environment module names to load for running FastQC

  • verbose (bool) – enable verbose output

classmethod collect_qc_outputs(qc_dir)

Collect information on FastQC outputs

Returns an AttributeDictionary with the following attributes:

  • name: set to ‘fastqc’

  • software: dictionary of software and versions

  • fastqs: list of associated Fastq names

  • output_files: list of associated output files

  • tags: list of associated output classes

Parameters:
  • qc_dir (QCDir) – QC directory to examine
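
The shape of the returned information can be illustrated with a small standalone sketch. This is not the library's implementation: the function name collect_fastqc_outputs_sketch is hypothetical, a plain dictionary stands in for AttributeDictionary, a plain directory path stands in for QCDir, and the 'tags' value is a placeholder. The only fact relied on is that FastQC names its HTML reports '<fastq>_fastqc.html'.

```python
from pathlib import Path

REPORT_SUFFIX = "_fastqc.html"


def collect_fastqc_outputs_sketch(qc_dir):
    """Hypothetical stand-in for collect_qc_outputs (illustration only)."""
    # Find FastQC HTML reports directly under the QC directory
    reports = sorted(Path(qc_dir).glob("*" + REPORT_SUFFIX))
    # Recover the Fastq base name by stripping the report suffix
    fastqs = [r.name[:-len(REPORT_SUFFIX)] for r in reports]
    return {
        "name": "fastqc",
        "software": {},  # versions are not inspected in this sketch
        "fastqs": fastqs,
        "output_files": [str(r) for r in reports],
        # Placeholder tag; the real module may use different tags
        "tags": ["fastqc"] if reports else [],
    }
```

An empty QC directory yields empty 'fastqs', 'output_files' and 'tags' lists in this sketch.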

classmethod verify(params, qc_outputs)

Verify ‘fastqc’ QC module against outputs

Returns one of 3 values:

  • True: outputs verified ok

  • False: outputs failed to verify

  • None: verification not possible

Parameters:
  • params (AttributeDictionary) – values of parameters used as inputs

  • qc_outputs (AttributeDictionary) – QC outputs returned from the ‘collect_qc_outputs’ method
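
Because the result is three-valued, callers should test for None explicitly rather than relying on truthiness, which would treat 'verification not possible' as a failure. A minimal sketch of such handling follows; describe_verification is a hypothetical helper and is not part of auto_process_ngs.

```python
def describe_verification(result):
    """Map a three-valued verification result to a status string.

    Hypothetical helper for illustration; 'result' is expected to be
    True, False or None as returned by a 'verify' classmethod.
    """
    # Check for None first: 'not result' would conflate None with False
    if result is None:
        return "not verified"
    return "ok" if result else "failed"
```

For example, a report generator could use the returned string to distinguish modules that failed from modules that could not be checked.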

class auto_process_ngs.qc.modules.fastqc.RunFastQC(_name, *args, **kws)

Run FastQC

init(fastqs, qc_dir, nthreads=None)

Initialise the RunFastQC task.

Parameters:
  • fastqs (list) – list of paths to Fastq files to run FastQC on (it is expected that this list will come from the CheckFastQCOutputs task)

  • qc_dir (str) – directory for QC outputs (defaults to subdirectory ‘qc’ of project directory)

  • nthreads (int) – number of threads/processors to use (defaults to number of slots set in runner)
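
The kind of command line these parameters translate to can be sketched as below. This is an assumption-laden illustration, not the task's actual command: build_fastqc_command is a hypothetical helper, and the real task may pass further options. Only the FastQC options '--outdir' and '--threads' are used. Where the task falls back to the runner's slot count, the sketch simply omits '--threads'.

```python
def build_fastqc_command(fastq, qc_dir, nthreads=None):
    """Build a FastQC command line as a list of arguments.

    Hypothetical helper for illustration only.
    """
    cmd = ["fastqc", "--outdir", qc_dir]
    if nthreads:
        # FastQC expects the thread count as a command line string
        cmd.extend(["--threads", str(nthreads)])
    cmd.append(fastq)
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a system where FastQC is installed.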

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

auto_process_ngs.qc.modules.fastqc.check_fastqc_outputs(project, qc_dir, fastqs=None, read_numbers=None)

Return Fastqs missing QC outputs from FastQC

Returns a list of the Fastqs from a project for which one or more associated outputs from FastQC don’t exist in the specified QC directory.

Parameters:
  • project (AnalysisProject) – project to check the QC outputs for

  • qc_dir (str) – path to the QC directory (relative path is assumed to be a subdirectory of the project)

  • fastqs (list) – optional list of Fastqs to check against (defaults to Fastqs from the project)

  • read_numbers (list) – read numbers to predict outputs for

Returns:
  list of Fastq files with missing outputs.

Return type:
  list
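
The check described above can be approximated with a standalone sketch. It assumes that FastQC writes '<base>_fastqc.html' and '<base>_fastqc.zip' for each Fastq, where '<base>' is the file name without its Fastq extension. The names fastq_basename and fastqs_missing_fastqc are hypothetical, the sketch takes a list of paths rather than an AnalysisProject, and it ignores read_numbers.

```python
import os

# Extensions stripped to get the FastQC output base name (assumption)
FASTQ_EXTENSIONS = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def fastq_basename(fastq):
    """Return the Fastq file name without directory or extension."""
    name = os.path.basename(fastq)
    for ext in FASTQ_EXTENSIONS:
        if name.endswith(ext):
            return name[:-len(ext)]
    return name


def fastqs_missing_fastqc(fastqs, qc_dir):
    """Return the Fastqs lacking one or more expected FastQC outputs.

    Hypothetical stand-in for check_fastqc_outputs (illustration only).
    """
    missing = []
    for fq in fastqs:
        base = fastq_basename(fq) + "_fastqc"
        expected = [os.path.join(qc_dir, base + ext)
                    for ext in (".html", ".zip")]
        # A Fastq is reported if any expected output is absent
        if not all(os.path.exists(f) for f in expected):
            missing.append(fq)
    return missing
```

In a pipeline, a list like this would be handed to the task that runs FastQC, so that only Fastqs without complete outputs are processed.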