auto_process_ngs.qc.verification

Utilities for verifying QC pipeline outputs.

Provides the following classes:

  • QCVerifier: enables verification of QC outputs against protocols

Provides the following functions:

  • verify_project: check the QC outputs for a project

class auto_process_ngs.qc.verification.QCVerifier(qc_dir, fastq_attrs=None)

Class to perform verification of QC outputs

The QCVerifier enables the QC outputs from a directory to be checked against arbitrary QC protocols via its verify method.

For example, given a QCProtocol instance protocol (such as one defining the ‘standardPE’ protocol):

>>> project = AnalysisProject("/data/projects/PJB")
>>> verifier = QCVerifier(project.qc_dir)
>>> verifier.verify(protocol, project.fastqs)
True

Parameters:
  • qc_dir (str) – path to directory to examine

  • fastq_attrs (BaseFastqAttrs) – (optional) class for extracting data from Fastq names

identify_seq_data(samples)

Identify samples with sequence (biological) data

Parameters:

samples (list) – list of all sample names

Returns:

subset of sample names with sequence data.

Return type:

List

verify(protocol, fastqs, organism=None, fastq_screens=None, star_index=None, annotation_bed=None, annotation_gtf=None, cellranger_version=None, cellranger_refdata=None, cellranger_use_multi_config=None, cellranger_required_version=None, seq_data_samples=None)

Verify QC outputs for Fastqs against the specified protocol

Parameters:
  • protocol (QCProtocol) – QC protocol to verify against

  • fastqs (list) – list of Fastqs to verify outputs for

  • organism (str) – organism associated with outputs

  • fastq_screens (list) – list of panel names to verify FastqScreen outputs against

  • star_index (str) – path to STAR index

  • annotation_bed (str) – path to BED annotation file

  • annotation_gtf (str) – path to GTF annotation file

  • cellranger_version (str) – specific version of the 10x package to check for

  • cellranger_refdata (str) – specific 10x reference dataset to check for

  • cellranger_use_multi_config (bool) – if True then cellranger count verification will attempt to use data (GEX samples and reference dataset) from the ‘10x_multi_config.csv’ file

  • cellranger_required_version (str) – specifies which versions of the 10x package pipeline are required (e.g. “>=9”, “=7”, etc.)

  • seq_data_samples (list) – list of sample names with sequence (i.e. biological) data

Returns:

True if all expected outputs are present, False otherwise.

Return type:

Boolean
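The cellranger_required_version parameter combines a comparison operator with a version number. The following is a minimal sketch of how such a requirement could be checked; it is not the QCVerifier implementation. It assumes the operators >=, <=, >, <, = and == are supported, and that only as many version components as appear in the requirement are compared (so “7.2.0” satisfies “=7”). The function names parse_version and version_satisfies are hypothetical.

```python
import operator
import re

# Hypothetical helper table: comparison prefixes mapped to operators.
# Two-character prefixes come first so that ">=" is not matched as ">".
_OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
    ("=", operator.eq),
]


def parse_version(s):
    """Convert a version string such as '7.1.0' into a tuple of ints."""
    return tuple(int(x) for x in re.findall(r"\d+", s))


def version_satisfies(version, requirement):
    """Return True if 'version' meets 'requirement' (e.g. '>=9', '=7').

    Hypothetical sketch: only the leading components of 'version' that
    appear in the requirement are compared, so '7.2.0' satisfies '=7'.
    A requirement with no operator is treated as an exact match.
    """
    op, spec = operator.eq, requirement
    for prefix, func in _OPERATORS:
        if requirement.startswith(prefix):
            op, spec = func, requirement[len(prefix):]
            break
    required = parse_version(spec)
    # Truncate the actual version to the precision of the requirement
    actual = parse_version(version)[:len(required)]
    return op(actual, required)


print(version_satisfies("9.0.1", ">=9"))  # True
print(version_satisfies("8.0.1", ">=9"))  # False
print(version_satisfies("7.2.0", "=7"))   # True
```

Truncating the installed version to the precision of the requirement is one design choice among several; a stricter check could instead pad the requirement with zeros, in which case “7.2.0” would not satisfy “=7”.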

verify_qc_module(name, params)

Verify QC outputs for specific QC module

Parameters:
  • name (str) – QC module name

  • params (AttributeDictionary) – parameters to verify QC module using

Returns:

True if all outputs are present, False if one or more are missing.

Return type:

Boolean

Raises:

Exception – if the specified QC module name is not recognised.
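verify_qc_module returns a boolean for a recognised module name and raises an exception otherwise. A dispatch table is one straightforward way to express that contract; the sketch below assumes this pattern and is not taken from the module's source. The module name hypothetical_module, the expected_outputs key and the check function are all hypothetical, and a plain dict stands in for AttributeDictionary.

```python
import os


def check_outputs_exist(params):
    """Return True only if every path in params["expected_outputs"] exists."""
    return all(os.path.exists(path) for path in params["expected_outputs"])


# Hypothetical registry mapping QC module names to check functions; the
# real module names and checks are defined elsewhere in the QC code.
QC_MODULE_CHECKS = {
    "hypothetical_module": check_outputs_exist,
}


def verify_qc_module(name, params):
    """Run the check registered for 'name' and return its boolean result.

    Raises Exception if 'name' is not in the registry, mirroring the
    documented behaviour for unrecognised module names.
    """
    if name not in QC_MODULE_CHECKS:
        raise Exception(f"unrecognised QC module: '{name}'")
    return QC_MODULE_CHECKS[name](params)
```

With this layout, supporting another module amounts to registering a new check function in the table.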

auto_process_ngs.qc.verification.verify_project(project, qc_dir=None, qc_protocol=None, fastqs=None)

Check that the QC outputs are correct for a project

Parameters:
  • project (AnalysisProject) – project to verify QC for

  • qc_dir (str) – path to the QC output dir; relative path will be treated as a subdirectory of the project being checked.

  • qc_protocol (str) – QC protocol name or specification to verify against (optional)

  • fastqs (list) – list of Fastqs to include (optional; defaults to the Fastqs in the project)
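The rule for qc_dir (a relative path is treated as a subdirectory of the project) can be sketched with os.path as follows. This is an illustration only: the function resolve_qc_dir and its project_dir argument are hypothetical, and the sketch does not model the default used when qc_dir is None.

```python
import os


def resolve_qc_dir(project_dir, qc_dir):
    """Return the QC directory path for a project (hypothetical helper).

    Relative paths are joined onto the project directory; absolute paths
    are kept as they are. The result is normalised in both cases.
    """
    if not os.path.isabs(qc_dir):
        qc_dir = os.path.join(project_dir, qc_dir)
    return os.path.normpath(qc_dir)


# On a POSIX system:
print(resolve_qc_dir("/data/projects/PJB", "qc_new"))       # /data/projects/PJB/qc_new
print(resolve_qc_dir("/data/projects/PJB", "/scratch/qc"))  # /scratch/qc
```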