auto_process_ngs.qc.verification
Utilities for verifying QC pipeline outputs.
Provides the following classes:
QCVerifier: enables verification of QC outputs against protocols
Provides the following functions:
verify_project: check the QC outputs for a project
- class auto_process_ngs.qc.verification.QCVerifier(qc_dir, fastq_attrs=None)
Class to perform verification of QC outputs
The QCVerifier enables the QC outputs from a directory to be checked against arbitrary QC protocols via its
verify method. For example:
>>> project = AnalysisProject("/data/projects/PJB")
>>> verifier = QCVerifier(project.qc_dir)
>>> verifier.verify(project.fastqs,"standardPE")
True
- Parameters:
qc_dir (str) – path to directory to examine
fastq_attrs (BaseFastqAttrs) – (optional) class for extracting data from Fastq names
- identify_seq_data(samples)
Identify samples with sequence (biological) data
- Parameters:
samples (list) – list of all sample names
- Returns:
subset of sample names with sequence data.
- Return type:
list
- verify(protocol, fastqs, organism=None, fastq_screens=None, star_index=None, annotation_bed=None, annotation_gtf=None, cellranger_version=None, cellranger_refdata=None, cellranger_use_multi_config=None, cellranger_required_version=None, seq_data_samples=None)
Verify QC outputs for Fastqs against specified protocol
- Parameters:
protocol (QCProtocol) – QC protocol to verify against
fastqs (list) – list of Fastqs to verify outputs for
organism (str) – organism associated with outputs
fastq_screens (list) – list of panel names to verify FastqScreen outputs against
star_index (str) – path to STAR index
annotation_bed (str) – path to BED annotation file
annotation_gtf (str) – path to GTF annotation file
cellranger_version (str) – specific version of 10x package to check for
cellranger_refdata (str) – specific 10x reference dataset to check for
cellranger_use_multi_config (bool) – if True then cellranger count verification will attempt to use data (GEX samples and reference dataset) from the ‘10x_multi_config.csv’ file
cellranger_required_version (str) – version requirement for the 10x package (e.g. “>=9”, “=7”)
seq_data_samples (list) – list of sample names with sequence (i.e. biological) data
- Returns:
True if all expected outputs are present, False otherwise.
- Return type:
Boolean
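The cellranger_required_version argument takes requirement strings such as ">=9" or "=7". As a rough illustration of how such a string might be interpreted, the sketch below compares a version only on the components that the requirement names, so "=7" accepts any 7.x release. The satisfies helper and its comparison rule are assumptions made for illustration; they are not part of auto_process_ngs, and the library may evaluate requirements differently.

```python
import operator
import re

# Hypothetical helper for illustration only (not an auto_process_ngs API).
# Maps each supported comparison operator to its Python equivalent.
_OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def satisfies(version, requirement):
    """Check a dotted numeric version (e.g. '9.0.1') against a
    requirement string (e.g. '>=9' or '=7')."""
    match = re.fullmatch(
        r"\s*(>=|<=|==|=|>|<)\s*([0-9]+(?:\.[0-9]+)*)\s*", requirement
    )
    if match is None:
        raise ValueError(f"unrecognised requirement: {requirement!r}")
    op, required = match.groups()
    required_parts = [int(part) for part in required.split(".")]
    # Assumption: compare only as many components as the requirement
    # names, so that '=7' matches '7.1.0'.
    actual_parts = [int(part) for part in version.split(".")]
    actual_parts = actual_parts[: len(required_parts)]
    return _OPERATORS[op](actual_parts, required_parts)
```

Under this rule a requirement of ">7" rejects "7.1.0", because only the major component is compared; a stricter or looser interpretation would need a different truncation step.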
- verify_qc_module(name, params)
Verify QC outputs for a specific QC module
- Parameters:
name (str) – QC module name
params (AttributeDictionary) – parameters to use when verifying the QC module
- Returns:
True if all outputs are present, False if one or more are missing.
- Return type:
Boolean
- Raises:
Exception – if the specified QC module name is not recognised.
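The contract described above (look up a checker by module name, report whether every expected output exists, and raise on an unknown name) can be sketched as a small dispatch table. Everything in this sketch is hypothetical: the verify_module function, the "reports" module name, the ".report.txt" file naming and the plain dict of parameters are stand-ins chosen for illustration, not the names or file layouts that auto_process_ngs uses.

```python
import os


def _all_present(qc_dir, filenames):
    """Return True only if every named file exists under qc_dir."""
    return all(os.path.exists(os.path.join(qc_dir, name)) for name in filenames)


def _check_reports(qc_dir, params):
    # Hypothetical module: expects one '<fastq>.report.txt' per Fastq.
    expected = [f"{fastq}.report.txt" for fastq in params["fastqs"]]
    return _all_present(qc_dir, expected)


# Hypothetical registry mapping QC module names to checker functions.
CHECKERS = {
    "reports": _check_reports,
}


def verify_module(qc_dir, name, params):
    """Dispatch to the checker for 'name'; raise if it is not recognised."""
    try:
        checker = CHECKERS[name]
    except KeyError:
        raise Exception(f"unrecognised QC module: {name}") from None
    return checker(qc_dir, params)
```

Keeping the checkers in a table means a new module only needs a new entry, and an unknown name fails loudly instead of silently passing verification.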
- auto_process_ngs.qc.verification.verify_project(project, qc_dir=None, qc_protocol=None, fastqs=None)
Check that the QC outputs are correct for a project
- Parameters:
project (AnalysisProject) – project to verify QC for
qc_dir (str) – path to the QC output directory; a relative path is treated as a subdirectory of the project being checked.
qc_protocol (str) – QC protocol name or specification to verify against (optional)
fastqs (list) – list of Fastqs to include (optional; defaults to the Fastqs in the project)
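The rule for qc_dir (a relative path is taken as a subdirectory of the project) can be illustrated with a minimal sketch. The resolve_qc_dir helper below is hypothetical and is not how verify_project is necessarily implemented; it also leaves out the qc_dir=None case, since the behaviour for that default is not described here.

```python
import os


def resolve_qc_dir(project_dir, qc_dir):
    """Hypothetical helper: resolve qc_dir against the project directory.

    Absolute paths are returned unchanged; relative paths are treated
    as subdirectories of project_dir.
    """
    if os.path.isabs(qc_dir):
        return qc_dir
    return os.path.join(project_dir, qc_dir)
```

With this rule, passing "qc_alt" for a project at "/data/projects/PJB" would point at "/data/projects/PJB/qc_alt", while an absolute path such as "/scratch/qc" would be used as given.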