auto_process_ngs.mockqc
mockqc.py
Provides classes and functions for mocking up examples of QC artefacts from the process pipeline, to be used in testing.
The core class is:
MockQCOutputs
which provides a set of class methods for creating mock outputs from different internal and third-party QC components (e.g. fastqc, fastq_screen etc).
In addition there is a factory function for creating mock QC directories in different configurations:
make_mock_qc_dir: create and populate a mock QC directory
- class auto_process_ngs.mockqc.MockQCOutputs
Utility class for creating mock auto-process QC outputs
- classmethod cellranger_count(sample, qc_dir, cellranger='cellranger', version=None, reference_data_path='/data/refdata-cellranger-1.2.0', prefix='cellranger_count')
Create mock outputs for ‘cellranger[-atac|-arc] count’
- Parameters:
sample (str) – sample name to create mock outputs for
qc_dir (str) – path to top level QC directory
cellranger (str) – pipeline name; one of ‘cellranger’, ‘cellranger-atac’ or ‘cellranger-arc’ (default: ‘cellranger’)
version (str) – explicit version of 10x pipeline to associate with mock outputs (default: determined from pipeline name)
reference_data_path (str) – explicit path to reference dataset (doesn’t have to exist)
prefix (str) – relative path to QC directory to put mock outputs into (default: ‘cellranger_count’)
- classmethod cellranger_multi(samples, qc_dir, config_csv=None, prefix='cellranger_multi', cellranger_version=None)
Create mock outputs for ‘cellranger multi’
- Parameters:
samples (list) – sample names to create mock outputs for
qc_dir (str) – path to top level QC directory
config_csv (str) – path to an associated CSV config file
prefix (str) – relative path to QC directory to put mock outputs into (default: ‘cellranger_multi’)
cellranger_version (str) – version of cellranger to mimic (default: the default version defined in the module)
- classmethod fastq_basename(fastq)
Return the basename for a FASTQ file
- classmethod fastq_screen_v0_9_2(fastq, qc_dir, screen_name=None, legacy=False)
Create mock outputs from Fastq_screen v0.9.2
- classmethod fastq_strand_v0_0_4(fastq, qc_dir)
Create mock outputs from fastq_strand.py v0.0.4
- classmethod fastqc_v0_11_2(fastq, qc_dir)
Create mock outputs from FastQC v0.11.2
- classmethod multiqc(dirn, multiqc_html=None, version='1.8')
Create mock output from MultiQC
- classmethod picard_collect_insert_size_metrics(fq, organism, qc_dir)
Create mock outputs from Picard CollectInsertSizeMetrics
- classmethod qualimap_rnaseq(fq, organism, qc_dir)
Create mock outputs from Qualimap ‘rnaseq’ function
- classmethod rseqc_genebody_coverage(name, organism, qc_dir)
Create mock outputs from RSeQC geneBody_coverage.py
- classmethod rseqc_infer_experiment(fq, organism, qc_dir)
Create mock outputs from RSeQC infer_experiment.py
- classmethod seqlens(fastq, qc_dir)
Create mock outputs from sequence length pipeline task
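As a rough illustration of how the class methods above might be combined in a test, the sketch below populates a QC directory for a single Fastq and a single 10x sample. The Fastq and sample names are hypothetical, and the choice of methods is only an example. It is also an assumption that the methods expect the QC directory to exist already, so the helper creates it first. The import is guarded and the class is passed in as an argument, so the helper can be tried with a stand-in class when auto_process_ngs is not installed.

```python
import os
import tempfile

try:
    # Real class, if the package is installed
    from auto_process_ngs.mockqc import MockQCOutputs
except ImportError:
    MockQCOutputs = None


def populate_mock_qc(outputs, qc_dir, fastq, sample):
    """Call a few of the documented class methods for one Fastq.

    'outputs' is MockQCOutputs, or a stand-in with the same methods.
    """
    # Assumption: the methods expect the QC directory to exist already
    os.makedirs(qc_dir, exist_ok=True)
    # Per-Fastq outputs
    outputs.fastqc_v0_11_2(fastq, qc_dir)
    outputs.fastq_screen_v0_9_2(fastq, qc_dir, screen_name="model_organisms")
    outputs.seqlens(fastq, qc_dir)
    # 10x outputs are made per sample rather than per Fastq
    outputs.cellranger_count(sample, qc_dir, cellranger="cellranger")
    return qc_dir


if __name__ == "__main__" and MockQCOutputs is not None:
    with tempfile.TemporaryDirectory() as tmp:
        qc_dir = populate_mock_qc(
            MockQCOutputs,
            os.path.join(tmp, "qc"),
            "PJB1_S1_R1_001.fastq.gz",  # hypothetical Fastq name
            "PJB1",  # hypothetical sample name
        )
        print(sorted(os.listdir(qc_dir)))
```

Passing the class as an argument keeps the helper independent of the package at import time; in a real test suite it would be simpler to call MockQCOutputs directly.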
- auto_process_ngs.mockqc.make_mock_qc_dir(qc_dir, fastq_names, fastq_dir=None, protocol=None, project_name=None, screens=('model_organisms', 'other_organisms', 'rRNA'), organisms=('human',), cellranger_pipelines=('cellranger',), cellranger_samples=None, cellranger_multi_samples=None, seq_data_samples=None, include_fastqc=True, include_fastq_screen=True, include_strandedness=True, include_seqlens=True, include_rseqc_infer_experiment=False, include_rseqc_genebody_coverage=False, include_picard_insert_size_metrics=False, include_qualimap_rnaseq=False, include_multiqc=False, include_cellranger_count=False, include_cellranger_multi=False, cellranger_version=None, legacy_screens=False, legacy_cellranger_outs=False)
Create a mock QC directory with QC artefacts
- Parameters:
qc_dir (str) – path to the mock QC directory to create
fastq_names (list) – Fastq names to make outputs for
fastq_dir (str) – optional, set a non-standard directory for the Fastq files
protocol (str) – QC protocol to emulate
project_name (str) – optional, specify the project name
screens (list) – optional, list of non-standard FastqScreen panel names
organisms (list) – optional, list of organism names for extended QC metrics
cellranger_pipelines (list) – list of 10xGenomics pipelines to make mock outputs for (e.g. ‘cellranger’, ‘cellranger-atac’ etc)
cellranger_samples (list) – list of sample names to produce ‘cellranger count’ outputs for
cellranger_multi_samples (list) – list of multiplexed sample names for 10x CellPlex
cellranger_version (str) – if set then specifies the version of Cellranger to mimic
seq_data_samples (list) – list with subset of sample names which include sequence (i.e. biological) data
include_fastqc (bool) – include outputs from Fastqc
include_fastq_screen (bool) – include outputs from FastqScreen
include_strandedness (bool) – include strandedness outputs
include_seqlens (bool) – include sequence length metrics
include_rseqc_infer_experiment (bool) – include RSeQC infer_experiment.py outputs
include_rseqc_genebody_coverage (bool) – include RSeQC geneBody_coverage.py outputs
include_picard_insert_size_metrics (bool) – include Picard CollectInsertSizeMetrics outputs
include_qualimap_rnaseq (bool) – include Qualimap ‘rnaseq’ outputs
include_multiqc (bool) – include MultiQC outputs
include_cellranger_count (bool) – include ‘cellranger count’ outputs
include_cellranger_multi (bool) – include ‘cellranger multi’ outputs
legacy_screens (bool) – if True then use legacy naming convention for FastqScreen outputs
legacy_cellranger_outs (bool) – if True then use legacy naming convention for 10xGenomics pipeline outputs
- Returns:
path to the mock QC directory that was created.
- Return type:
str
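A typical use of the factory in a unit test might look like the following sketch. The Fastq names and project name are hypothetical, and only keyword arguments listed above are used. The particular combination of include_* flags is just an example. As before, the import is guarded and the factory is passed in as an argument, so the helper can be tried with a stand-in function when auto_process_ngs is not installed.

```python
import os
import tempfile

try:
    # Real factory function, if the package is installed
    from auto_process_ngs.mockqc import make_mock_qc_dir
except ImportError:
    make_mock_qc_dir = None

# Hypothetical paired-end Fastq names
FASTQS = ["PJB1_S1_R1_001.fastq.gz", "PJB1_S1_R2_001.fastq.gz"]


def build_mock_qc(factory, top_dir):
    """Create a mock QC directory under 'top_dir' and return its path.

    'factory' is make_mock_qc_dir, or a stand-in taking the same arguments.
    """
    return factory(
        os.path.join(top_dir, "qc"),
        FASTQS,
        project_name="PJB",  # hypothetical project name
        organisms=("human",),
        include_strandedness=False,  # leave out strandedness outputs
        include_multiqc=True,  # add a mock MultiQC report
    )


if __name__ == "__main__" and make_mock_qc_dir is not None:
    with tempfile.TemporaryDirectory() as tmp:
        qc_dir = build_mock_qc(make_mock_qc_dir, tmp)
        print(qc_dir)
```

Building the directory inside a temporary directory means the mock artefacts are removed automatically when the test finishes.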