auto_process_ngs.mockqc

mockqc.py

Provides classes and functions for mocking up examples of QC artefacts from the process pipeline, to be used in testing.

The core class is:

  • MockQCOutputs

which provides a set of class methods for creating mock outputs from different internal and third-party QC components (e.g. fastqc, fastq_screen).

In addition there is a factory function for creating mock QC directories in different configurations:

  • make_mock_qc_dir: create and populate a mock QC directory

class auto_process_ngs.mockqc.MockQCOutputs

Utility class for creating mock auto-process QC outputs

classmethod cellranger_count(sample, qc_dir, cellranger='cellranger', version=None, reference_data_path='/data/refdata-cellranger-1.2.0', prefix='cellranger_count')

Create mock outputs for ‘cellranger[-atac|-arc] count’

Parameters:
  • sample (str) – sample name to create mock outputs for

  • qc_dir (str) – path to top level QC directory

  • cellranger (str) – pipeline name; one of ‘cellranger’, ‘cellranger-atac’ or ‘cellranger-arc’ (default: ‘cellranger’)

  • version (str) – explicit version of 10x pipeline to associate with mock outputs (default: determined from pipeline name)

  • reference_data_path (str) – explicit path to reference dataset (doesn’t have to exist)

  • prefix (str) – path, relative to the QC directory, where the mock outputs are written (default: ‘cellranger_count’)
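
As a minimal sketch of how the ‘cellranger’ parameter might be used, the wrapper below checks the pipeline name against the three documented values before delegating to MockQCOutputs.cellranger_count. The wrapper name, the KNOWN_PIPELINES constant and the sample name are assumptions made for illustration and are not part of the module; the import is deferred so the validation can be tried without the package installed.

```python
# Pipeline names listed in the documentation for the 'cellranger' parameter
KNOWN_PIPELINES = ("cellranger", "cellranger-atac", "cellranger-arc")


def mock_cellranger_count(sample, qc_dir, pipeline="cellranger"):
    """Hypothetical wrapper: validate the pipeline name, then create mock outputs."""
    if pipeline not in KNOWN_PIPELINES:
        raise ValueError("unknown 10x pipeline: %r" % pipeline)
    # Deferred import: only needed once the arguments have been checked
    from auto_process_ngs.mockqc import MockQCOutputs
    MockQCOutputs.cellranger_count(sample, qc_dir, cellranger=pipeline)


# An undocumented pipeline name is rejected before the library is touched
try:
    mock_cellranger_count("PJB1", "qc", pipeline="spaceranger")
except ValueError as ex:
    rejected = str(ex)
    print(rejected)
```
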

classmethod cellranger_multi(samples, qc_dir, config_csv=None, prefix='cellranger_multi', cellranger_version=None)

Create mock outputs for ‘cellranger multi’

Parameters:
  • samples (list) – sample names to create mock outputs for

  • qc_dir (str) – path to top level QC directory

  • config_csv (str) – path to an associated CSV config file

  • prefix (str) – path, relative to the QC directory, where the mock outputs are written (default: ‘cellranger_multi’)

  • cellranger_version (str) – version of cellranger to mimic (default: the default version defined in the module)

classmethod fastq_basename(fastq)

Return the basename for a FASTQ file

classmethod fastq_screen_v0_9_2(fastq, qc_dir, screen_name=None, legacy=False)

Create mock outputs from Fastq_screen v0.9.2

classmethod fastq_strand_v0_0_4(fastq, qc_dir)

Create mock outputs from fastq_strand.py v0.0.4

classmethod fastqc_v0_11_2(fastq, qc_dir)

Create mock outputs from FastQC v0.11.2

classmethod multiqc(dirn, multiqc_html=None, version='1.8')

Create mock output from MultiQC

classmethod picard_collect_insert_size_metrics(fq, organism, qc_dir)

Create mock outputs from Picard CollectInsertSizeMetrics

classmethod qualimap_rnaseq(fq, organism, qc_dir)

Create mock outputs from Qualimap ‘rnaseq’ function

classmethod rseqc_genebody_coverage(name, organism, qc_dir)

Create mock outputs from RSeQC geneBody_coverage.py

classmethod rseqc_infer_experiment(fq, organism, qc_dir)

Create mock outputs from RSeQC infer_experiment.py

classmethod seqlens(fastq, qc_dir)

Create mock outputs from sequence length pipeline task
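
The per-tool methods above can be combined to populate a QC directory by hand. The following is a rough sketch under stated assumptions: populate_basic_qc is a hypothetical helper, the Fastq names are invented, and a unittest.mock recorder stands in for MockQCOutputs so that the call pattern can be shown without the package installed. In real tests the class itself would be passed instead; whether the real methods require the Fastq files to exist is not stated here.

```python
import os
import tempfile
from unittest import mock


def populate_basic_qc(mock_outputs, fastqs, qc_dir, screens=("model_organisms",)):
    """Hypothetical helper: create FastQC, FastqScreen and seqlens mocks per Fastq.

    'mock_outputs' is expected to be MockQCOutputs, or any object offering
    methods with the same names and arguments.
    """
    os.makedirs(qc_dir, exist_ok=True)
    for fastq in fastqs:
        mock_outputs.fastqc_v0_11_2(fastq, qc_dir)
        for screen in screens:
            mock_outputs.fastq_screen_v0_9_2(fastq, qc_dir, screen_name=screen)
        mock_outputs.seqlens(fastq, qc_dir)
    return qc_dir


# Recorder restricted to the three method names used by the helper
recorder = mock.Mock(spec=["fastqc_v0_11_2", "fastq_screen_v0_9_2", "seqlens"])

with tempfile.TemporaryDirectory() as top_dir:
    populate_basic_qc(
        recorder,
        ["PJB1_S1_R1_001.fastq.gz", "PJB1_S1_R2_001.fastq.gz"],  # invented names
        os.path.join(top_dir, "qc"),
        screens=("model_organisms", "rRNA"),
    )

# One FastQC and seqlens call per Fastq; one FastqScreen call per Fastq per screen
print(recorder.fastqc_v0_11_2.call_count,
      recorder.fastq_screen_v0_9_2.call_count,
      recorder.seqlens.call_count)
```
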

auto_process_ngs.mockqc.make_mock_qc_dir(qc_dir, fastq_names, fastq_dir=None, protocol=None, project_name=None, screens=('model_organisms', 'other_organisms', 'rRNA'), organisms=('human',), cellranger_pipelines=('cellranger',), cellranger_samples=None, cellranger_multi_samples=None, seq_data_samples=None, include_fastqc=True, include_fastq_screen=True, include_strandedness=True, include_seqlens=True, include_rseqc_infer_experiment=False, include_rseqc_genebody_coverage=False, include_picard_insert_size_metrics=False, include_qualimap_rnaseq=False, include_multiqc=False, include_cellranger_count=False, include_cellranger_multi=False, cellranger_version=None, legacy_screens=False, legacy_cellranger_outs=False)

Create a mock QC directory with QC artefacts

Parameters:
  • qc_dir (str) – path to the mock QC directory to create

  • fastq_names (list) – Fastq names to make outputs for

  • fastq_dir (str) – optional, set a non-standard directory for the Fastq files

  • protocol (str) – QC protocol to emulate

  • project_name (str) – optional, specify the project name

  • screens (list) – optional, list of non-standard FastqScreen panel names

  • organisms (list) – optional, list of organism names for extended QC metrics

  • cellranger_pipelines (list) – list of 10xGenomics pipelines to make mock outputs for (e.g. ‘cellranger’, ‘cellranger-atac’ etc)

  • cellranger_samples (list) – list of sample names to produce ‘cellranger count’ outputs for

  • cellranger_multi_samples (list) – list of multiplexed sample names for 10x CellPlex

  • cellranger_version (str) – if set then specifies version of Cellranger to mimic

  • seq_data_samples (list) – list with subset of sample names which include sequence (i.e. biological) data

  • include_fastqc (bool) – include outputs from Fastqc

  • include_fastq_screen (bool) – include outputs from FastqScreen

  • include_strandedness (bool) – include outputs from the strandedness determination (fastq_strand.py)

  • include_seqlens (bool) – include sequence length metrics

  • include_rseqc_infer_experiment (bool) – include RSeQC infer_experiment.py outputs

  • include_rseqc_genebody_coverage (bool) – include RSeQC geneBody_coverage.py outputs

  • include_picard_insert_size_metrics (bool) – include Picard CollectInsertSizeMetrics outputs

  • include_qualimap_rnaseq (bool) – include Qualimap ‘rnaseq’ outputs

  • include_multiqc (bool) – include MultiQC outputs

  • include_cellranger_count (bool) – include ‘cellranger count’ outputs

  • include_cellranger_multi (bool) – include ‘cellranger multi’ outputs

  • legacy_screens (bool) – if True then use legacy naming convention for FastqScreen outputs

  • legacy_cellranger_outs (bool) – if True then use legacy naming convention for 10xGenomics pipeline outputs

Returns:

path to the mock QC directory that was created.

Return type:

str
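
A sketch of how make_mock_qc_dir might be used inside a unit test follows. The Fastq names, the project name and the particular option values are assumptions chosen for illustration; only the keyword names come from the signature above. The test case is skipped when the package cannot be imported, and the final assertion relies on the documented return value (the path to the created directory).

```python
import os
import shutil
import tempfile
import unittest

try:
    from auto_process_ngs.mockqc import make_mock_qc_dir
except ImportError:
    make_mock_qc_dir = None  # package not available; the test case is skipped

# Invented Fastq names, for illustration only
FASTQ_NAMES = ["PJB1_S1_R1_001", "PJB1_S1_R2_001"]

# Keyword names taken from the documented signature; values are arbitrary
MOCK_QC_OPTIONS = {
    "project_name": "PJB",
    "include_multiqc": True,
    "include_seqlens": False,
}


@unittest.skipIf(make_mock_qc_dir is None, "auto_process_ngs is not installed")
class TestWithMockQCDir(unittest.TestCase):

    def setUp(self):
        # Temporary working directory, removed again after each test
        self.wd = tempfile.mkdtemp()
        self.addCleanup(shutil.rmtree, self.wd, ignore_errors=True)

    def test_mock_qc_dir_is_created(self):
        qc_dir = make_mock_qc_dir(os.path.join(self.wd, "qc"),
                                  FASTQ_NAMES,
                                  **MOCK_QC_OPTIONS)
        self.assertTrue(os.path.isdir(qc_dir))
```

Keeping the options in a dictionary makes it simple to derive variants (for example, switching on include_cellranger_count together with cellranger_samples) without repeating the full argument list in each test.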