auto_process_ngs.mockqc

mockqc.py

Provides classes and functions for mocking up examples of QC artefacts from the process pipeline, to be used in testing.

The core class is:

  • MockQCOutputs

which provides a set of class methods for creating mock outputs from different internal and third-party QC components (e.g. fastqc, fastq_screen).

In addition there is a factory function for creating mock QC directories in different configurations:

  • make_mock_qc_dir: create and populate a mock QC directory

There are also the following helper functions:

  • make_10x_multi_config_file: creates mock ‘10x_multi_config.csv’ files

class auto_process_ngs.mockqc.MockQCOutputs

Utility class for creating mock auto-process QC outputs

classmethod cellranger_count(sample, qc_dir, cellranger='cellranger', version=None, reference_data_path='/data/refdata-cellranger-1.2.0', prefix='cellranger_count')

Create mock outputs for ‘cellranger[-atac|-arc] count’

Parameters:
  • sample (str) – sample name to create mock outputs for

  • qc_dir (str) – path to top level QC directory

  • cellranger (str) – pipeline name; one of ‘cellranger’, ‘cellranger-atac’ or ‘cellranger-arc’ (default: ‘cellranger’)

  • version (str) – explicit version of 10x pipeline to associate with mock outputs (default: determined from pipeline name)

  • reference_data_path (str) – explicit path to reference dataset (doesn’t have to exist)

  • prefix (str) – relative path to QC directory to put mock outputs into (default: ‘cellranger_count’)

classmethod cellranger_multi(samples, qc_dir, config_csv=None, prefix='cellranger_multi', cellranger_version=None)

Create mock outputs for ‘cellranger multi’

Parameters:
  • samples (list) – sample names to create mock outputs for

  • qc_dir (str) – path to top level QC directory

  • config_csv (str) – path to an associated CSV config file

  • prefix (str) – relative path to QC directory to put mock outputs into (default: ‘cellranger_multi’)

  • cellranger_version (str) – version of cellranger to mimic (default: the default version defined in the module)

classmethod fastq_basename(fastq, extensions=['fastq'])

Return the basename for a FASTQ file

By default only ‘.fastq[.gz]’ extensions are recognised and removed; to strip additional or alternative extensions, specify all possibilities as a list in the ‘extensions’ argument.

Parameters:
  • fastq (str) – full file name for Fastq file (can include leading path)

  • extensions (list) – optional list of Fastq file extensions that are recognised and removed (default: [“fastq”])

Returns:

the basename for the Fastq file with leading path and extension removed.

Return type:

String
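The extension handling described above can be illustrated with a minimal standalone sketch. The function name `strip_fastq_extension` is hypothetical and this is not the package's own implementation; it only mirrors the documented behaviour (leading path removed, each listed extension recognised with or without a trailing ‘.gz’).

```python
import os

def strip_fastq_extension(fastq, extensions=("fastq",)):
    """Illustrative stand-in for the behaviour documented for fastq_basename."""
    # Drop any leading path first
    name = os.path.basename(fastq)
    for ext in extensions:
        # Try the gzipped form before the plain form
        for suffix in (f".{ext}.gz", f".{ext}"):
            if name.endswith(suffix):
                return name[:-len(suffix)]
    # Unrecognised extension: leave the name untouched
    return name

# Hypothetical file names, invented for illustration
print(strip_fastq_extension("/data/PJB1_S1_R1_001.fastq.gz"))
print(strip_fastq_extension("PJB1_S1_R1_001.fq.gz", extensions=("fastq", "fq")))
```

With the default extensions a ‘.fq’ file would be returned unchanged by this sketch, which is why the second call lists both possibilities.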

classmethod fastq_screen_v0_15_3(fastq, qc_dir, screen_name=None, legacy=False)

Create mock outputs from Fastq_screen v0.15.3

classmethod fastq_screen_v0_9_2(fastq, qc_dir, screen_name=None, legacy=False)

Create mock outputs from Fastq_screen v0.9.2

classmethod fastq_strand_v0_0_4(fastq, qc_dir)

Create mock outputs from fastq_strand.py v0.0.4

classmethod fastqc_v0_11_2(fastq, qc_dir)

Create mock outputs from FastQC v0.11.2

classmethod fastqc_v0_12_1(fastq, qc_dir)

Create mock outputs from FastQC v0.12.1

classmethod multiqc(dirn, multiqc_html=None, version='1.8')

Create mock output from MultiQC

classmethod picard_collect_insert_size_metrics(fq, organism, qc_dir)

Create mock outputs from Picard CollectInsertSizeMetrics

classmethod qualimap_rnaseq(fq, organism, qc_dir)

Create mock outputs from Qualimap ‘rnaseq’ function

classmethod rseqc_genebody_coverage(name, organism, qc_dir)

Create mock outputs from RSeQC geneBody_coverage.py

classmethod rseqc_infer_experiment(fq, organism, qc_dir)

Create mock outputs from RSeQC infer_experiment.py

classmethod seqlens(fastq, qc_dir)

Create mock outputs from sequence length pipeline task
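As a rough sketch of how the class methods listed above might be combined in a test, the helper below calls a few of them for each Fastq. The helper name `populate_qc_dir`, the Fastq names and the choice of methods are assumptions made for illustration; only the method names and argument order come from the signatures above, and the call against the real class has not been verified here.

```python
import os
import tempfile

try:
    from auto_process_ngs.mockqc import MockQCOutputs
except ImportError:
    # Package not installed: the helper can still be used with a stand-in class
    MockQCOutputs = None

def populate_qc_dir(outputs, fastqs, qc_dir, screens=("model_organisms",)):
    """Call a selection of mock-output methods for each Fastq (hypothetical helper)."""
    for fq in fastqs:
        outputs.fastqc_v0_12_1(fq, qc_dir)
        outputs.seqlens(fq, qc_dir)
        for screen in screens:
            outputs.fastq_screen_v0_15_3(fq, qc_dir, screen_name=screen)

if MockQCOutputs is not None:
    # Assumption: the QC directory should already exist before outputs are added
    with tempfile.TemporaryDirectory() as qc_dir:
        populate_qc_dir(MockQCOutputs, ["PJB1_S1_R1_001.fastq.gz"], qc_dir)
        print(sorted(os.listdir(qc_dir)))
```

Passing the class in as an argument keeps the helper easy to exercise with a recording stub when the package itself is unavailable.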

auto_process_ngs.mockqc.make_10x_multi_config_file(qc_dir, fastq_dir, multiplexed_samples, sample=None, protocol='10x_Cellplex')

Helper function to make a ‘cellranger multi’ config.csv file

Parameters:
  • qc_dir (str) – directory to create the config file under

  • fastq_dir (str) – path to Fastq directory

  • multiplexed_samples (list) – list of multiplexed sample names

  • sample (str) – associated physical sample name (or None)

  • protocol (str) – QC protocol (optional)

auto_process_ngs.mockqc.make_mock_qc_dir(qc_dir, fastq_names, fastq_dir=None, protocol=None, project_name=None, screens=('model_organisms', 'other_organisms', 'rRNA'), organisms=('human',), cellranger_pipelines=('cellranger',), cellranger_samples=None, cellranger_multi_samples=None, seq_data_samples=None, include_fastqc=True, include_fastq_screen=True, include_strandedness=False, include_seqlens=True, include_rseqc_infer_experiment=False, include_rseqc_genebody_coverage=False, include_picard_insert_size_metrics=False, include_qualimap_rnaseq=False, include_multiqc=False, include_cellranger_count=False, include_cellranger_multi=False, cellranger_version=None, legacy_screens=False, legacy_cellranger_outs=False, legacy_cellranger_count_prefix=False)

Create a mock QC directory with QC artefacts

Parameters:
  • qc_dir (str) – path to the mock QC directory to create

  • fastq_names (list) – Fastq names to make outputs for

  • fastq_dir (str) – optional, set a non-standard directory for the Fastq files

  • protocol (str) – QC protocol to emulate

  • project_name (str) – optional, specify the project name

  • screens (list) – optional, list of non-standard FastqScreen panel names

  • organisms (list) – optional, list of organism names for extended QC metrics

  • cellranger_pipelines (list) – list of 10xGenomics pipelines to make mock outputs for (e.g. ‘cellranger’, ‘cellranger-atac’ etc)

  • cellranger_samples (list) – list of sample names to produce ‘cellranger count’ outputs for

  • cellranger_multi_samples (list/dict) – either a flat list of multiplexed sample names for 10x CellPlex (single unnamed physical sample), or a dictionary where the keys are physical sample names and the values are lists of the associated multiplexed sample names

  • cellranger_version (str) – if set then specifies version of Cellranger to mimic

  • seq_data_samples (list) – list with subset of sample names which include sequence (i.e. biological) data

  • include_fastqc (bool) – include outputs from Fastqc

  • include_fastq_screen (bool) – include outputs from FastqScreen

  • include_strandedness (bool) – include outputs from strandedness

  • include_seqlens (bool) – include sequence length metrics

  • include_rseqc_infer_experiment (bool) – include RSeQC infer_experiment.py outputs

  • include_rseqc_genebody_coverage (bool) – include RSeQC geneBody_coverage.py outputs

  • include_picard_insert_size_metrics (bool) – include Picard CollectInsertSizeMetrics outputs

  • include_qualimap_rnaseq (bool) – include Qualimap ‘rnaseq’ outputs

  • include_multiqc (bool) – include MultiQC outputs

  • include_cellranger_count (bool) – include ‘cellranger count’ outputs

  • include_cellranger_multi (bool) – include ‘cellranger multi’ outputs

  • legacy_screens (bool) – if True then use legacy naming convention for FastqScreen outputs

  • legacy_cellranger_outs (bool) – if True then use legacy naming convention for 10xGenomics pipeline outputs

  • legacy_cellranger_count_prefix (bool) – if True then always use “cellranger_count” as the prefix for Cellranger* outputs (otherwise use “<cellranger_pipeline>_count”)

Returns:

path to the mock QC directory that was created.

Return type:

String
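A minimal usage sketch for make_mock_qc_dir follows, assuming the package is importable and that the defaults for unspecified arguments are acceptable. The Fastq names and project name are invented for illustration; the keyword arguments are taken from the signature above, but the call has not been verified against a specific release.

```python
import os
import tempfile

try:
    from auto_process_ngs.mockqc import make_mock_qc_dir
except ImportError:
    # Package not installed: the sketch then only builds the arguments
    make_mock_qc_dir = None

# Hypothetical Fastq names, invented for illustration
fastq_names = ["PJB1_S1_R1_001.fastq.gz", "PJB1_S1_R2_001.fastq.gz"]

# Keyword arguments as named in the documented signature
options = dict(
    project_name="PJB",
    include_fastqc=True,
    include_fastq_screen=True,
    include_seqlens=True,
    include_multiqc=True,
)

if make_mock_qc_dir is not None:
    with tempfile.TemporaryDirectory() as tmp:
        # The return value is documented as the path to the created directory
        qc_dir = make_mock_qc_dir(os.path.join(tmp, "qc"), fastq_names, **options)
        for name in sorted(os.listdir(qc_dir)):
            print(name)
```

Creating the directory under a temporary location keeps the mock artefacts out of the working tree and removes them when the block exits.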