auto_process_ngs.mock

Provides classes for mocking up examples of inputs and outputs for various parts of the process pipeline (including example directory structures), as well as mock executables, to be used in testing.

The core classes are:

  • MockAnalysisDir: create mock auto-process analysis directories

  • MockAnalysisProject: create mock auto-process project directories

These can be used to configure and create mock directories mimicking “minimal” versions of analysis directories and projects.

Additional mock artefacts (e.g. QC outputs, barcode analysis etc) can be added to the mock directories once they have been created, using the “updater” classes:

  • UpdateAnalysisDir: add artefacts to an analysis directory

  • UpdateAnalysisProject: add artefacts to an analysis project

There is also a convenience factory class which provides methods to quickly make “default” analysis directories for testing:

  • MockAnalysisDirFactory

It is also possible to make mock executables which mimic some of the external software required for parts of the pipeline:

  • MockBcl2fastq2Exe

  • MockBclConvertExe

  • Mock10xPackageExe

  • MockFastqScreen

  • MockFastQC

  • MockFastqStrandPy

  • MockGtf2bed

  • MockSeqtk

  • MockStar

  • MockSamtools

  • MockPicard

  • MockRSeQC

  • MockQualimap

  • MockMultiQC

  • MockConda

  • MockBowtieBuild

  • MockBowtie2Build

There is also a wrapper for the ‘Mock10xPackageExe’ class, which is maintained for backwards compatibility:

  • MockCellrangerExe

There are supporting standalone functions for mocking outputs:

  • make_mock_bcl2fastq2_output: create mock output from bcl2fastq

  • make_mock_analysis_project: create a mock analysis project directory

class auto_process_ngs.mock.DirectoryUpdater(base_dir)

Base class for updating mock directories

Provides the following methods:

  • add_subdir: adds arbitrary subdirectory

  • add_file: adds arbitrary file

add_file(filen, content=None)

Add an arbitrary file to the base dir

Parameters:
  • filen (str) – path of file to add

  • content (str) – if supplied then will be written as content of new file

add_subdir(dirn)

Add an arbitrary directory to the base dir

Parameters:

dirn (str) – path of directory to add
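The two methods above can be pictured with a minimal standard-library sketch. The class name SketchDirectoryUpdater is hypothetical, and resolving paths relative to the base directory is an assumption; this is not the library's implementation.

```python
import os
import tempfile


class SketchDirectoryUpdater:
    """Hypothetical stand-in for a DirectoryUpdater-like helper."""

    def __init__(self, base_dir):
        self._base_dir = os.path.abspath(base_dir)

    def add_subdir(self, dirn):
        # Create the subdirectory (and any missing parents) under base_dir
        path = os.path.join(self._base_dir, dirn)
        os.makedirs(path, exist_ok=True)
        return path

    def add_file(self, filen, content=None):
        # Assumption: paths are taken relative to base_dir
        path = os.path.join(self._base_dir, filen)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wt") as fp:
            if content is not None:
                fp.write(content)
        return path


with tempfile.TemporaryDirectory() as base:
    updater = SketchDirectoryUpdater(base)
    updater.add_subdir("logs")
    updater.add_file("qc/notes.txt", content="mock QC notes\n")
    created = sorted(os.listdir(base))
```

Here created lists the two top-level entries added under the temporary base directory.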

class auto_process_ngs.mock.Mock10xPackageExe(path, exit_code=0, platform=None, assert_bases_mask=None, assert_include_introns=None, assert_chemistry=None, assert_force_cells=None, assert_filter_single_index=None, assert_filter_dual_index=None, assert_rc_i2_override=None, reads=None, multiome_data=None, multi_outputs=None, version=None)

Create mock 10xGenomics pipeline executable

This class can be used to create a mock 10xGenomics pipeline executable, which in turn can be used in place of the actual pipeline software (e.g. cellranger) for testing purposes.

To create a mock executable, use the ‘create’ static method, e.g.

>>> Mock10xPackageExe.create("/tmpbin/cellranger")

The resulting executable will generate mock outputs when run on actual or mock Illumina sequencer output directories (mock versions can be produced using the ‘mock.IlluminaRun’ class in the genomics-bcftbx package).

The executable can be configured on creation to produce different error conditions when run:

  • the exit code can be set to an arbitrary value via the exit_code argument
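As a rough illustration of the idea (not the library's implementation), a mock executable with a configurable exit code could be generated as below. The helper name make_mock_exe is hypothetical, and the script is run through the current Python interpreter so that the sketch stays portable.

```python
import os
import stat
import subprocess
import sys
import tempfile


def make_mock_exe(path, exit_code=0):
    """Write a tiny script that exits with a fixed code (hypothetical)."""
    # Mirror the documented rule: the final executable must not exist
    if os.path.exists(path):
        raise OSError(f"'{path}' already exists")
    with open(path, "wt") as fp:
        fp.write("#!/usr/bin/env python3\n"
                 "import sys\n"
                 f"sys.exit({int(exit_code)})\n")
    # Mark the script as executable so it could be run directly from PATH
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


with tempfile.TemporaryDirectory() as bin_dir:
    exe = make_mock_exe(os.path.join(bin_dir, "cellranger"), exit_code=3)
    # The command line arguments are ignored by this simple mock
    result = subprocess.run([sys.executable, exe, "mkfastq"])
    returncode = result.returncode
```

A test can then check that the calling code reacts correctly to the non-zero value held in returncode.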

static create(path, exit_code=0, missing_fastqs=None, platform=None, assert_bases_mask=None, assert_include_introns=None, assert_chemistry=None, assert_force_cells=None, assert_filter_single_index=None, assert_filter_dual_index=None, assert_rc_i2_override=None, reads=None, multiome_data=None, multi_outputs=None, version=None)

Create a “mock” 10xGenomics package executable

Parameters:
  • path (str) – path to the new executable to create. The final executable must not exist, however the directory it will be created in must.

  • exit_code (int) – exit code that the mock executable should complete with

  • missing_fastqs (list) – list of Fastq names that will not be created

  • platform (str) – platform for primary data (if it cannot be determined from the directory/instrument name)

  • assert_bases_mask (str) – if set then check that the supplied bases mask matches this value

  • assert_include_introns (bool) – if set to True or False then check that the ‘--include-introns’ option was or wasn’t set, respectively (ignored if set to None)

  • assert_chemistry (str) – if set then check that the supplied chemistry specification matches this value

  • assert_force_cells (int) – if set then check that the ‘--force-cells’ option was specified with this value

  • assert_filter_single_index (bool) – if set to True or False then check that the ‘--filter-single-index’ option was or wasn’t supplied, respectively (ignored if set to None)

  • assert_filter_dual_index (bool) – if set to True or False then check that the ‘--filter-dual-index’ option was or wasn’t supplied, respectively (ignored if set to None)

  • assert_rc_i2_override (str) – check that the ‘--rc-i2-override’ option was supplied and set to the supplied value (ignored if set to None)

  • reads (list) – list of ‘reads’ that will be created

  • multiome_data (str) – either ‘GEX’ or ‘ATAC’ (when mocking ‘cellranger-arc’)

  • multi_outputs (str) – set type of outputs for ‘cellranger multi’ (either ‘cellplex’ or ‘flex’)

  • version (str) – version of package to report

main(args)

Internal: provides mock 10xGenomics package functionality

class auto_process_ngs.mock.MockAnalysisDir(run_name, platform, unaligned_dir='bcl2fastq', fmt='bcl2fastq2', bases_mask='auto', paired_end=True, lanes=None, no_lane_splitting=False, no_undetermined=False, top_dir=None, params=None, metadata=None, readme=None, project_metadata=None, include_stats_files=False)

Utility class for creating mock auto-process analysis directories

The MockAnalysisDir class allows artificial analysis directories to be defined, created and populated, and then destroyed.

These artificial directories are intended to be used for testing purposes.

Two styles of analysis directories can be produced: ‘casava’-style aims to mimic that produced from the CASAVA and bcl2fastq 1.8 processing software; ‘bcl2fastq2’ mimics that from the bcl2fastq 2.* software.

Basic example usage:

>>> mockdir = MockAnalysisDir('130904_PJB_XXXXX','miseq',fmt='casava')
>>> mockdir.add_fastq_batch('PJB','PJB1','PJB1_GCCAAT',lanes=[1,])
>>> ...
>>> mockdir.create()

This will make a CASAVA-style directory structure like:

130904_PJB_XXXXX/
    metadata.info
    bcl2fastq/
        Project_PJB/
            Sample_PJB1/
                PJB1_GCCAAT_L001_R1_001.fastq.gz
                PJB1_GCCAAT_L001_R2_001.fastq.gz
    PJB/
        README.info
        fastqs/
            PJB1_GCCAAT_L001_R1_001.fastq.gz
            PJB1_GCCAAT_L001_R2_001.fastq.gz

To delete the physical directory structure when finished:

>>> mockdir.remove()

create(no_project_dirs=False)

Build and populate the directory structure

Creates the directory structure on disk which has been defined within the MockAnalysisDir object.

Invoke the ‘remove’ method to delete the directory structure.

The contents of the MockAnalysisDir object can be modified after the directory structure has been created, but changes will not be reflected on disk. Instead it is necessary to first remove the directory structure, and then re-invoke the create method.

‘create’ raises an OSError exception if any part of the directory structure already exists.

Parameters:

no_project_dirs (bool) – if True then don’t create analysis project subdirectories (these are created by default)
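The create/remove lifecycle described for ‘create’ can be illustrated with a standard-library stand-in. The class SketchMockDir is hypothetical and only mirrors the documented behaviour: a second ‘create’ fails with an OSError until ‘remove’ has been called.

```python
import os
import shutil
import tempfile


class SketchMockDir:
    """Hypothetical stand-in with the same create/remove lifecycle."""

    def __init__(self, dirn):
        self.dirn = dirn

    def create(self):
        # os.mkdir raises FileExistsError (a subclass of OSError)
        # if the directory is already present
        os.mkdir(self.dirn)

    def remove(self):
        shutil.rmtree(self.dirn)


with tempfile.TemporaryDirectory() as top_dir:
    mock = SketchMockDir(os.path.join(top_dir, "130904_PJB_XXXXX"))
    mock.create()
    try:
        mock.create()
        second_create_failed = False
    except OSError:
        second_create_failed = True
    # Remove first, then create again so that changes reach the disk
    mock.remove()
    mock.create()
    recreated = os.path.isdir(mock.dirn)
```

In a test suite the same pattern usually lives in setUp and tearDown, so that each test starts from a freshly created directory.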

class auto_process_ngs.mock.MockAnalysisDirFactory

Collection of convenient pre-populated test cases

classmethod bcl2fastq2(run_name, platform, paired_end=True, unaligned_dir='bcl2fastq', no_lane_splitting=True, reads=None, top_dir=None, params=None, metadata=None, project_metadata=None, bases_mask='auto', include_stats_files=False)

Basic analysis dir from bcl2fastq v2

classmethod casava(run_name, platform, paired_end=True, unaligned_dir='bcl2fastq', params=None, metadata=None, top_dir=None)

Basic analysis dir from CASAVA/bcl2fastq v1.8

class auto_process_ngs.mock.MockAnalysisProject(name, fastq_names=None, fastq_dir=None, metadata={})

Utility class for creating mock auto-process project directories

Example usage:

>>> m = MockAnalysisProject('PJB',('PJB1_S1_R1_001.fastq.gz',
...                                'PJB1_S1_R2_001.fastq.gz'))
>>> m.create()

add_fastq(fq)

Add a Fastq file to the project

create(top_dir=None, readme=True, scriptcode=True, populate_fastqs=True)

Build and populate the directory structure

Parameters:
  • top_dir (str) – path to directory to create project directory underneath (default is pwd)

  • readme (boolean) – if True then write a README file

  • scriptcode (boolean) – if True then write a ScriptCode subdirectory

  • populate_fastqs (boolean) – if True then write content to the Fastq files
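The effect of the ‘create’ options can be sketched with the standard library alone. The function make_project_sketch is hypothetical; the ‘README.info’ and ‘fastqs’ names follow the layout shown for MockAnalysisDir, while the gzipped placeholder read is an assumption made for illustration.

```python
import gzip
import os
import tempfile


def make_project_sketch(name, fastq_names, top_dir=None, readme=True,
                        scriptcode=True, populate_fastqs=True):
    """Hypothetical stand-in that lays out a minimal project directory."""
    top_dir = os.getcwd() if top_dir is None else top_dir
    project_dir = os.path.join(top_dir, name)
    fastqs_dir = os.path.join(project_dir, "fastqs")
    os.makedirs(fastqs_dir)
    if readme:
        with open(os.path.join(project_dir, "README.info"), "wt"):
            pass  # empty placeholder README
    if scriptcode:
        os.mkdir(os.path.join(project_dir, "ScriptCode"))
    for fq in fastq_names:
        # Assumption: Fastq names end with '.gz', so write gzipped text
        with gzip.open(os.path.join(fastqs_dir, fq), "wt") as fp:
            if populate_fastqs:
                # One made-up read so that the file is not empty
                fp.write("@read1\nACGT\n+\nFFFF\n")
    return project_dir


with tempfile.TemporaryDirectory() as top:
    project = make_project_sketch(
        "PJB",
        ("PJB1_S1_R1_001.fastq.gz", "PJB1_S1_R2_001.fastq.gz"),
        top_dir=top)
    contents = sorted(os.listdir(project))
    fastqs = sorted(os.listdir(os.path.join(project, "fastqs")))
```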

class auto_process_ngs.mock.MockBcl2fastq2Exe(exit_code=0, missing_fastqs=None, platform=None, assert_bases_mask=None, assert_no_lane_splitting=None, assert_create_fastq_for_index_read=None, assert_minimum_trimmed_read_length=None, assert_mask_short_adapter_reads=None, assert_adapter=None, assert_adapter2=None, assert_find_adapters_with_sliding_window=None, version=None)

Create mock bcl2fastq2 executable

This class can be used to create a mock bcl2fastq executable, which in turn can be used in place of the actual bcl2fastq software for testing purposes.

To create a mock executable, use the ‘create’ static method, e.g.

>>> MockBcl2fastq2Exe.create("/tmpbin/bcl2fastq")

The resulting executable will generate mock outputs when run on actual or mock Illumina sequencer output directories (mock versions can be produced using the ‘mock.IlluminaRun’ class in the genomics-bcftbx package).

The executable can be configured on creation to produce different error conditions when run:

  • the exit code can be set to an arbitrary value via the exit_code argument

  • Fastqs can be removed from the output by specifying their names in the missing_fastqs argument

The executable can also be configured to check supplied values:

  • the bases mask can be checked via the assert_bases_mask argument

  • lane splitting can be checked via the assert_no_lane_splitting argument

  • creation of Fastqs for index reads can be checked via the assert_create_fastq_for_index_read argument

  • adapter trimming and masking can be checked via the assert_minimum_trimmed_read_length and assert_mask_short_adapter_reads arguments

  • adapter sequences can be checked via the assert_adapter and assert_adapter2 arguments

  • the sliding window algorithm for adapter trimming can be checked via the assert_find_adapters_with_sliding_window argument
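One way such checks could work is sketched below: the mock parses its own command line and compares each option with the expected value, skipping any check whose expected value is None. The function mock_bcl2fastq_main and its return-code convention are hypothetical, not taken from the library.

```python
import argparse


def mock_bcl2fastq_main(args, assert_no_lane_splitting=None,
                        assert_minimum_trimmed_read_length=None):
    """Hypothetical sketch: parse options and compare with expectations."""
    p = argparse.ArgumentParser(prog="bcl2fastq")
    p.add_argument("--no-lane-splitting", action="store_true")
    p.add_argument("--minimum-trimmed-read-length", type=int, default=None)
    # Options that this sketch does not know about are ignored
    opts, _unused = p.parse_known_args(args)
    checks = [
        (assert_no_lane_splitting, opts.no_lane_splitting),
        (assert_minimum_trimmed_read_length,
         opts.minimum_trimmed_read_length),
    ]
    for expected, actual in checks:
        # A check is only applied when its expected value is not None
        if expected is not None and expected != actual:
            return 1
    return 0


ok = mock_bcl2fastq_main(
    ["--no-lane-splitting", "--minimum-trimmed-read-length", "26"],
    assert_no_lane_splitting=True,
    assert_minimum_trimmed_read_length=26)
mismatch = mock_bcl2fastq_main([], assert_no_lane_splitting=True)
```

Returning a non-zero code on a mismatch lets the calling test detect that the pipeline invoked the tool with unexpected options.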

static create(path, exit_code=0, missing_fastqs=None, platform=None, assert_bases_mask=None, assert_no_lane_splitting=None, assert_create_fastq_for_index_read=None, assert_minimum_trimmed_read_length=None, assert_mask_short_adapter_reads=None, assert_adapter=None, assert_adapter2=None, assert_find_adapters_with_sliding_window=None, version='2.20.0.422')

Create a “mock” bcl2fastq executable

Parameters:
  • path (str) – path to the new executable to create. The final executable must not exist, however the directory it will be created in must.

  • exit_code (int) – exit code that the mock executable should complete with

  • missing_fastqs (list) – list of Fastq names that will not be created

  • platform (str) – platform for primary data (if it cannot be determined from the directory/instrument name)

  • assert_bases_mask (str) – if set then assert that bases mask matches the supplied string

  • assert_no_lane_splitting (bool) – if set then assert that --no-lane-splitting matches the supplied boolean value

  • assert_create_fastq_for_index_read (bool) – if set then assert that --create-fastq-for-index-read matches the supplied boolean value

  • assert_minimum_trimmed_read_length (int) – if set then assert that --minimum-trimmed-read-length matches the supplied value

  • assert_mask_short_adapter_reads (int) – if set then assert that --mask-short-adapter-reads matches the supplied value

  • assert_adapter (str) – if set then assert that the adapter sequence in the sample sheet matches the supplied value

  • assert_adapter2 (str) – if set then assert that the adapter sequence for read2 in the sample sheet matches the supplied value

  • assert_find_adapters_with_sliding_window (bool) – if set then assert that --find-adapters-with-sliding-window matches the supplied boolean value

  • version (str) – version of bcl2fastq2 to imitate

main(args)

Internal: provides mock bcl2fastq2 functionality

class auto_process_ngs.mock.MockBclConvertExe(exit_code=0, missing_fastqs=None, platform=None, assert_override_cycles=None, assert_no_lane_splitting=None, assert_create_fastq_for_index_read=None, assert_minimum_trimmed_read_length=None, assert_mask_short_reads=None, assert_adapter1=None, assert_adapter2=None, version=None)

Create mock bcl-convert executable

This class can be used to create a mock bcl-convert executable, which in turn can be used in place of the actual BCLConvert software for testing purposes.

To create a mock executable, use the ‘create’ static method, e.g.

>>> MockBclConvertExe.create("/tmpbin/bcl-convert")

The resulting executable will generate mock outputs when run on actual or mock Illumina sequencer output directories (mock versions can be produced using the ‘mock.IlluminaRun’ class in the genomics-bcftbx package).

The executable can be configured on creation to produce different error conditions when run:

  • the exit code can be set to an arbitrary value via the exit_code argument

  • Fastqs can be removed from the output by specifying their names in the missing_fastqs argument

The executable can also be configured to check supplied values:

  • the masking of cycles can be checked via the assert_override_cycles argument

  • lane splitting can be checked via the assert_no_lane_splitting argument

  • creation of Fastqs for index reads can be checked via the assert_create_fastq_for_index_read argument

  • adapter trimming and masking can be checked via the assert_minimum_trimmed_read_length and assert_mask_short_reads arguments

  • adapter sequences can be checked via the assert_adapter1 and assert_adapter2 arguments

static create(path, exit_code=0, missing_fastqs=None, platform=None, assert_override_cycles=None, assert_no_lane_splitting=None, assert_create_fastq_for_index_read=None, assert_minimum_trimmed_read_length=None, assert_mask_short_reads=None, assert_adapter1=None, assert_adapter2=None, version='3.7.5')

Create a “mock” bcl-convert executable

Parameters:
  • path (str) – path to the new executable to create. The final executable must not exist, however the directory it will be created in must.

  • exit_code (int) – exit code that the mock executable should complete with

  • missing_fastqs (list) – list of Fastq names that will not be created

  • platform (str) – platform for primary data (if it cannot be determined from the directory/instrument name)

  • assert_override_cycles (str) – if set then assert that the ‘OverrideCycles’ setting matches the supplied string

  • assert_no_lane_splitting (bool) – if set then assert that --no-lane-splitting matches the supplied boolean value

  • assert_create_fastq_for_index_read (bool) – if set then assert that --create-fastq-for-index-read matches the supplied boolean value

  • assert_minimum_trimmed_read_length (int) – if set then assert that --minimum-trimmed-read-length matches the supplied value

  • assert_mask_short_reads (int) – if set then assert that --mask-short-adapter-reads matches the supplied value

  • assert_adapter1 (str) – if set then assert that the adapter sequence in the sample sheet matches the supplied value

  • assert_adapter2 (str) – if set then assert that the adapter sequence for read2 in the sample sheet matches the supplied value

  • version (str) – version of BCLConvert to imitate

main(args)

Internal: provides mock bcl-convert functionality

class auto_process_ngs.mock.MockBowtie2Build(path, exit_code=0)

Create mock bowtie2-build

This class can be used to create a mock bowtie2-build executable, which in turn can be used in place of an actual executable for testing purposes.

To create a mock executable, use the ‘create’ static method, e.g.

>>> MockBowtie2Build.create("/tmpbin/bowtie2-build")

The resulting executable will generate mock outputs when run on the appropriate files (ignoring their contents).

The executable can be configured on creation to produce different error conditions when run:

  • the exit code can be set to an arbitrary value via the exit_code argument

static create(path, exit_code=0)

Create a “mock” bowtie2-build executable

Parameters:
  • path (str) – path to the new executable to create. The final executable must not exist, however the directory it will be created in must

  • exit_code (int) – exit code that the mock executable should complete with

main(args)

Internal: provides mock bowtie2-build functionality

class auto_process_ngs.mock.MockBowtieBuild(path, exit_code=0)

Create mock bowtie-build

This class can be used to create a mock bowtie-build executable, which in turn can be used in place of an actual executable for testing purposes.

To create a mock executable, use the ‘create’ static method, e.g.

>>> MockBowtieBuild.create("/tmpbin/bowtie-build")

The resulting executable will generate mock outputs when run on the appropriate files (ignoring their contents).

The executable can be configured on creation to produce different error conditions when run:

  • the exit code can be set to an arbitrary value via the exit_code argument

static create(path, exit_code=0)

Create a “mock” bowtie-build executable

Parameters:
  • path (str) – path to the new executable to create. The final executable must not exist, however the directory it will be created in must

  • exit_code (int) – exit code that the mock executable should complete with

main(args)

Internal: provides mock bowtie-build functionality

class auto_process_ngs.mock.MockCellrangerExe(path, exit_code=0, platform=None, assert_bases_mask=None, assert_include_introns=None, assert_chemistry=None, assert_force_cells=None, assert_filter_single_index=None, assert_filter_dual_index=None, assert_rc_i2_override=None, reads=None, multiome_data=None, multi_outputs=None, version=None)

Wrapper for Mock10xPackageExe

Maintained for backwards-compatibility

class auto_process_ngs.mock.MockConda

Create mock conda installation

This class can be used to create a mock conda installation consisting of:

  • bin subdirectory with mock conda executable and ‘activate’ script

  • envs subdirectory

This can be used in place of an actual conda installation for testing purposes.

To create a mock installation, use the ‘create’ static method, e.g.

>>> MockConda.create("/tmpbin/conda")

The resulting conda executable supports --version and the create command, and will generate mock outputs for both.

The executable can be configured on creation to produce different error conditions when run:

  • the exit code can be set to an arbitrary value via the exit_code argument

  • the ‘create’ command can be forced to fail for all inputs by setting the create_fails argument

  • the reported version can be set via the version argument
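The layout described above (a ‘bin’ subdirectory holding the conda executable and ‘activate’ script, plus an ‘envs’ subdirectory) could be mocked roughly as follows. The helper make_mock_conda is hypothetical, the ‘activate’ script is an empty placeholder, and the mock only answers --version; none of this is the library's implementation.

```python
import os
import stat
import subprocess
import sys
import tempfile


def make_mock_conda(path, version="4.10.3"):
    """Hypothetical sketch: lay out bin/conda, bin/activate and envs/."""
    # os.mkdir fails if 'path' exists or if its parent is missing
    os.mkdir(path)
    bin_dir = os.path.join(path, "bin")
    os.mkdir(bin_dir)
    os.mkdir(os.path.join(path, "envs"))
    conda = os.path.join(bin_dir, "conda")
    with open(conda, "wt") as fp:
        fp.write("#!/usr/bin/env python3\n"
                 "import sys\n"
                 "if '--version' in sys.argv[1:]:\n"
                 f"    print({'conda ' + version!r})\n"
                 "sys.exit(0)\n")
    os.chmod(conda, os.stat(conda).st_mode | stat.S_IXUSR)
    with open(os.path.join(bin_dir, "activate"), "wt") as fp:
        fp.write("# placeholder activate script\n")
    return path


with tempfile.TemporaryDirectory() as top:
    conda_dir = make_mock_conda(os.path.join(top, "conda"))
    layout = sorted(os.listdir(conda_dir))
    # Run through the current interpreter to keep the sketch portable
    result = subprocess.run(
        [sys.executable, os.path.join(conda_dir, "bin", "conda"),
         "--version"],
        capture_output=True, text=True)
    reported = result.stdout.strip()
```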

static create(path, version='4.10.3', create_fails=False, activate_fails=False, exit_code=0)

Create a “mock” conda installation

Parameters:
  • path (str) – path to the top-level directory for the mock conda installation (which must not exist, however the directory it will be created in must be present).

  • version (str) – version that mock conda will claim to be

  • create_fails (bool) – if True then the ‘create’ subcommand of the mock conda executable will fail.

  • activate_fails (bool) – if True then the ‘activate’ script in the mock conda installation will return with value 1 (i.e. an error code).

  • exit_code (int) – exit code that the mock executable should complete with

class auto_process_ngs.mock.MockFastQC(version=None, no_outputs=False, exit_code=0)

Create mock fastqc

This class can be used to create a mock fastqc executable, which in turn can be used in place of the actual fastqc program for testing purposes.

To create a mock script, use the ‘create’ static method, e.g.

>>> MockFastQC.create("/tmpbin/fastqc")

The resulting executable will generate mock outputs when run on Fastq files (ignoring their content).

The executable can be configured on creation to produce different error conditions when run:

  • the exit code can be set to an arbitrary value via the exit_code argument
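A sketch of what “mock outputs” might look like for FastQC: empty placeholder files named after the input Fastq, following FastQC's ‘_fastqc.html’ and ‘_fastqc.zip’ naming. The function mock_fastqc_outputs is hypothetical and the file contents are placeholders only.

```python
import os
import tempfile


def mock_fastqc_outputs(fastq, out_dir, no_outputs=False):
    """Hypothetical sketch: write empty placeholder FastQC outputs."""
    base = os.path.basename(fastq)
    # Strip compression and Fastq extensions to get the base name
    for ext in (".gz", ".fastq", ".fq"):
        if base.endswith(ext):
            base = base[:-len(ext)]
    if no_outputs:
        return []
    outputs = [os.path.join(out_dir, base + "_fastqc" + ext)
               for ext in (".html", ".zip")]
    for path in outputs:
        with open(path, "wt"):
            pass  # placeholder content only
    return outputs


with tempfile.TemporaryDirectory() as qc_dir:
    made = mock_fastqc_outputs("PJB1_S1_R1_001.fastq.gz", qc_dir)
    names = [os.path.basename(f) for f in made]
    skipped = mock_fastqc_outputs("PJB1_S1_R2_001.fastq.gz", qc_dir,
                                  no_outputs=True)
```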

static create(path, version=None, no_outputs=False, exit_code=0)

Create a “mock” fastqc executable

Parameters:
  • path (str) – path to the new executable to create. The final executable must not exist, however the directory it will be created in must.

  • version (str) – explicit version string

  • no_outputs (bool) – if True then don’t create mock outputs for FastQC

  • exit_code (int) – exit code that the mock executable should complete with

main(args)

Internal: provides mock fastqc functionality

class auto_process_ngs.mock.MockFastqScreen(version=None, no_outputs=False, exit_code=0)

Create mock fastq_screen

This class can be used to create a mock fastq_screen executable, which in turn can be used in place of the actual fastq_screen program for testing purposes.

To create a mock executable, use the ‘create’ static method, e.g.

>>> MockFastqScreen.create("/tmpbin/fastq_screen")

The resulting executable will generate mock outputs when run on a Fastq file (ignoring its content).

The executable can be configured on creation to produce different error conditions when run:

  • the exit code can be set to an arbitrary value via the exit_code argument

  • the outputs can be suppressed by setting the no_outputs argument to True

static create(path, version=None, no_outputs=False, exit_code=0)

Create a “mock” fastq_screen executable

Parameters:
  • path (str) – path to the new executable to create. The final executable must not exist, however the directory it will be created in must.

  • version (str) – explicit version string

  • no_outputs (bool) – if True then don’t create outputs (default: False, do create outputs)

  • exit_code (int) – exit code that the mock executable should complete with

main(args)

Internal: provides mock fastq_screen functionality

class auto_process_ngs.mock.MockFastqStrandPy(no_outputs=False, exit_code=0)

Create mock fastq_strand.py executable

This class can be used to create a mock fastq_strand.py executable, which in turn can be used in place of the actual fastq_strand.py executable for testing purposes.

To create a mock executable, use the ‘create’ static method, e.g.

>>> MockFastqStrandPy.create("/tmpbin/fastq_strand.py")

The resulting executable will generate mock outputs when run on a pair of Fastq files (ignoring their contents).

The executable can be configured on creation to produce different error conditions when run:

  • the exit code can be set to an arbitrary value via the exit_code argument

  • the outputs can be suppressed by setting the no_outputs argument to True

static create(path, no_outputs=False, exit_code=0)

Create a “mock” fastq_strand.py executable

Parameters:
  • path (str) – path to the new executable to create. The final executable must not exist, however the directory it will be created in must.

  • no_outputs (bool) – if True then don’t create any of the expected outputs

  • exit_code (int) – exit code that the mock executable should complete with

main(args)

Internal: provides mock fastq_strand.py functionality

class auto_process_ngs.mock.MockGtf2bed(version=None, no_outputs=False, exit_code=0)

Create mock ‘gtf2bed’ (from bedops)

This class can be used to create a mock gtf2bed executable, which in turn can be used in place of the actual gtf2bed program for testing purposes.

To create a mock script, use the ‘create’ static method, e.g.

>>> MockGtf2bed.create("/tmpbin/gtf2bed")

The resulting executable will generate mock outputs when run on GTF files (ignoring their content).

The executable can be configured on creation to produce different error conditions when run:

  • the exit code can be set to an arbitrary value via the exit_code argument

The following flags can also be used:

  • no_outputs configures the mock executable not to write any output

static create(path, version=None, no_outputs=False, exit_code=0)

Create a mock ‘gtf2bed’ utility

Parameters:
  • path (str) – path to the new executable to create. The final executable must not exist, however the directory it will be created in must.

  • version (str) – explicit version string

  • no_outputs (bool) – if True then don’t create mock outputs

  • exit_code (int) – exit code that the mock executable should complete with

main(args)

Internal: provides mock gtf2bed functionality

class auto_process_ngs.mock.MockMultiQC(version=None, no_outputs=False, exit_code=0)

Create mock MultiQC executable

This class can be used to create a mock multiqc executable, which in turn can be used in place of the actual multiqc executable for testing purposes.

To create a mock executable, use the ‘create’ static method, e.g.

>>> MockMultiQC.create("/tmpbin/multiqc")

The resulting executable will generate mock outputs when run on a directory (ignoring its contents).

The executable can be configured on creation to produce different error conditions when run:

  • the exit code can be set to an arbitrary value via the exit_code argument

  • the outputs can be suppressed by setting the no_outputs argument to True

static create(path, version=None, no_outputs=False, exit_code=0)

Create a “mock” multiqc executable

Parameters:
  • path (str) – path to the new executable to create. The final executable must not exist, however the directory it will be created in must.

  • version (str) – explicit version string

  • no_outputs (bool) – if True then don’t create any of the expected outputs

  • exit_code (int) – exit code that the mock executable should complete with

main(args)

Internal: provides mock multiqc functionality

class auto_process_ngs.mock.MockPicard(path, exit_code=0)

Create mock Picard tools

This class can be used to create a mock Picard tools executable, which in turn can be used in place of an actual executable for testing purposes.

To create a mock executable, use the ‘create’ static method, e.g.

>>> MockPicard.create("/tmpbin/picard")

The resulting executable will generate mock outputs when run on a directory (ignoring its contents).

The executable can be configured on creation to produce different error conditions when run:

  • the exit code can be set to an arbitrary value via the exit_code argument

static create(path, exit_code=0)

Create a “mock” Picard executable

Parameters:
  • path (str) – path to the new executable to create. The final executable must not exist, however the directory it will be created in must

  • exit_code (int) – exit code that the mock executable should complete with

main(args)

Internal: provides mock Picard tools functionality

class auto_process_ngs.mock.MockQualimap(path, exit_code=0)

Create mock Qualimap

This class can be used to create a mock Qualimap executable, which in turn can be used in place of an actual executable for testing purposes.

To create a mock executable, use the ‘create’ static method, e.g.

>>> MockQualimap.create("/tmpbin/qualimap")

The resulting executable will generate mock outputs when run on a directory (ignoring its contents).

The executable can be configured on creation to produce different error conditions when run:

  • the exit code can be set to an arbitrary value via the exit_code argument

static create(path, exit_code=0)

Create a “mock” Qualimap executable

Parameters:
  • path (str) – path to the new executable to create. The final executable must not exist, however the directory it will be created in must

  • exit_code (int) – exit code that the mock executable should complete with

main(args)

Internal: provides mock Qualimap functionality

class auto_process_ngs.mock.MockRSeQC(path, exit_code=0)

Create mock RSeQC components

This class can be used to create a mock RSeQC component (e.g. infer_experiment.py), which in turn can be used in place of an actual executable for testing purposes.

To create a mock executable, use the ‘create’ static method, e.g.

>>> MockRSeQC.create("/tmpbin/infer_experiment.py")

The resulting executable will generate mock outputs when run on a directory (ignoring its contents).

The executable can be configured on creation to produce different error conditions when run:

  • the exit code can be set to an arbitrary value via the exit_code argument

static create(path, exit_code=0)

Create a “mock” RSeQC component executable

Parameters:
  • path (str) – path to the new executable to create. The final executable must not exist, however the directory it will be created in must

  • exit_code (int) – exit code that the mock executable should complete with

main(args)

Internal: provides mock RSeQC functionality

class auto_process_ngs.mock.MockSamtools(path, exit_code=0)

Create mock samtools installation

This class can be used to create a mock samtools executable, which in turn can be used in place of the actual samtools package, for testing purposes.

To create a mock executable, use the ‘create’ static method, e.g.

>>> MockSamtools.create("/tmpbin/samtools")

The resulting executable will generate mock outputs according to the supplied command line.

The executable can be configured on creation to produce different error conditions when run:

  • the exit code can be set to an arbitrary value via the exit_code argument

static create(path, exit_code=0)

Create a “mock” samtools executable

Parameters:
  • path (str) – path to the new executable to create. The final executable must not exist, however the directory it will be created in must.

  • exit_code (int) – exit code that the mock executable should complete with

main(args)

Internal: provides mock samtools functionality

class auto_process_ngs.mock.MockSeqtk(version=None, no_outputs=False, exit_code=0)

Create mock ‘seqtk’

This class can be used to create a mock seqtk executable, which in turn can be used in place of the actual seqtk program for testing purposes.

To create a mock script, use the ‘create’ static method, e.g.

>>> MockSeqtk.create("/tmpbin/seqtk")

The resulting executable will generate mock outputs when run on Fastq files (ignoring their content).

The executable can be configured on creation to produce different error conditions when run:

  • the exit code can be set to an arbitrary value via the exit_code argument

The following flags can also be used:

  • no_outputs configures the mock executable not to write any output

static create(path, version=None, no_outputs=False, exit_code=0)

Create a mock ‘seqtk’ utility

Parameters:
  • path (str) – path to the new executable to create. The final executable must not exist, however the directory it will be created in must.

  • version (str) – explicit version string

  • no_outputs (bool) – if True then don’t create mock outputs

  • exit_code (int) – exit code that the mock executable should complete with

main(args)

Internal: provides mock seqtk functionality

class auto_process_ngs.mock.MockStar(path, version=None, unmapped_output=False, no_outputs=False, exit_code=0)

Create mock STAR executable

This class can be used to create a mock STAR executable, which in turn can be used in place of the actual STAR executable for testing purposes.

To create a mock executable, use the ‘create’ static method, e.g.

>>> MockStar.create("/tmpbin/star")

The resulting executable will generate mock outputs when run on a directory (ignoring its contents).

The executable can be configured on creation to produce different error conditions when run:

  • the exit code can be set to an arbitrary value via the exit_code argument

  • mock unmapped output can be produced via the unmapped_output argument

  • the outputs can be suppressed by setting the no_outputs argument to True
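One way a test might make code under test pick up such a mock instead of a real installation is to put the directory holding the mock at the front of PATH. The sketch below shows only that lookup mechanism using the standard library. The prepend_to_path helper is hypothetical, the small shell script is a placeholder for the file that MockStar.create would write, and a POSIX-style system is assumed.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def prepend_to_path(dirn):
    """Hypothetical helper: temporarily put 'dirn' first on PATH."""
    original = os.environ.get("PATH")
    os.environ["PATH"] = os.pathsep.join(
        [dirn] + ([original] if original else []))
    try:
        yield
    finally:
        # Restore the environment exactly as it was
        if original is None:
            del os.environ["PATH"]
        else:
            os.environ["PATH"] = original


with tempfile.TemporaryDirectory() as bin_dir:
    # Placeholder standing in for the executable that
    # MockStar.create() would write into 'bin_dir'
    star = os.path.join(bin_dir, "STAR")
    with open(star, "w") as fp:
        fp.write("#!/bin/sh\nexit 0\n")
    os.chmod(star, 0o755)
    with prepend_to_path(bin_dir):
        found = shutil.which("STAR")
    resolved_to_mock = (found is not None and
                        os.path.samefile(found, star))
```

Restoring PATH in the finally clause keeps the change local to a single test, so other tests still see the original environment.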

static create(path, version=None, unmapped_output=False, no_outputs=False, exit_code=0)

Create a “mock” star executable

Parameters:
  • path (str) – path to the new executable to create. The final executable must not exist, however the directory it will be created in must.

  • version (str) – explicit version string

  • unmapped_output (bool) – if True then produce mock “unmapped” output

  • no_outputs (bool) – if True then don’t create any of the expected outputs

  • exit_code (int) – exit code that the mock executable should complete with

main(args)

Internal: provides mock STAR functionality

class auto_process_ngs.mock.UpdateAnalysisDir(ap)

Utility class to add mock artefacts to an AnalysisDir

Provides the following methods:

  • add_processing_report

  • add_barcode_analysis

  • add_10x_mkfastq_qc_output

  • add_cellranger_qc_output

Example usage:

>>> m = MockAnalysisDirFactory.bcl2fastq2(
...     '160621_M00879_0087_000000000-AGEW9',
...     'miseq')
>>> m.create()
>>> ap = AutoProcess(m.dirn)
>>> UpdateAnalysisDir(ap).add_processing_report()

add_10x_mkfastq_qc_output(pkg, lanes=None)

Add mock 10xGenomics mkfastq QC report

Parameters:
  • pkg (str) – 10xGenomics package (e.g. ‘cellranger’, ‘cellranger-atac’ etc)

  • lanes (str) – optional, specify lane numbers for the report

add_barcode_analysis(barcode_analysis_dir='barcode_analysis')

Add mock barcode analysis outputs

Parameters:

barcode_analysis_dir (str) – name of barcode analysis subdirectory (default: ‘barcode_analysis’)

add_cellranger_qc_output(lanes=None)

Add mock cellranger QC report

Parameters:

lanes (str) – optional, specify lane numbers for the report

add_processing_report(name='processing_qc.html')

Add a ‘processing_qc.html’ file

Parameters:

name (str) – optionally, specify a non-standard report name (default: ‘processing_qc.html’)

class auto_process_ngs.mock.UpdateAnalysisProject(project)

Utility class to add mock artefacts to an AnalysisProject

Provides the following methods:

  • add_fastq_set

  • add_qc_outputs

  • add_icell8_outputs

  • add_cellranger_count_outputs

  • add_cellranger_multi_outputs

Example usage:

>>> m = MockAnalysisProject("PJB",('PJB1_S1_R1_001.fasta.gz',
...                                'PJB1_S1_R2_001.fasta.gz'))
>>> m.create()
>>> p = AnalysisProject(m.name,m.name)
>>> UpdateAnalysisProject(p).add_qc_outputs()

add_cellranger_count_outputs(qc_dir=None, cellranger='cellranger', reference_data_path='/data/refdata-cellranger-1.2.0', prefix='cellranger_count', legacy=False)

Add mock ‘cellranger count’ outputs to project

Parameters:
  • qc_dir (str) – specify non-default QC output directory

  • cellranger (str) – specify the 10xGenomics software package to add outputs for (defaults to ‘cellranger’; alternatives include ‘cellranger-atac’)

  • reference_data_path (str) – optionally specify path for reference dataset (doesn’t need to exist)

  • prefix (str) – leading subdirectory for cellranger count outputs (defaults to ‘cellranger_count’; ignored if ‘legacy’ style outputs are generated)

  • legacy (bool) – if True then generate ‘legacy’ style cellranger outputs

add_cellranger_multi_outputs(config_csv=None, sample_names=None, reference_data_path=None, qc_dir=None, prefix='cellranger_multi')

Add mock ‘cellranger multi’ outputs to project

If a 10x multiplexing config file is supplied then the mock outputs are generated using the data within that file; otherwise the sample names and reference dataset path should be explicitly supplied.

Parameters:
  • config_csv (str) – path to a 10x multiplexing config file (if supplied then sample names and reference dataset path will be taken from this file)

  • sample_names (list) – optionally specify list of multiplexed sample names (ignored if ‘config_csv’ file is supplied)

  • reference_data_path (str) – optionally specify path to reference dataset (doesn’t need to exist; ignored if ‘config_csv’ file is supplied)

  • qc_dir (str) – specify non-default QC output directory

  • prefix (str) – leading subdirectory for cellranger multi outputs (defaults to ‘cellranger_multi’)
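The precedence between config_csv and the explicit arguments can be pictured with the following sketch. It is not the library's own parser: the helper resolve_multi_inputs is hypothetical, the sample names and reference path are made-up example data, and the config layout is an assumption (a "reference" row in a [gene-expression] section, and sample IDs in the first column of a [samples] section).

```python
import csv
import os
import tempfile


def resolve_multi_inputs(config_csv=None, sample_names=None,
                         reference_data_path=None):
    """Hypothetical helper: return (sample_names, reference_data_path).

    Values read from 'config_csv' take precedence; the explicit
    arguments are only used when no config file is given.
    """
    if config_csv is None:
        return list(sample_names or []), reference_data_path
    samples, reference, section = [], None, None
    with open(config_csv, newline="") as fp:
        for row in csv.reader(fp):
            if not row or not row[0].strip():
                continue  # skip blank lines
            first = row[0].strip()
            if first.startswith("["):
                # Section header, e.g. "[samples]"
                section = first.strip("[]").lower()
            elif section == "gene-expression" and first == "reference":
                reference = row[1].strip()
            elif section == "samples" and first != "sample_id":
                samples.append(first)
    return samples, reference


with tempfile.TemporaryDirectory() as tmp:
    config = os.path.join(tmp, "10x_multi_config.csv")
    with open(config, "w") as fp:
        fp.write("[gene-expression]\n"
                 "reference,/data/refdata-example\n"
                 "[samples]\n"
                 "sample_id,cmo_ids,description\n"
                 "PBA,CMO301,PBA\n"
                 "PBB,CMO302,PBB\n")
    # Explicit sample names are ignored when a config file is given
    from_config = resolve_multi_inputs(config_csv=config,
                                       sample_names=["ignored"])
    # Without a config file the explicit values are used as-is
    explicit = resolve_multi_inputs(sample_names=["PJB1", "PJB2"],
                                    reference_data_path="/data/ref")
```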

add_fastq_set(fastq_set, fastqs)

Add an additional fastq set

Parameters:
  • fastq_set (str) – name of the new Fastq set/subdirectory

  • fastqs (list) – list of Fastq filenames to create in the new set

add_icell8_outputs()

Add mock ICell8 outputs to the project

add_qc_outputs(fastq_set=None, qc_dir=None, protocol='standardPE', include_fastq_strand=True, include_seqlens=True, include_multiqc=True, include_report=True, include_zip_file=True, legacy_screens=False, legacy_zip_name=False)

Add mock QC outputs

Parameters:
  • fastq_set (str) – specify non-default Fastq set to make QC outputs for

  • qc_dir (str) – specify non-default QC output directory

  • protocol (str) – specify non-default QC protocol to use

  • include_fastq_strand (bool) – if True then add mock fastq_strand.py outputs

  • include_seqlens (bool) – if True then add mock sequence length outputs

  • include_multiqc (bool) – if True then add mock MultiQC outputs

  • include_report (bool) – if True then add mock QC report outputs

  • include_zip_file (bool) – if True then add mock ZIP archive for QC report

  • legacy_screens (bool) – if True then use old-style ‘illumina_qc.sh’ naming conventions for FastqScreen outputs

  • legacy_zip_name (bool) – if True then use old-style naming convention for ZIP file with QC outputs

auto_process_ngs.mock.make_mock_analysis_project(name='PJB', top_dir=None, protocol=None, paired_end=True, qc_dir='qc', fastq_dir='fastqs', fastq_names=None, sample_names=None, seq_data_samples=None, screens=('model_organisms', 'other_organisms', 'rRNA'), include_fastqc=True, include_fastq_screen=True, include_strandedness=True, include_seqlens=True, include_rseqc_infer_experiment=False, include_rseqc_genebody_coverage=False, include_picard_insert_size_metrics=False, include_qualimap_rnaseq=False, include_multiqc=True, include_cellranger_count=False, include_cellranger_multi=False, cellranger_pipelines=('cellranger',), cellranger_samples=None, cellranger_multi_samples=None, cellranger_version=None, legacy_screens=False, legacy_cellranger_outs=False)

Create a mock Analysis Project directory with QC artefacts

Parameters:
  • name (str) – name for the mock project

  • top_dir (str) – path to the directory to create the mock project directory under

  • protocol (str) – QC protocol to emulate

  • paired_end (bool) – whether the mock project should be paired-end (the default)

  • fastq_dir (str) – optional, set a non-standard directory for the Fastq files

  • fastq_names (list) – optional, explicit list of Fastq names

  • sample_names (list) – optional, explicit list of sample names

  • seq_data_samples (list) – list with subset of sample names which include sequence (i.e. biological) data

  • screens (list) – optional, list of non-standard FastqScreen panel names

  • include_fastqc (bool) – include outputs from Fastqc

  • include_fastq_screen (bool) – include outputs from FastqScreen

  • include_strandedness (bool) – include outputs from strandedness

  • include_seqlens (bool) – include sequence length metrics

  • include_rseqc_infer_experiment (bool) – include RSeQC infer_experiment.py outputs

  • include_rseqc_genebody_coverage (bool) – include RSeQC geneBody_coverage.py outputs

  • include_picard_insert_size_metrics (bool) – include Picard CollectInsertSizeMetrics outputs

  • include_qualimap_rnaseq (bool) – include Qualimap rnaseq outputs

  • include_multiqc (bool) – include MultiQC outputs

  • include_cellranger_count (bool) – include ‘cellranger count’ outputs

  • include_cellranger_multi (bool) – include ‘cellranger multi’ outputs

  • cellranger_pipelines (list) – list of 10xGenomics pipelines to make mock outputs for (e.g. ‘cellranger’, ‘cellranger-atac’ etc)

  • cellranger_samples (list) – list of sample names to produce ‘cellranger count’ outputs for

  • cellranger_multi_samples (list) – list of sample names to produce ‘cellranger multi’ outputs for

  • cellranger_version (str) – if set then specifies version of Cellranger to mimic

  • legacy_screens (bool) – if True then use legacy naming convention for FastqScreen outputs

  • legacy_cellranger_outs (bool) – if True then use legacy naming convention for 10xGenomics pipeline outputs

Returns:

path to the mock analysis project that was created.

Return type:

String
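A test might call this function inside a scratch directory and remove everything afterwards. The sketch below uses only arguments from the signature documented above. The wrapper build_mock_project is hypothetical, and the import is guarded so that the example degrades to a no-op when auto_process_ngs is not installed.

```python
import os
import shutil
import tempfile

try:
    from auto_process_ngs.mock import make_mock_analysis_project
except ImportError:
    # auto_process_ngs is not available in this environment
    make_mock_analysis_project = None


def build_mock_project():
    """Hypothetical wrapper: make a mock project, check it, clean up.

    Returns True if the returned path is an existing directory, or
    None if the library is not installed.
    """
    if make_mock_analysis_project is None:
        return None
    top_dir = tempfile.mkdtemp()
    try:
        project_dir = make_mock_analysis_project(
            name="PJB",
            top_dir=top_dir,
            paired_end=True,
            include_multiqc=False)
        # The documented return value is the path to the new project
        return os.path.isdir(project_dir)
    finally:
        # Always remove the scratch directory, even on failure
        shutil.rmtree(top_dir)


project_created = build_mock_project()
```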

auto_process_ngs.mock.make_mock_bcl2fastq2_output(out_dir, lanes, sample_sheet=None, reads=None, no_lane_splitting=False, exclude_fastqs=None, create_fastq_for_index_read=False, paired_end=False, force_sample_dir=False)

Creates a file and directory structure mimicking the output from bcl2fastq2

Parameters:
  • out_dir (str) – path to output directory

  • lanes (iterable) – list of lanes to create output for

  • sample_sheet (str) – path to sample sheet file

  • reads (iterable) – list of ‘reads’ to create (e.g. (‘R1’,‘R2’)); defaults to (‘R1’) if not specified

  • no_lane_splitting (bool) – if True then combine the mock Fastq files across lanes rather than producing them for each lane (mimics the --no-lane-splitting option in bcl2fastq)

  • exclude_fastqs (iterable) – specifies a list of Fastq files to exclude from the outputs

  • create_fastq_for_index_read (bool) – whether to also include ‘I1’ etc Fastqs for index reads (ignored if ‘reads’ argument is set)

  • paired_end (bool) – whether to also include ‘R2’ and ‘I2’ Fastqs (ignored if ‘reads’ argument is set)

  • force_sample_dir (bool) – whether to force insertion of a ‘sample name’ directory for IEM4 sample sheets where sample name and ID are the same
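To make the effect of the lanes, reads and no_lane_splitting arguments more concrete, the sketch below lists Fastq names following the usual bcl2fastq2 naming pattern (sample name, sample number, optional lane, read, then "_001.fastq.gz"). The helper expected_fastq_names is hypothetical and is not part of auto_process_ngs, and the exact set of files written by make_mock_bcl2fastq2_output may differ in detail.

```python
def expected_fastq_names(sample, sample_number, lanes,
                         reads=("R1",), no_lane_splitting=False):
    """Hypothetical helper: list bcl2fastq2-style Fastq names."""
    names = []
    if no_lane_splitting:
        # One Fastq per read, combined across all lanes
        for read in reads:
            names.append(
                f"{sample}_S{sample_number}_{read}_001.fastq.gz")
    else:
        # One Fastq per read for each lane
        for lane in lanes:
            for read in reads:
                names.append(
                    f"{sample}_S{sample_number}_L{lane:03d}"
                    f"_{read}_001.fastq.gz")
    return names


per_lane = expected_fastq_names("PJB1", 1, lanes=(1, 2),
                                reads=("R1", "R2"))
combined = expected_fastq_names("PJB1", 1, lanes=(1, 2),
                                reads=("R1", "R2"),
                                no_lane_splitting=True)
```

With two lanes and two reads, per_lane holds four names (one per lane and read), while combined holds two names with no lane component.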