auto_process_ngs.cli.run_qc

Runs the QC pipeline standalone on an arbitrary set of Fastq files

class auto_process_ngs.cli.run_qc.InfoAction(option_strings, settings, nargs=None, *args, **kws)

Custom parser action for the --info option

Example usage:

>>> p.add_argument('--info',action=InfoAction,settings=settings)

where ‘settings’ should be a populated ‘Settings’ instance.

When invoked the action will display information on protocols, organisms and other configuration settings, and then exit.
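The display-and-exit behaviour described above can be sketched with a plain argparse action. This is a minimal illustration, not the InfoAction implementation: the class name ShowInfoAction and the example_settings dictionary are hypothetical stand-ins for InfoAction and a populated Settings instance.

```python
import argparse


class ShowInfoAction(argparse.Action):
    """Hypothetical stand-in for InfoAction: print settings, then exit."""

    def __init__(self, option_strings, dest, settings=None, nargs=None, **kwargs):
        # Force nargs=0 so the option behaves as a flag taking no value
        super().__init__(option_strings, dest, nargs=0, **kwargs)
        self.settings = settings

    def __call__(self, parser, namespace, values, option_string=None):
        # Show each setting, then stop the program with exit status 0
        for key, value in sorted(self.settings.items()):
            print(f"{key}: {value}")
        parser.exit()


# Placeholder data standing in for a populated Settings instance
example_settings = {"organisms": ["human", "mouse"]}

parser = argparse.ArgumentParser(prog="example")
parser.add_argument("--info", action=ShowInfoAction, settings=example_settings)
```

Calling parser.parse_args(["--info"]) prints the settings and raises SystemExit, so no code after argument parsing runs.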

auto_process_ngs.cli.run_qc.add_10x_options(p)

Cellranger/10x Genomics options

auto_process_ngs.cli.run_qc.add_advanced_options(p, use_legacy_screen_names, shorten_zip_paths)

Advanced options

auto_process_ngs.cli.run_qc.add_conda_options(p, enable_conda, conda_env_dir)

Conda options

auto_process_ngs.cli.run_qc.add_custom_protocol_options(p)

Options for defining custom protocols

auto_process_ngs.cli.run_qc.add_debug_options(p)

Debugging options

auto_process_ngs.cli.run_qc.add_deprecated_options(p)

Deprecated options

auto_process_ngs.cli.run_qc.add_job_control_options(p, max_cores, max_jobs, max_batches)

Job control options

auto_process_ngs.cli.run_qc.add_metadata_options(p)

Metadata options

auto_process_ngs.cli.run_qc.add_pipeline_options(p, fastq_subset_size, default_nthreads)

QC pipeline options

auto_process_ngs.cli.run_qc.add_reference_data_options(p)

Reference data options

auto_process_ngs.cli.run_qc.add_reporting_options(p)

Reporting options

auto_process_ngs.cli.run_qc.announce(title)

Print arbitrary string as a title

Prints the supplied string as a title, e.g.

>>> announce("Hello!")
======
Hello!
======

Parameters:

title (str) – string to print

Returns:

None
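A function with this behaviour could look like the sketch below, assuming the rule lines are built from '=' characters matching the title length (as in the example output). The names format_title and announce_sketch are illustrative only.

```python
def format_title(title):
    """Return the title framed by lines of '=' of the same length."""
    rule = "=" * len(title)
    return "\n".join((rule, title, rule))


def announce_sketch(title):
    """Print the framed title; returns None."""
    print(format_title(title))
```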

auto_process_ngs.cli.run_qc.build_10x_multi_config(multi_config_file, fastq_dir, libraries, samples, gex_reference=None, probe_set=None, vdj_reference=None)

Constructs a ‘cellranger multi’ configuration file

Parameters:
  • multi_config_file (str) – path to the output config file

  • fastq_dir (str) – path to the directory holding the Fastq files

  • libraries (dict) – dictionary where keys are Fastq IDs and values are the corresponding 10x library types

  • samples (dict) – dictionary where keys are multiplexed sample names and values are the corresponding 10x CMO or probe IDs

  • gex_reference (str) – path to the gene expression reference dataset to use (no ‘gene-expression’ section will be written if not supplied)

  • probe_set (str) – path to the probe set reference dataset (no ‘probe-set’ setting will be written if not supplied)

  • vdj_reference (str) – path to the VDJ reference dataset to use (no ‘vdj’ section will be written if not supplied)
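To illustrate how the optional arguments control which sections appear, here is a simplified sketch. It is not the library's implementation: the function name, the Fastq IDs and the reduced set of columns are assumptions, and the section names follow the general layout of a 'cellranger multi' CSV config as an approximation. It also assumes the 'probe-set' line belongs inside the 'gene-expression' section.

```python
import os
import tempfile


def write_multi_config_sketch(multi_config_file, fastq_dir, libraries, samples,
                              gex_reference=None, probe_set=None,
                              vdj_reference=None):
    """Write a simplified multi config; optional sections are skipped."""
    lines = []
    if gex_reference:
        lines += ["[gene-expression]", f"reference,{gex_reference}"]
        if probe_set:
            # Assumption: the probe set is a setting within this section
            lines.append(f"probe-set,{probe_set}")
        lines.append("")
    if vdj_reference:
        lines += ["[vdj]", f"reference,{vdj_reference}", ""]
    # One row per Fastq ID, paired with its 10x library type
    lines += ["[libraries]", "fastq_id,fastqs,feature_types"]
    for fastq_id, library_type in libraries.items():
        lines.append(f"{fastq_id},{fastq_dir},{library_type}")
    # One row per multiplexed sample (CMO IDs here; probe IDs would
    # need a different column name)
    lines += ["", "[samples]", "sample_id,cmo_ids"]
    for sample, ids in samples.items():
        lines.append(f"{sample},{ids}")
    with open(multi_config_file, "wt") as fp:
        fp.write("\n".join(lines) + "\n")


# Hypothetical inputs, written to a temporary directory and read back
with tempfile.TemporaryDirectory() as tmp:
    config = os.path.join(tmp, "multi_config.csv")
    write_multi_config_sketch(
        config, "/data/fastqs",
        libraries={"EX1_GEX": "Gene Expression",
                   "EX1_CML": "Multiplexing Capture"},
        samples={"sample1": "CMO301", "sample2": "CMO302"},
        gex_reference="/refs/gex")
    with open(config) as fp:
        content = fp.read()
```

Because vdj_reference is not supplied in this call, the resulting text has no 'vdj' section.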

auto_process_ngs.cli.run_qc.cleanup_atexit(tmp_project_dir)

Perform clean up actions on exit

Removes the temporary project directory created for running the QC
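A clean-up handler of this kind is typically registered with the standard atexit module. The sketch below shows the general pattern under that assumption; the helper name remove_tmp_dir and the directory prefix are hypothetical.

```python
import atexit
import os
import shutil
import tempfile


def remove_tmp_dir(tmp_dir):
    """Delete the directory tree if it still exists."""
    if os.path.isdir(tmp_dir):
        shutil.rmtree(tmp_dir)


# Create a scratch directory and schedule its removal at interpreter exit
tmp_project_dir = tempfile.mkdtemp(prefix="qc_project.")
atexit.register(remove_tmp_dir, tmp_project_dir)
```

Checking that the directory exists before removing it makes the handler safe to call more than once.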

auto_process_ngs.cli.run_qc.display_info(s)

Displays information about the current configuration

The information includes the available QC protocols, organisms and FastqScreen conf files.

Parameters:

s (Settings) – populated Settings instance

auto_process_ngs.cli.run_qc.get_applications(tags)

Return the applications matching a set of tags

Parameters:

tags (list) – list of tags to match when filtering applications

Returns:

Dictionary where keys are platforms and values are lists of associated libraries.

auto_process_ngs.cli.run_qc.get_execution_environment()

Fetch information on the local execution environment

Interrogates the local system to get information on the number of cores, the amount of memory, etc.

It returns a dictionary-like object with the following elements:

  • ‘cpu_count’: total number of CPUs

  • ‘total_mem’: total amount of memory (Gb)

  • ‘nslots’: value of the ‘NSLOTS’ env variable

  • ‘max_cores’: maximum available cores

  • ‘max_mem’: maximum available memory (Gb)

  • ‘mem_per_core’: memory per core (Gb)

Available cores is the number of CPUs or, on compute cluster nodes, the number of CPUs assigned to the job (taken from the appropriate environment variable, e.g. ‘NSLOTS’ for Grid Engine or ‘SLURM_NTASKS’ for Slurm).

Available memory is the share of total memory in proportion to the number of available cores. Memory per core is the total memory divided by the total number of CPUs.

Returns:

Dictionary-like object whose elements are ‘cpu_count’, ‘total_mem’, ‘nslots’, ‘max_cores’, ‘max_mem’ and ‘mem_per_core’

Return type:

AttributeDictionary
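The relationships between these elements can be sketched as a small pure function. This is one reading of the description above, not the library's code: it assumes available memory equals memory per core multiplied by available cores, and that a set ‘nslots’ value replaces the CPU count. The name summarise_environment is hypothetical, and a plain dict stands in for AttributeDictionary.

```python
def summarise_environment(cpu_count, total_mem, nslots=None):
    """Derive available cores and memory from the raw system values."""
    # Use the scheduler's slot count when present, else all CPUs
    max_cores = nslots if nslots else cpu_count
    # Memory per core is total memory divided by the total CPU count
    mem_per_core = total_mem / cpu_count
    # Assumption: available memory scales with the available cores
    max_mem = mem_per_core * max_cores
    return {
        "cpu_count": cpu_count,
        "total_mem": total_mem,
        "nslots": nslots,
        "max_cores": max_cores,
        "max_mem": max_mem,
        "mem_per_core": mem_per_core,
    }


# Example: a 16-CPU node with 64 Gb where the scheduler granted 4 slots
env = summarise_environment(cpu_count=16, total_mem=64.0, nslots=4)
```

With these inputs the job sees 4 cores and a quarter of the node's memory.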

auto_process_ngs.cli.run_qc.main(argv=None)

Run the ‘run_qc’ utility

Parameters:

argv (list) – optional, command line arguments to process (otherwise take arguments from ‘sys.argv’)

Returns:

0 on success, 1 on failure.

Return type:

Integer
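The argv=None convention makes an entry point callable from both the command line and from tests. The sketch below shows the pattern with a hypothetical main_sketch function and a single placeholder positional argument; it does not reproduce the options of the real utility.

```python
import argparse


def main_sketch(argv=None):
    """Parse arguments and return an exit status (0 success, 1 failure)."""
    p = argparse.ArgumentParser(prog="example")
    p.add_argument("inputs", nargs="*", help="placeholder positional inputs")
    # With argv=None argparse falls back to sys.argv[1:]
    args = p.parse_args(argv)
    if not args.inputs:
        return 1
    return 0
```

A script would normally pass the return value to sys.exit, while a test can call main_sketch(["sample_R1.fastq.gz"]) directly and inspect the integer result.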

auto_process_ngs.cli.run_qc.process_inputs(input_list)

Process the inputs and return Fastqs etc

The inputs can be one of:

  • a subdirectory in a project

  • a project directory

  • a non-project directory with Fastqs

  • a ‘raw’ list of Fastqs

The function attempts to determine which type the inputs are, generate a list of Fastq files, and locate any related filesystem objects (for example a “parent” project directory).

It returns a dictionary-like object with the following elements:

  • ‘fastqs’: a list of Fastq files

  • ‘dir_path’: the directory supplied as an input, if any

  • ‘info_file’: path to an AnalysisProject metadata file

  • ‘extra_files’: any additional QC-related config files

Returns:

Dictionary-like object whose elements are ‘fastqs’, ‘dir_path’, ‘info_file’ and ‘extra_files’

Return type:

AttributeDictionary
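A reduced version of this classification step is sketched below. It covers only two of the four cases (a single directory, or a raw list of Fastqs) and does not search for a parent project. The function name, the Fastq extensions and the metadata file name passed as info_name are assumptions, and SimpleNamespace stands in for AttributeDictionary.

```python
import os
import tempfile
from types import SimpleNamespace

FASTQ_EXTENSIONS = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


def classify_inputs_sketch(input_list, info_name="README.info"):
    """Return Fastqs plus any directory and metadata file found."""
    fastqs, dir_path, info_file = [], None, None
    if len(input_list) == 1 and os.path.isdir(input_list[0]):
        # Single directory: collect the Fastq files it contains
        dir_path = os.path.abspath(input_list[0])
        fastqs = sorted(os.path.join(dir_path, name)
                        for name in os.listdir(dir_path)
                        if name.endswith(FASTQ_EXTENSIONS))
        # Assumption: a metadata file, if any, sits in the same directory
        candidate = os.path.join(dir_path, info_name)
        if os.path.isfile(candidate):
            info_file = candidate
    else:
        # Otherwise treat every input as a Fastq file path
        fastqs = [os.path.abspath(path) for path in input_list]
    return SimpleNamespace(fastqs=fastqs, dir_path=dir_path,
                           info_file=info_file, extra_files=[])


# Example: a temporary non-project directory holding two empty Fastqs
with tempfile.TemporaryDirectory() as tmp:
    for name in ("S1_R1.fastq.gz", "S1_R2.fastq.gz", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    expected_dir = os.path.abspath(tmp)
    result = classify_inputs_sketch([tmp])
```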