auto_process_ngs.cli.run_qc
Runs the QC pipeline standalone on an arbitrary set of Fastq files
- class auto_process_ngs.cli.run_qc.InfoAction(option_strings, settings, nargs=None, *args, **kws)
Custom parser action for the --info option
Example usage:
>>> p.add_argument('--info',action=InfoAction,settings=settings)
where ‘settings’ should be a populated ‘Settings’ instance.
When invoked the action will display information on protocols, organisms and other configuration settings, and then exit.
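The pattern described above (an argparse action that accepts an extra keyword, prints information and exits) can be sketched as follows. This is a minimal illustration and not the InfoAction implementation: the class name `ShowInfoAction` is hypothetical, and a plain dict stands in for the ‘Settings’ instance.

```python
import argparse


class ShowInfoAction(argparse.Action):
    """Hypothetical stand-in for InfoAction: print settings, then exit."""

    def __init__(self, option_strings, dest, settings=None, **kws):
        # Force nargs=0 so the option behaves as a flag with no value
        kws["nargs"] = 0
        super().__init__(option_strings, dest, **kws)
        # Assumption: 'settings' is a plain dict here, not a Settings object
        self.settings = settings or {}

    def __call__(self, parser, namespace, values, option_string=None):
        # Display each setting, then stop argument processing
        for name, value in sorted(self.settings.items()):
            print(f"{name}: {value}")
        parser.exit()
```

Registering it mirrors the usage shown above: `p.add_argument('--info', action=ShowInfoAction, settings={...})`. Extra keywords passed to `add_argument` are forwarded to the action's constructor, which is how ‘settings’ reaches the action.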
- auto_process_ngs.cli.run_qc.add_10x_options(p)
Cellranger/10x Genomics options
- auto_process_ngs.cli.run_qc.add_advanced_options(p, use_legacy_screen_names, shorten_zip_paths)
Advanced options
- auto_process_ngs.cli.run_qc.add_conda_options(p, enable_conda, conda_env_dir)
Conda options
- auto_process_ngs.cli.run_qc.add_custom_protocol_options(p)
Options for defining custom protocols
- auto_process_ngs.cli.run_qc.add_debug_options(p)
Debugging options
- auto_process_ngs.cli.run_qc.add_deprecated_options(p)
Deprecated options
- auto_process_ngs.cli.run_qc.add_job_control_options(p, max_cores, max_jobs, max_batches)
Job control options
- auto_process_ngs.cli.run_qc.add_metadata_options(p)
Metadata options
- auto_process_ngs.cli.run_qc.add_pipeline_options(p, fastq_subset_size, default_nthreads)
QC pipeline options
- auto_process_ngs.cli.run_qc.add_reference_data_options(p)
Reference data options
- auto_process_ngs.cli.run_qc.add_reporting_options(p)
Reporting options
- auto_process_ngs.cli.run_qc.announce(title)
Print arbitrary string as a title
Prints the supplied string as a title, e.g.
>>> announce("Hello!")
======
Hello!
======
- Parameters:
title (str) – string to print
- Returns:
None
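A minimal sketch of a function with this behaviour is given below. It assumes the rule lines are made of ‘=’ characters and match the length of the title, which is inferred from the “Hello!” example only; the name `announce_sketch` is hypothetical.

```python
def announce_sketch(title):
    """Print 'title' between two rules of '=' characters."""
    # Assumption: the rule is as long as the title itself
    rule = "=" * len(title)
    print(rule)
    print(title)
    print(rule)
```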
- auto_process_ngs.cli.run_qc.build_10x_multi_config(multi_config_file, fastq_dir, libraries, samples, gex_reference=None, probe_set=None, vdj_reference=None)
Constructs a ‘cellranger multi’ configuration file
- Parameters:
multi_config_file (str) – path to the output config file
fastq_dir (str) – path to the directory holding the Fastq files
libraries (dict) – dictionary where keys are Fastq IDs and values are the corresponding 10x library types
samples (dict) – dictionary where keys are multiplexed sample names and values are the corresponding 10x CMO or probe IDs
gex_reference (str) – path to the gene expression reference dataset to use (no ‘gene-expression’ section will be written if not supplied)
probe_set (str) – path to the probe set reference dataset (no ‘probe-set’ setting will be written if not supplied)
vdj_reference (str) – path to the VDJ reference dataset to use (no ‘vdj’ section will be written if not supplied)
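The optional-section behaviour described by these parameters can be sketched as below. This is an illustration only, not the function's actual code: the section headers and column names (‘[libraries]’, ‘fastq_id,fastqs,feature_types’, ‘sample_id,cmo_ids’) are assumptions about the ‘cellranger multi’ CSV layout and should be checked against the 10x Genomics documentation, and the name `write_multi_config_sketch` is hypothetical.

```python
def write_multi_config_sketch(multi_config_file, fastq_dir, libraries, samples,
                              gex_reference=None, probe_set=None,
                              vdj_reference=None):
    """Write a simplified 'cellranger multi'-style config file."""
    lines = []
    if gex_reference:
        # Section is only written when a reference is supplied
        lines += ["[gene-expression]", f"reference,{gex_reference}"]
        if probe_set:
            # Assumption: the probe set is a setting inside this section
            lines.append(f"probe-set,{probe_set}")
        lines.append("")
    if vdj_reference:
        lines += ["[vdj]", f"reference,{vdj_reference}", ""]
    # One row per Fastq ID with its 10x library type
    lines += ["[libraries]", "fastq_id,fastqs,feature_types"]
    for fastq_id, library_type in libraries.items():
        lines.append(f"{fastq_id},{fastq_dir},{library_type}")
    lines.append("")
    # One row per multiplexed sample with its CMO or probe ID
    lines += ["[samples]", "sample_id,cmo_ids"]
    for sample, barcode_id in samples.items():
        lines.append(f"{sample},{barcode_id}")
    with open(multi_config_file, "wt") as fp:
        fp.write("\n".join(lines) + "\n")
```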
- auto_process_ngs.cli.run_qc.cleanup_atexit(tmp_project_dir)
Perform clean up actions on exit
Removes the temporary project directory created for running the QC
- auto_process_ngs.cli.run_qc.display_info(s)
Displays information about the current configuration
The information includes the available QC protocols, organisms and FastqScreen conf files.
- Parameters:
s (Settings) – populated Settings instance
- auto_process_ngs.cli.run_qc.get_applications(tags)
Return a list of applications matching tags
- Parameters:
tags (list) – list of tags to match when filtering applications
- Returns:
Dictionary where keys are platforms and values are lists of associated libraries.
- auto_process_ngs.cli.run_qc.get_execution_environment()
Fetch information on the local execution environment
Interrogates the local system to get information on number of cores, memory etc.
It returns a dictionary-like object with the following elements:
‘cpu_count’: total number of CPUs
‘total_mem’: total amount of memory (Gb)
‘nslots’: value of the ‘NSLOTS’ env variable
‘max_cores’: maximum available cores
‘max_mem’: maximum available memory (Gb)
‘mem_per_core’: memory per core (Gb)
Available cores is the number of CPUs, or (for compute cluster nodes) the number of available CPUs assigned (obtained via the value of the appropriate environment variable e.g. ‘NSLOTS’ for Grid Engine, ‘SLURM_NTASKS’ for Slurm).
Available memory is the proportion of total memory scaled by the number of available cores. Memory per core is the total memory divided by the total number of CPUs.
- Returns:
Dictionary-like object whose elements are ‘cpu_count’, ‘total_mem’, ‘nslots’, ‘max_cores’, ‘max_mem’ and ‘mem_per_core’
- Return type:
AttributeDictionary
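The arithmetic relating these elements can be sketched as follows. This is an illustration under stated assumptions, not the function's actual code: the CPU count and total memory are passed in rather than detected, a plain dict is returned instead of an AttributeDictionary, and the name `execution_environment_sketch` and the order in which the environment variables are checked are hypothetical.

```python
import os


def execution_environment_sketch(cpu_count, total_mem, environ=None):
    """Derive available cores and memory from totals and scheduler env vars."""
    environ = os.environ if environ is None else environ
    nslots = environ.get("NSLOTS")
    # Assumption: check Grid Engine first, then Slurm
    assigned = nslots or environ.get("SLURM_NTASKS")
    # Available cores: scheduler-assigned value if present, else all CPUs
    max_cores = int(assigned) if assigned else cpu_count
    # Memory per core: total memory divided by total CPUs
    mem_per_core = total_mem / cpu_count
    # Available memory: total memory scaled by the available cores
    max_mem = mem_per_core * max_cores
    return {
        "cpu_count": cpu_count,
        "total_mem": total_mem,
        "nslots": nslots,
        "max_cores": max_cores,
        "max_mem": max_mem,
        "mem_per_core": mem_per_core,
    }
```

For example, a 16-CPU node with 64 Gb of memory and ‘NSLOTS’ set to 4 would give 4 available cores, 4 Gb per core and 16 Gb of available memory under these assumptions.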
- auto_process_ngs.cli.run_qc.main(argv=None)
Run the ‘run_qc’ utility
- Parameters:
argv (list) – optional, command line arguments to process (otherwise take arguments from ‘sys.argv’)
- Returns:
0 on success, 1 on failure.
- Return type:
Integer
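The calling convention (an optional ‘argv’ list and an integer exit status) can be sketched as below. The body is a hypothetical stand-in: it only checks file name suffixes and does not run any QC, and the name `main_sketch` and the use of ‘sys.argv[1:]’ as the fallback are assumptions.

```python
import argparse
import sys


def main_sketch(argv=None):
    """Parse arguments and return 0 on success, 1 on failure."""
    if argv is None:
        # Assumption: skip the program name when falling back to sys.argv
        argv = sys.argv[1:]
    p = argparse.ArgumentParser(prog="run_qc_sketch")
    p.add_argument("inputs", nargs="+", help="Fastq files to check")
    args = p.parse_args(argv)
    # Placeholder for the real work: flag inputs that do not look like Fastqs
    suffixes = (".fastq", ".fastq.gz", ".fq", ".fq.gz")
    bad = [f for f in args.inputs if not f.endswith(suffixes)]
    if bad:
        print("Not Fastq files: %s" % ", ".join(bad), file=sys.stderr)
        return 1
    return 0
```

Accepting ‘argv’ explicitly makes the entry point callable from tests, while a script wrapper can still pass the return value to `sys.exit()`.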
- auto_process_ngs.cli.run_qc.process_inputs(input_list)
Process the inputs and return Fastqs etc
The inputs can be one of:
a subdirectory in a project
a project directory
a non-project directory with Fastqs
a ‘raw’ list of Fastqs
The function attempts to determine which type the inputs are, generate a list of Fastq files, and locate any related filesystem objects (for example a “parent” project directory).
It returns a dictionary-like object with the following elements:
‘fastqs’: a list of Fastq files
‘dir_path’: the directory supplied as an input, if any
‘info_file’: path to an AnalysisProject metadata file
‘extra_files’: any additional QC-related config files
- Returns:
Dictionary-like object whose elements are ‘fastqs’, ‘dir_path’, ‘info_file’ and ‘extra_files’
- Return type:
AttributeDictionary
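Two of the input types listed above (a non-project directory with Fastqs, and a ‘raw’ list of Fastqs) can be handled along the following lines. This is a partial sketch, not the function's actual logic: project detection is omitted, so ‘info_file’ and ‘extra_files’ are left empty, a plain dict is returned, and the suffix list and the name `collect_inputs_sketch` are assumptions.

```python
import os

# Assumption: file name suffixes treated as Fastq files
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def collect_inputs_sketch(input_list):
    """Build a list of Fastqs from a single directory or a raw file list."""
    result = {"fastqs": [], "dir_path": None,
              "info_file": None, "extra_files": []}
    if len(input_list) == 1 and os.path.isdir(input_list[0]):
        # Single directory: record it and gather the Fastqs it contains
        dir_path = os.path.abspath(input_list[0])
        result["dir_path"] = dir_path
        result["fastqs"] = sorted(
            os.path.join(dir_path, f)
            for f in os.listdir(dir_path)
            if f.endswith(FASTQ_SUFFIXES))
    else:
        # Raw list: keep only the entries that look like Fastq files
        result["fastqs"] = [os.path.abspath(f) for f in input_list
                            if f.endswith(FASTQ_SUFFIXES)]
    return result
```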