auto_process_ngs.tenx.utils
Utility functions for processing the outputs from 10x Genomics pipelines:
flow_cell_id
has_10x_indices
has_chromium_sc_indices
cellranger_info
spaceranger_info
make_qc_summary_html
add_cellranger_args
make_multi_config_template
- auto_process_ngs.tenx.utils.add_cellranger_args(cellranger_cmd, jobmode=None, maxjobs=None, mempercore=None, jobinterval=None, localcores=None, localmem=None, disable_ui=False)
Configure options for cellranger
Given a Command instance for running cellranger, add the appropriate options (e.g. --jobmode) according to the supplied arguments.
- Parameters:
cellranger_cmd (Command) – Command instance for running cellranger
jobmode (str) – if specified, will be passed to the --jobmode option
maxjobs (int) – if specified, will be passed to the --maxjobs option
mempercore (int) – if specified, will be passed to the --mempercore option (only if jobmode is not “local”)
jobinterval (int) – if specified, will be passed to the --jobinterval option
localcores (int) – if specified, will be passed to the --localcores option (only if jobmode is “local”)
localmem (int) – if specified, will be passed to the --localmem option (only if jobmode is “local”)
disable_ui (bool) – if True, add the --disable-ui option (default is not to add it)
- Returns:
the original command updated with the appropriate options.
- Return type:
Command
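The option rules for add_cellranger_args can be illustrated with a minimal standalone sketch. It builds a plain list of strings rather than updating a Command instance, the helper name cellranger_options is hypothetical, and it assumes each argument maps to the cellranger option of the same name; it is not the library's implementation.

```python
def cellranger_options(jobmode=None, maxjobs=None, mempercore=None,
                       jobinterval=None, localcores=None, localmem=None,
                       disable_ui=False):
    """Hypothetical helper: build the list of cellranger options."""
    opts = []
    if jobmode is not None:
        opts.append(f"--jobmode={jobmode}")
    if jobmode == "local":
        # Resource limits that only apply to local jobs
        if localcores is not None:
            opts.append(f"--localcores={localcores}")
        if localmem is not None:
            opts.append(f"--localmem={localmem}")
    elif mempercore is not None:
        # Assumption: --mempercore is skipped when jobmode is "local"
        opts.append(f"--mempercore={mempercore}")
    if maxjobs is not None:
        opts.append(f"--maxjobs={maxjobs}")
    if jobinterval is not None:
        opts.append(f"--jobinterval={jobinterval}")
    if disable_ui:
        opts.append("--disable-ui")
    return opts
```

With jobmode="local" the sketch keeps localcores and localmem and drops mempercore; with any other job mode it does the reverse.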
- auto_process_ngs.tenx.utils.cellranger_info(path=None, name=None)
Retrieve information on the cellranger software
If called without any arguments this will locate the first cellranger executable that is available on the user’s PATH, and attempt to extract the version.
Alternatively if the path to an executable is supplied then the version will be determined from that instead.
If no version is identified then the script path is still returned, but without any version info.
If a ‘path’ is supplied then the package name will be taken from the basename; otherwise the package name can be supplied via the ‘name’ argument. If neither is supplied then the package name defaults to ‘cellranger’.
- Returns:
tuple consisting of (PATH,PACKAGE,VERSION) where PATH is the full path for the cellranger program, PACKAGE is ‘cellranger’, and VERSION is the package version. If any value can’t be determined then it will be returned as an empty string.
- Return type:
Tuple
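The path and package-name rules for cellranger_info can be sketched as follows. The helper name resolve_package is hypothetical, version extraction is left out, and searching PATH for the supplied ‘name’ is an assumption rather than documented behaviour.

```python
import os
import shutil

def resolve_package(path=None, name=None, default="cellranger"):
    """Hypothetical helper: return (path, package_name)."""
    if path is not None:
        # Explicit path: the package name is taken from the basename
        return os.path.abspath(path), os.path.basename(path)
    package = name if name is not None else default
    # No path: look for the first matching executable on PATH
    # (assumption: the package name doubles as the executable name)
    found = shutil.which(package)
    # An undetermined path is reported as an empty string
    return (os.path.abspath(found) if found else ""), package
```

A real implementation would then run the located executable to obtain the version string, returning an empty string if that fails.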
- auto_process_ngs.tenx.utils.flow_cell_id(run_name)
Extract the flow cell ID from a run name
For example for run name “170426_K00311_0033_AHJCY7BBXX” the extracted flow cell ID will be “HJCY7BBXX”.
- Parameters:
run_name (str) – name of (or path to) the run to extract the flow cell ID from
- Returns:
the extracted flow cell ID.
- Return type:
String
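A minimal sketch that reproduces the flow_cell_id example: it assumes the flow cell ID is the last underscore-separated field of the run name with its leading character removed. The function name extract_flow_cell_id is hypothetical and the library may handle edge cases differently.

```python
import os

def extract_flow_cell_id(run_name):
    """Hypothetical re-implementation of the documented example."""
    # Last underscore-separated field of the run name, e.g. "AHJCY7BBXX"
    last_field = os.path.basename(run_name.rstrip("/")).split("_")[-1]
    # Drop the leading character (assumed to be the flow cell position)
    return last_field[1:]

print(extract_flow_cell_id("170426_K00311_0033_AHJCY7BBXX"))  # HJCY7BBXX
```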
- auto_process_ngs.tenx.utils.has_10x_indices(sample_sheet)
Check if a sample sheet contains 10xGenomics-format indices
The Chromium SC 3’v2 indices are of the form:
SI-GA-[A-H][1-12]
e.g. ‘SI-GA-B11’ (see https://support.10xgenomics.com/permalink/27rGqWvNYYuqkgeS66sksm)
For scATAC-seq the indices are assumed to be of the form:
SI-NA-[A-H][1-12]
e.g. ‘SI-NA-G9’
For Visium data the indices are assumed to be of the form:
SI-(TT|TS)-[A-H][1-12]
e.g. ‘SI-TT-B1’
- Parameters:
sample_sheet (str) – path to the sample sheet CSV file to check
- Returns:
True if the sample sheet contains at least one 10xGenomics-style index, False if not.
- Return type:
Boolean
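The index forms listed for has_10x_indices can be expressed as a single regular expression. This is a sketch only: it reads “[A-H][1-12]” as a row letter A to H followed by a column number 1 to 12, and both helper names are hypothetical. It checks index strings directly and does not parse a sample sheet.

```python
import re

# Row letter A-H followed by a column number from 1 to 12
TENX_INDEX = re.compile(r"SI-(GA|NA|TT|TS)-[A-H](1[0-2]|[1-9])")

def looks_like_10x_index(index):
    """Hypothetical helper: True if 'index' has a 10xGenomics-style form."""
    return TENX_INDEX.fullmatch(index) is not None

def any_10x_indices(indices):
    """True if at least one entry in 'indices' looks like a 10x index."""
    return any(looks_like_10x_index(i) for i in indices)
```

Using fullmatch rather than search means that ordinary nucleotide index sequences, or names that merely contain a 10x-style substring, are not reported as matches.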
- auto_process_ngs.tenx.utils.has_chromium_sc_indices(sample_sheet)
Wrapper for ‘has_10x_indices’.
Maintained for backwards compatibility
- auto_process_ngs.tenx.utils.make_multi_config_template(f, reference=None, fastq_dir=None, samples=None, multiplexing=None, extensions=None, no_bam=None, include_probe_set=None, probe_set=None, cellranger_version=None)
Write a template configuration file for ‘cellranger multi’
Generates a template for the ‘cellranger multi’ configuration file. Specific options and sections are included or omitted depending on the arguments supplied to this function.
The format and parameters for different data types are described in the 10x Genomics ‘cellranger’ documentation:
- Parameters:
f (str) – path that output template file will be written to
reference (str) – path to reference transcriptome
fastq_dir (str) – path to directory with Fastq files
samples (list) – list of sample names
multiplexing (str) – type of multiplexing (one of ‘cellplex’, ‘flex’ or ‘ocm’, or None for singleplex data)
extensions (list) – list of “product extensions” (one or more of ‘CSP’, ‘VDJ-T’, ‘VDJ-B’) or None if there are no extensions
no_bam (bool) – if set, supplies the value of the ‘no-bam’ or ‘create-bam’ setting (which one is used depends on the target CellRanger version)
include_probe_set (bool) – if set then the ‘probe-set’ setting will be included in the template (defaults to False; if ‘probe_set’ is defined then it is set to True automatically)
probe_set (str) – path to probe set CSV file
cellranger_version (str) – optionally specify the target CellRanger version number (or None to use the default version)
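The version-dependent handling of ‘no_bam’ might look like the sketch below. It assumes that the switch from ‘no-bam’ to ‘create-bam’ happens at Cell Ranger major version 8 and that ‘create-bam’ takes the inverted value; both points are assumptions about the target software rather than statements taken from this module, and the helper name bam_setting is hypothetical.

```python
def bam_setting(no_bam, cellranger_version):
    """Hypothetical helper: return (setting_name, value) for the template."""
    major = int(cellranger_version.split(".")[0])
    if major >= 8:
        # Assumption: newer releases use 'create-bam' with the inverted value
        return "create-bam", str(not no_bam).lower()
    # Assumption: older releases take the 'no-bam' flag directly
    return "no-bam", str(bool(no_bam)).lower()
```

For example, under these assumptions no_bam=True becomes ‘no-bam,true’ for version 7.1.0 and ‘create-bam,false’ for version 8.0.0.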
- auto_process_ngs.tenx.utils.make_qc_summary_html(json_file, html_file)
Make HTML report for cellranger mkfastq processing stats
- Parameters:
json_file (str) – path to JSON file output from cellranger mkfastq command
html_file (str) – path to output HTML file
- auto_process_ngs.tenx.utils.spaceranger_info(path=None, name=None)
Retrieve information on the spaceranger software
If called without any arguments this will locate the first spaceranger executable that is available on the user’s PATH, and attempt to extract the version.
Alternatively if the path to an executable is supplied then the version will be determined from that instead.
If no version is identified then the script path is still returned, but without any version info.
If a ‘path’ is supplied then the package name will be taken from the basename; otherwise the package name can be supplied via the ‘name’ argument. If neither is supplied then the package name defaults to ‘spaceranger’.
- Returns:
tuple consisting of (PATH,PACKAGE,VERSION) where PATH is the full path for the spaceranger program, PACKAGE is ‘spaceranger’, and VERSION is the package version. If any value can’t be determined then it will be returned as an empty string.
- Return type:
Tuple