auto_process_ngs.tenx.utils
Utility functions for processing the outputs from 10x Genomics pipelines:
flow_cell_id
has_10x_indices
has_chromium_sc_indices
get_bases_mask_10x_atac
get_bases_mask_10x_multiome
cellranger_info
spaceranger_info
make_qc_summary_html
add_cellranger_args
make_multi_config_template
- auto_process_ngs.tenx.utils.add_cellranger_args(cellranger_cmd, jobmode=None, maxjobs=None, mempercore=None, jobinterval=None, localcores=None, localmem=None, disable_ui=False)
Configure options for cellranger
Given a Command instance for running cellranger, add the appropriate options (e.g. --jobmode) according to the supplied arguments.
- Parameters:
cellranger_cmd (Command) – Command instance for running cellranger
jobmode (str) – if specified, will be passed to the --jobmode option
maxjobs (int) – if specified, will be passed to the --maxjobs option
mempercore (int) – if specified, will be passed to the --mempercore option (only if jobmode is not “local”)
jobinterval (int) – if specified, will be passed to the --jobinterval option
localcores (int) – if specified, will be passed to the --localcores option (only if jobmode is “local”)
localmem (int) – if specified, will be passed to the --localmem option (only if jobmode is “local”)
disable_ui (bool) – if True, add the --disable-ui option (default is not to add it)
- Returns:
the original command updated with the appropriate options.
- Return type:
Command
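The option rules described for add_cellranger_args can be illustrated with a minimal sketch. It does not use the library's Command class: the function name cellranger_options_sketch is hypothetical, it returns a plain list of strings, and it assumes that each argument maps to the option of the same name written in "--name=value" form.

```python
def cellranger_options_sketch(jobmode=None, maxjobs=None, mempercore=None,
                              jobinterval=None, localcores=None,
                              localmem=None, disable_ui=False):
    """Hypothetical stand-in: build cellranger options as a list of strings."""
    options = []
    if jobmode is not None:
        options.append("--jobmode=%s" % jobmode)
    if jobmode == "local":
        # These two options only apply when running in local mode
        if localcores is not None:
            options.append("--localcores=%s" % localcores)
        if localmem is not None:
            options.append("--localmem=%s" % localmem)
    elif mempercore is not None:
        # Only added when the job mode is not "local"
        options.append("--mempercore=%s" % mempercore)
    if maxjobs is not None:
        options.append("--maxjobs=%s" % maxjobs)
    if jobinterval is not None:
        options.append("--jobinterval=%s" % jobinterval)
    if disable_ui:
        options.append("--disable-ui")
    return options


# In local mode the mempercore value is ignored
print(cellranger_options_sketch(jobmode="local", localcores=8,
                                localmem=32, mempercore=4))
```

The real function appends the options to the supplied Command instance and returns it, rather than returning a list.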
- auto_process_ngs.tenx.utils.cellranger_info(path=None, name=None)
Retrieve information on the cellranger software
If called without any arguments, this locates the first cellranger executable available on the user’s PATH and attempts to extract its version.
Alternatively if the path to an executable is supplied then the version will be determined from that instead.
If no version is identified then the script path is still returned, but without any version info.
If a ‘path’ is supplied then the package name will be taken from the basename; otherwise the package name can be supplied via the ‘name’ argument. If neither is supplied then the package name defaults to ‘cellranger’.
- Returns:
tuple consisting of (PATH,PACKAGE,VERSION) where PATH is the full path for the cellranger program, PACKAGE is ‘cellranger’, and VERSION is the package version. If any value can’t be determined then it will be returned as an empty string.
- Return type:
Tuple
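The way the path and package name are resolved can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: locate_package_sketch is a hypothetical name, the executable is looked up with the standard library's shutil.which, and version extraction (which would require running the program) is left out.

```python
import os
import shutil


def locate_package_sketch(path=None, name=None, default="cellranger"):
    """Hypothetical stand-in: return (PATH, PACKAGE) using the documented rules."""
    if path:
        # An explicit path wins; the package name is its basename
        package = os.path.basename(path)
    else:
        # Otherwise use the supplied name, falling back to the default
        package = name or default
        # First matching executable on PATH, or an empty string if none
        path = shutil.which(package) or ""
    return (path, package)


print(locate_package_sketch(path="/opt/10x/cellranger-atac"))
```

Returning an empty string rather than None for a missing executable mirrors the documented behaviour of returning empty strings for values that cannot be determined.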
- auto_process_ngs.tenx.utils.flow_cell_id(run_name)
Extract the flow cell ID from a run name
For example for run name “170426_K00311_0033_AHJCY7BBXX” the extracted flow cell ID will be “HJCY7BBXX”.
- Parameters:
run_name (str) – run name (or path to the run) to extract the flow cell ID from
- Returns:
the extracted flow cell ID.
- Return type:
String
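A minimal sketch of one way to reproduce the example above. It assumes that the flow cell ID is the last underscore-separated field of the run name with its leading character dropped; flow_cell_id_sketch is a hypothetical name, and the real function may differ in detail.

```python
import os


def flow_cell_id_sketch(run_name):
    """Hypothetical stand-in: extract the flow cell ID from a run name or path."""
    # Drop any leading directories (and a trailing slash) from the run name
    base = os.path.basename(os.path.normpath(run_name))
    # The last underscore-separated field holds the flow cell information
    last_field = base.split("_")[-1]
    # Assumption: the first character is a prefix and not part of the ID
    return last_field[1:]


print(flow_cell_id_sketch("170426_K00311_0033_AHJCY7BBXX"))
```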
- auto_process_ngs.tenx.utils.get_bases_mask_10x_atac(runinfo_xml)
Acquire a bases mask for 10xGenomics scATAC-seq
Generates an initial bases mask based on the run contents, and then updates this so that:
Only the first 8 bases of the first index read are actually used, and
The second index read is converted to a data read.
For example: if the initial bases mask is ‘Y50,I16,I16,Y50’ then the scATAC-seq bases mask will be ‘Y50,I8nnnnnnnn,Y16,Y50’.
- Parameters:
runinfo_xml (str) – path to the RunInfo.xml for the sequencing run
- Returns:
10xGenomics scATAC-seq bases mask string
- Return type:
String
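The two updates described above can be sketched on a bases mask string directly. This is an illustration only: atac_bases_mask_sketch is a hypothetical name, it takes an initial bases mask rather than a RunInfo.xml path, and it assumes a simple mask in which each read is a single letter followed by a length and exactly two index reads are present.

```python
def atac_bases_mask_sketch(initial_mask):
    """Hypothetical stand-in: convert an initial bases mask for scATAC-seq."""
    reads = initial_mask.split(",")
    index_positions = [i for i, read in enumerate(reads)
                       if read.startswith("I")]
    if len(index_positions) != 2:
        raise ValueError("expected exactly two index reads: %s" % initial_mask)
    first, second = index_positions
    length = int(reads[first][1:])
    if length < 8:
        raise ValueError("first index read is shorter than 8 bases")
    # Keep the first 8 bases of the first index read and mask the rest
    reads[first] = "I8" + "n" * (length - 8)
    # Treat the second index read as a data read
    reads[second] = "Y" + reads[second][1:]
    return ",".join(reads)


print(atac_bases_mask_sketch("Y50,I16,I16,Y50"))
```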
- auto_process_ngs.tenx.utils.get_bases_mask_10x_multiome(runinfo_xml, library)
Return bases mask for 10xGenomics single cell multiome
Generates an initial bases mask based on the run contents, and then updates this based on the library type (either ‘atac’ or ‘gex’).
For ATAC data: the template bases mask is “Y*,I8n*,Y24,Y*” (keeping all of read 1, first 8 bases of read 2, all 24 bases of read 3, and all of read 4).
For example: if the initial bases mask is ‘Y50,I10,Y24,Y90’ then the single cell multiome ATAC bases mask will be ‘Y50,I8n2,Y24,Y90’.
For GEX data: the template bases mask is “Y28n*,I10,I10n*,Y*” (keeping first 28 bases of read 1, all 10 bases of read 2, first 10 bases of read 3, and all of read 4).
For example: if the initial bases mask is ‘Y50,I10,Y24,Y90’ then the single cell multiome GEX bases mask will be ‘Y28n22,I10,I10n14,Y90’.
- Parameters:
runinfo_xml (str) – path to the RunInfo.xml for the sequencing run
library (str) – library type to set bases mask for (either ‘atac’ or ‘gex’)
- Returns:
10xGenomics single cell multiome bases mask string for the specified library type.
- Return type:
String
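The two templates can be applied to an initial bases mask as in the sketch below. The names keep_bases and multiome_bases_mask_sketch are hypothetical, the function takes a bases mask string rather than a RunInfo.xml path, and it assumes a four-read mask in which each read is a single letter followed by a length.

```python
def keep_bases(letter, keep, total):
    """Keep the first `keep` bases of a read and mask any remainder."""
    if total < keep:
        raise ValueError("read has %d bases, need at least %d" % (total, keep))
    masked = total - keep
    return "%s%d" % (letter, keep) + ("n%d" % masked if masked else "")


def multiome_bases_mask_sketch(initial_mask, library):
    """Hypothetical stand-in: apply the ATAC or GEX template to a mask."""
    lengths = [int(read[1:]) for read in initial_mask.split(",")]
    if len(lengths) != 4:
        raise ValueError("expected four reads: %s" % initial_mask)
    r1, r2, r3, r4 = lengths
    if library == "atac":
        # Template "Y*,I8n*,Y24,Y*"
        reads = ["Y%d" % r1, keep_bases("I", 8, r2),
                 keep_bases("Y", 24, r3), "Y%d" % r4]
    elif library == "gex":
        # Template "Y28n*,I10,I10n*,Y*"
        reads = [keep_bases("Y", 28, r1), keep_bases("I", 10, r2),
                 keep_bases("I", 10, r3), "Y%d" % r4]
    else:
        raise ValueError("unknown library type: %s" % library)
    return ",".join(reads)


print(multiome_bases_mask_sketch("Y50,I10,Y24,Y90", "atac"))
print(multiome_bases_mask_sketch("Y50,I10,Y24,Y90", "gex"))
```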
- auto_process_ngs.tenx.utils.has_10x_indices(sample_sheet)
Check if a sample sheet contains 10xGenomics-format indices
The Chromium SC 3’v2 indices are of the form:
SI-GA-[A-H][1-12]
e.g. ‘SI-GA-B11’ (see https://support.10xgenomics.com/permalink/27rGqWvNYYuqkgeS66sksm)
For scATAC-seq the indices are assumed to be of the form:
SI-NA-[A-H][1-12]
e.g. ‘SI-NA-G9’
For Visium data the indices are assumed to be of the form:
SI-(TT|TS)-[A-H][1-12]
e.g. ‘SI-TT-B1’
- Parameters:
sample_sheet (str) – path to the sample sheet CSV file to check
- Returns:
True if the sample sheet contains at least one 10xGenomics-style index, False if not.
- Return type:
Boolean
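The index name patterns listed above can be expressed as a single regular expression. The sketch below is an illustration, not the library's code: any_10x_index_sketch is a hypothetical name, it checks a list of index strings rather than reading a sample sheet file, and it interprets "[1-12]" as the numbers 1 to 12.

```python
import re

# SI-GA (Chromium SC 3'v2), SI-NA (scATAC-seq), SI-TT or SI-TS (Visium),
# followed by a row letter A-H and a column number 1-12
TENX_INDEX = re.compile(r"SI-(GA|NA|TT|TS)-[A-H]([1-9]|1[0-2])")


def any_10x_index_sketch(indices):
    """Hypothetical stand-in: True if any index looks like a 10x index name."""
    return any(TENX_INDEX.fullmatch(index) is not None for index in indices)


print(any_10x_index_sketch(["ATCACGAT", "SI-GA-B11"]))
```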
- auto_process_ngs.tenx.utils.has_chromium_sc_indices(sample_sheet)
Wrapper for ‘has_10x_indices’.
Maintained for backwards compatibility
- auto_process_ngs.tenx.utils.make_multi_config_template(f, reference=None, probe_set=None, fastq_dir=None, samples=None, no_bam=None, library_type='CellPlex', cellranger_version=None)
Write a template configuration file for ‘cellranger multi’
Generates a template for the ‘cellranger multi’ configuration file, which can be used with either CellPlex or fixed RNA profiling (Flex) data.
The format and parameters for different data types are described in the 10x Genomics ‘cellranger’ documentation.
- Parameters:
f (str) – path that output template file will be written to
reference (str) – path to reference transcriptome
probe_set (str) – path to probe set CSV file
fastq_dir (str) – path to directory with Fastq files
samples (list) – list of sample names
no_bam (bool) – if set, used as the value of the ‘no-bam’ setting
library_type (str) – specify the library type of data that the configuration file will be used with; should be one of ‘CellPlex[…]’ (the default), ‘Flex’ or ‘Single Cell Immune Profiling’
cellranger_version (str) – optionally specify the target Cellranger version number (or None)
- auto_process_ngs.tenx.utils.make_qc_summary_html(json_file, html_file)
Make HTML report for cellranger mkfastq processing stats
- Parameters:
json_file (str) – path to JSON file output from cellranger mkfastq command
html_file (str) – path to output HTML file
- auto_process_ngs.tenx.utils.spaceranger_info(path=None, name=None)
Retrieve information on the spaceranger software
If called without any arguments, this locates the first spaceranger executable available on the user’s PATH and attempts to extract its version.
Alternatively if the path to an executable is supplied then the version will be determined from that instead.
If no version is identified then the script path is still returned, but without any version info.
If a ‘path’ is supplied then the package name will be taken from the basename; otherwise the package name can be supplied via the ‘name’ argument. If neither is supplied then the package name defaults to ‘spaceranger’.
- Returns:
tuple consisting of (PATH,PACKAGE,VERSION) where PATH is the full path for the spaceranger program, PACKAGE is ‘spaceranger’, and VERSION is the package version. If any value can’t be determined then it will be returned as an empty string.
- Return type:
Tuple