auto_process_ngs.tenx.utils

Utility functions for processing the outputs from 10x Genomics pipelines:

  • flow_cell_id

  • has_10x_indices

  • has_chromium_sc_indices

  • cellranger_info

  • spaceranger_info

  • make_qc_summary_html

  • add_cellranger_args

  • make_multi_config_template

auto_process_ngs.tenx.utils.add_cellranger_args(cellranger_cmd, jobmode=None, maxjobs=None, mempercore=None, jobinterval=None, localcores=None, localmem=None, disable_ui=False)

Configure options for cellranger

Given a Command instance for running cellranger, add the appropriate options (e.g. --jobmode) according to the supplied arguments.

Parameters:
  • cellranger_cmd (Command) – Command instance for running cellranger

  • jobmode (str) – if specified, will be passed to the --jobmode option

  • maxjobs (int) – if specified, will be passed to the --maxjobs option

  • mempercore (int) – if specified, will be passed to the --mempercore option (only if jobmode is not “local”)

  • jobinterval (int) – if specified, will be passed to the --jobinterval option

  • localcores (int) – if specified, will be passed to the --localcores option (only if jobmode is “local”)

  • localmem (int) – if specified, will be passed to the --localmem option (only if jobmode is “local”)

  • disable_ui (bool) – if True, add the --disable-ui option (default is not to add it)

Returns:

the original command updated with the appropriate options.

Return type:

Command
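The option rules above can be sketched without the Command class. This is a minimal illustration, not the module's implementation: the helper name build_cellranger_options is hypothetical, it returns a plain list of strings rather than updating a Command instance, and it assumes that each argument maps to the cellranger option of the same name and that options are written in "--option=value" form.

```python
def build_cellranger_options(jobmode=None, maxjobs=None, mempercore=None,
                             jobinterval=None, localcores=None,
                             localmem=None, disable_ui=False):
    """Hypothetical helper: return cellranger options as a list of strings."""
    options = []
    if jobmode is not None:
        options.append(f"--jobmode={jobmode}")
    if jobmode == "local":
        # Resource limits that only apply when running locally
        if localcores is not None:
            options.append(f"--localcores={localcores}")
        if localmem is not None:
            options.append(f"--localmem={localmem}")
    elif mempercore is not None:
        # Only meaningful when jobs are not run locally
        options.append(f"--mempercore={mempercore}")
    if maxjobs is not None:
        options.append(f"--maxjobs={maxjobs}")
    if jobinterval is not None:
        options.append(f"--jobinterval={jobinterval}")
    if disable_ui:
        options.append("--disable-ui")
    return options


# Local mode: mempercore is ignored, local limits are kept
print(build_cellranger_options(jobmode="local", localcores=8,
                               localmem=32, mempercore=4))
# Cluster mode: local limits are ignored
print(build_cellranger_options(jobmode="sge", maxjobs=24, mempercore=5,
                               localcores=8, disable_ui=True))
```

Arguments left as None contribute nothing, so calling the helper with no arguments yields an empty list.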

auto_process_ngs.tenx.utils.cellranger_info(path=None, name=None)

Retrieve information on the cellranger software

If called without any arguments this will locate the first cellranger executable available on the user’s PATH and attempt to extract its version.

Alternatively if the path to an executable is supplied then the version will be determined from that instead.

If no version is identified then the script path is still returned, but without any version info.

If a ‘path’ is supplied then the package name will be taken from the basename; otherwise the package name can be supplied via the ‘name’ argument. If neither is supplied then the package name defaults to ‘cellranger’.

Returns:

tuple consisting of (PATH,PACKAGE,VERSION) where PATH is the full path for the cellranger program, PACKAGE is the package name (e.g. ‘cellranger’), and VERSION is the package version. If any value can’t be determined then it will be returned as an empty string.

Return type:

Tuple
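The way the path and package name are resolved can be sketched as follows. This is an illustration under stated assumptions rather than the module's code: resolve_package is a hypothetical name, the lookup uses shutil.which from the standard library, and version extraction is left out because it requires running the executable.

```python
import os
import shutil


def resolve_package(path=None, name=None, default="cellranger"):
    """Hypothetical helper: return (path, package) for a 10x executable."""
    if path:
        # Explicit path supplied: package name is taken from its basename
        return (path, os.path.basename(path))
    # Otherwise use the supplied name, falling back to the default
    package = name if name else default
    # shutil.which returns the first match on PATH, or None if there is none
    found = shutil.which(package)
    # An undetermined path is reported as an empty string
    return (found or "", package)


print(resolve_package(path="/opt/10x/bin/cellranger-atac"))
print(resolve_package(name="cellranger-arc"))
print(resolve_package())
```

Passing default="spaceranger" gives the equivalent lookup for spaceranger.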

auto_process_ngs.tenx.utils.flow_cell_id(run_name)

Extract the flow cell ID from a run name

For example for run name “170426_K00311_0033_AHJCY7BBXX” the extracted flow cell ID will be “HJCY7BBXX”.

Parameters:

run_name (str) – run name (or path to the run directory) to extract the flow cell ID from

Returns:

the extracted flow cell ID.

Return type:

String
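One way to reproduce the example above is sketched below. It assumes the usual Illumina run name layout, in which the last underscore-separated field is a single position letter followed by the flow cell ID; extract_flow_cell_id is a hypothetical name, not the module's function.

```python
import os


def extract_flow_cell_id(run_name):
    """Hypothetical helper: pull the flow cell ID out of a run name."""
    # Accept a path to the run directory, with or without a trailing separator
    base = os.path.basename(os.path.normpath(run_name))
    # The last underscore-separated field is <position letter><flow cell ID>
    last_field = base.split("_")[-1]
    # Drop the leading position letter (the "A" in "AHJCY7BBXX")
    return last_field[1:]


print(extract_flow_cell_id("170426_K00311_0033_AHJCY7BBXX"))  # HJCY7BBXX
```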

auto_process_ngs.tenx.utils.has_10x_indices(sample_sheet)

Check if a sample sheet contains 10xGenomics-format indices

The Chromium SC 3’ v2 indices are of the form:

SI-GA-[A-H][1-12]

e.g. ‘SI-GA-B11’ (see https://support.10xgenomics.com/permalink/27rGqWvNYYuqkgeS66sksm)

For scATAC-seq the indices are assumed to be of the form:

SI-NA-[A-H][1-12]

e.g. ‘SI-NA-G9’

For Visium data the indices are assumed to be of the form:

SI-(TT|TS)-[A-H][1-12]

e.g. ‘SI-TT-B1’

Parameters:

sample_sheet (str) – path to the sample sheet CSV file to check

Returns:

True if the sample sheet contains at least one 10xGenomics-style index, False if not.

Return type:

Boolean
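The index formats listed above can be expressed as a single regular expression. The sketch below works on index strings rather than a sample sheet file, and it reads the “[1-12]” shorthand as “a number from 1 to 12”; both the helper names and that reading are assumptions made for illustration.

```python
import re

# SI-<kit>-<row A-H><column 1-12>, where kit is GA, NA, TT or TS
TENX_INDEX = re.compile(r"SI-(GA|NA|TT|TS)-[A-H](1[0-2]|[1-9])")


def is_10x_index(index):
    """Hypothetical helper: True if the string is a 10x-style index name."""
    return TENX_INDEX.fullmatch(index.strip()) is not None


def any_10x_indices(indices):
    """Hypothetical helper: True if at least one index is 10x-style."""
    return any(is_10x_index(i) for i in indices)


print(is_10x_index("SI-GA-B11"))                   # True
print(is_10x_index("SI-GA-B13"))                   # False: column out of range
print(any_10x_indices(["ATCACGTT", "SI-TS-H12"]))  # True
```

Using fullmatch rather than search avoids accepting longer strings that merely contain an index-like substring.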

auto_process_ngs.tenx.utils.has_chromium_sc_indices(sample_sheet)

Wrapper for ‘has_10x_indices’.

Maintained for backwards compatibility

auto_process_ngs.tenx.utils.make_multi_config_template(f, reference=None, fastq_dir=None, samples=None, multiplexing=None, extensions=None, no_bam=None, include_probe_set=None, probe_set=None, cellranger_version=None)

Write a template configuration file for ‘cellranger multi’

Generates a template for the ‘cellranger multi’ configuration file. Specific options and sections are included or omitted depending on the arguments supplied to this function.

The format and parameters for different data types are described in the 10x Genomics ‘cellranger’ documentation.

Parameters:
  • f (str) – path that output template file will be written to

  • reference (str) – path to reference transcriptome

  • fastq_dir (str) – path to directory with Fastq files

  • samples (list) – list of sample names

  • multiplexing (str) – type of multiplexing (one of ‘cellplex’, ‘flex’ or ‘ocm’, or None for singleplex data)

  • extensions (list) – list of “product extensions” (one or more of ‘CSP’, ‘VDJ-T’, ‘VDJ-B’) or None if there are no extensions

  • no_bam (bool) – if set, supplies the value for the ‘no-bam’ or ‘create-bam’ setting (which of the two is used depends on the target CellRanger version)

  • include_probe_set (bool) – if set then the ‘probe-set’ setting will be included in the template (defaults to False; set to True automatically if ‘probe_set’ is defined)

  • probe_set (str) – path to probe set CSV file

  • cellranger_version (str) – optionally specify the target CellRanger version number (or None to use the default version)
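The version-dependent handling of ‘no_bam’ can be illustrated with a short sketch. It is not the function's actual logic: bam_setting is a hypothetical name, the cutoff at major version 8 is an assumption based on 10x Genomics introducing ‘create-bam’ in Cell Ranger 8.0, and the inversion of the value between the two settings is likewise assumed.

```python
def bam_setting(no_bam, cellranger_version):
    """Hypothetical helper: return (setting, value) for the BAM option."""
    # Major version is the first dot-separated component, e.g. "8" in "8.0.1"
    major = int(cellranger_version.split(".")[0])
    if major >= 8:
        # Assumed: newer versions use 'create-bam', the inverse of 'no-bam'
        return ("create-bam", "false" if no_bam else "true")
    # Assumed: older versions use 'no-bam' directly
    return ("no-bam", "true" if no_bam else "false")


print(bam_setting(True, "7.1.0"))   # ('no-bam', 'true')
print(bam_setting(True, "9.0.0"))   # ('create-bam', 'false')
```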

auto_process_ngs.tenx.utils.make_qc_summary_html(json_file, html_file)

Make HTML report for cellranger mkfastq processing stats

Parameters:
  • json_file (str) – path to JSON file output from cellranger mkfastq command

  • html_file (str) – path to output HTML file

auto_process_ngs.tenx.utils.spaceranger_info(path=None, name=None)

Retrieve information on the spaceranger software

If called without any arguments this will locate the first spaceranger executable available on the user’s PATH and attempt to extract its version.

Alternatively if the path to an executable is supplied then the version will be determined from that instead.

If no version is identified then the script path is still returned, but without any version info.

If a ‘path’ is supplied then the package name will be taken from the basename; otherwise the package name can be supplied via the ‘name’ argument. If neither is supplied then the package name defaults to ‘spaceranger’.

Returns:

tuple consisting of (PATH,PACKAGE,VERSION) where PATH is the full path for the spaceranger program, PACKAGE is the package name (e.g. ‘spaceranger’), and VERSION is the package version. If any value can’t be determined then it will be returned as an empty string.

Return type:

Tuple