auto_process_ngs.tenx.utils

Utility functions for processing the outputs from 10x Genomics pipelines:

  • flow_cell_id

  • has_10x_indices

  • has_chromium_sc_indices

  • get_bases_mask_10x_atac

  • get_bases_mask_10x_multiome

  • cellranger_info

  • spaceranger_info

  • make_qc_summary_html

  • add_cellranger_args

  • make_multi_config_template

auto_process_ngs.tenx.utils.add_cellranger_args(cellranger_cmd, jobmode=None, maxjobs=None, mempercore=None, jobinterval=None, localcores=None, localmem=None, disable_ui=False)

Configure options for cellranger

Given a Command instance for running cellranger, add the appropriate options (e.g. --jobmode) according to the supplied arguments.

Parameters:
  • cellranger_cmd (Command) – Command instance for running cellranger

  • jobmode (str) – if specified, will be passed to the --jobmode option

  • maxjobs (int) – if specified, will be passed to the --maxjobs option

  • mempercore (int) – if specified, will be passed to the --mempercore option (only if jobmode is not “local”)

  • jobinterval (int) – if specified, will be passed to the --jobinterval option

  • localcores (int) – if specified, will be passed to the --localcores option (only if jobmode is “local”)

  • localmem (int) – if specified, will be passed to the --localmem option (only if jobmode is “local”)

  • disable_ui (bool) – if True, add the --disable-ui option (default is not to add it)

Returns:

the original command updated with the appropriate options.

Return type:

Command
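
The rules above can be sketched as follows. This is a minimal illustration and not the library's implementation: sketch_cellranger_options is a hypothetical helper that returns plain option strings instead of updating a Command instance, it assumes each argument maps to the cellranger option of the same name, and the "--option=value" form is also an assumption.

```python
def sketch_cellranger_options(jobmode=None, maxjobs=None, mempercore=None,
                              jobinterval=None, localcores=None,
                              localmem=None, disable_ui=False):
    """Hypothetical helper: list the options implied by the documented rules."""
    opts = []
    if jobmode is not None:
        opts.append(f"--jobmode={jobmode}")
    if jobmode == "local":
        # --localcores and --localmem are only added in local mode
        if localcores is not None:
            opts.append(f"--localcores={localcores}")
        if localmem is not None:
            opts.append(f"--localmem={localmem}")
    elif mempercore is not None:
        # --mempercore is only added when jobmode is not "local"
        opts.append(f"--mempercore={mempercore}")
    if maxjobs is not None:
        opts.append(f"--maxjobs={maxjobs}")
    if jobinterval is not None:
        opts.append(f"--jobinterval={jobinterval}")
    if disable_ui:
        opts.append("--disable-ui")
    return opts


# Local mode: mempercore is ignored, the local* settings are used
local_opts = sketch_cellranger_options(jobmode="local", localcores=8,
                                       localmem=32, mempercore=5)
# Cluster mode (the "sge" value is only an example): local* settings are ignored
cluster_opts = sketch_cellranger_options(jobmode="sge", mempercore=5,
                                         maxjobs=24, localcores=8,
                                         disable_ui=True)
```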

auto_process_ngs.tenx.utils.cellranger_info(path=None, name=None)

Retrieve information on the cellranger software

If called without any arguments this will locate the first cellranger executable that is available on the user’s PATH, and attempt to extract its version.

Alternatively if the path to an executable is supplied then the version will be determined from that instead.

If no version is identified then the script path is still returned, but without any version info.

If a ‘path’ is supplied then the package name will be taken from the basename; otherwise the package name can be supplied via the ‘name’ argument. If neither is supplied then the package name defaults to ‘cellranger’.

Returns:

tuple consisting of (PATH,PACKAGE,VERSION) where PATH is the full path for the cellranger program, PACKAGE is ‘cellranger’, and VERSION is the package version. If any value can’t be determined then it will be returned as an empty string.

Return type:

Tuple
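
The way the path and package name are resolved can be sketched as below. This is an illustration under assumptions, not the library's code: sketch_locate_package is a hypothetical helper, it uses shutil.which for the PATH lookup, and it leaves out version extraction entirely (that step requires running the executable).

```python
import os
import shutil


def sketch_locate_package(path=None, name=None, default="cellranger"):
    """Hypothetical helper: resolve (PATH, PACKAGE) as described above."""
    if path is not None:
        # Package name is taken from the basename of the supplied path
        package = os.path.basename(path)
    else:
        # Otherwise use the supplied name, falling back to the default
        package = name if name is not None else default
        # First matching executable on PATH, or empty string if none is found
        path = shutil.which(package) or ""
    return (path, package)


# The install location below is made up for the example
explicit = sketch_locate_package(path="/opt/cellranger-x.y.z/bin/cellranger")
missing = sketch_locate_package(name="no-such-10x-program")
```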

auto_process_ngs.tenx.utils.flow_cell_id(run_name)

Extract the flow cell ID from a run name

For example for run name “170426_K00311_0033_AHJCY7BBXX” the extracted flow cell ID will be “HJCY7BBXX”.

Parameters:

run_name (str) – path to the run name to extract flow cell ID from

Returns:

the extracted flow cell ID.

Return type:

String
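
A minimal sketch that reproduces the documented example, assuming the flow cell ID is the last underscore-separated field of the run name with its leading character removed. The helper name sketch_flow_cell_id and the handling of a leading directory path are assumptions made for illustration.

```python
import os


def sketch_flow_cell_id(run_name):
    """Hypothetical helper: extract the flow cell ID from a run name."""
    # Drop any leading directories and trailing separator, then take the
    # last underscore-separated field (e.g. "AHJCY7BBXX")
    base = os.path.basename(os.path.normpath(run_name))
    last_field = base.split("_")[-1]
    # Remove the leading character to leave the flow cell ID
    return last_field[1:]


example_id = sketch_flow_cell_id("170426_K00311_0033_AHJCY7BBXX")
```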

auto_process_ngs.tenx.utils.get_bases_mask_10x_atac(runinfo_xml)

Acquire a bases mask for 10xGenomics scATAC-seq

Generates an initial bases mask based on the run contents, and then updates this so that:

  1. Only the first 8 bases of the first index read are actually used, and

  2. The second index read is converted to a data read.

For example: if the initial bases mask is ‘Y50,I16,I16,Y50’ then the scATAC-seq bases mask will be ‘Y50,I8nnnnnnnn,Y16,Y50’.

Parameters:

runinfo_xml (str) – path to the RunInfo.xml for the sequencing run

Returns:

10xGenomics scATAC-seq bases mask string

Return type:

String
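
The two updates can be sketched on the bases mask string alone. This is an illustration, not the library's implementation: sketch_atac_bases_mask is a hypothetical helper that takes an already generated initial mask (it does not read RunInfo.xml) and assumes a four-read mask whose components are a single letter followed by a length.

```python
def sketch_atac_bases_mask(initial_mask):
    """Hypothetical helper: apply the two scATAC-seq updates to a mask."""
    reads = initial_mask.split(",")
    if len(reads) != 4:
        raise ValueError("expected four reads in bases mask")
    r1, i1, i2, r2 = reads
    i1_length = int(i1[1:])
    if i1_length < 8:
        raise ValueError("first index read is shorter than 8 bases")
    # 1. Keep the first 8 bases of the first index read, mask the rest
    new_i1 = "I8" + "n" * (i1_length - 8)
    # 2. Convert the second index read to a data read
    new_i2 = "Y" + i2[1:]
    return ",".join([r1, new_i1, new_i2, r2])


atac_mask = sketch_atac_bases_mask("Y50,I16,I16,Y50")
```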

auto_process_ngs.tenx.utils.get_bases_mask_10x_multiome(runinfo_xml, library)

Return bases mask for 10xGenomics single cell multiome

Generates an initial bases mask based on the run contents, and then updates this based on the library type (either ‘atac’ or ‘gex’).

For ATAC data: the template bases mask is “Y*,I8n*,Y24,Y*” (keeping all of read 1, first 8 bases of read 2, all 24 bases of read 3, and all of read 4).

For example: if the initial bases mask is ‘Y50,I10,Y24,Y90’ then the single cell multiome ATAC bases mask will be ‘Y50,I8n2,Y24,Y90’.

For GEX data: the template bases mask is “Y28n*,I10,I10n*,Y*” (keeping first 28 bases of read 1, all 10 bases of read 2, first 10 bases of read 3, and all of read 4).

For example: if the initial bases mask is ‘Y50,I10,Y24,Y90’ then the single cell multiome GEX bases mask will be ‘Y28n22,I10,I10n14,Y90’.

Parameters:
  • runinfo_xml (str) – path to the RunInfo.xml for the sequencing run

  • library (str) – library type to set bases mask for (either ‘atac’ or ‘gex’)

Returns:

10xGenomics single cell multiome bases mask string for the specified library type.

Return type:

String
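
The two templates can be sketched as string transformations of the initial mask. This is a hedged illustration only: sketch_multiome_bases_mask is a hypothetical helper that works from an existing mask string instead of RunInfo.xml, and it assumes four reads that are each at least as long as the template requires.

```python
def sketch_multiome_bases_mask(initial_mask, library):
    """Hypothetical helper: apply the ATAC or GEX template to a mask."""
    lengths = [int(read[1:]) for read in initial_mask.split(",")]
    if len(lengths) != 4:
        raise ValueError("expected four reads in bases mask")
    r1, r2, r3, r4 = lengths

    def ignore(n):
        # "n<count>" marks bases to ignore; nothing is added if count is zero
        return f"n{n}" if n > 0 else ""

    if library == "atac":
        # Template "Y*,I8n*,Y24,Y*"
        return f"Y{r1},I8{ignore(r2 - 8)},Y24{ignore(r3 - 24)},Y{r4}"
    if library == "gex":
        # Template "Y28n*,I10,I10n*,Y*"
        return f"Y28{ignore(r1 - 28)},I10{ignore(r2 - 10)},I10{ignore(r3 - 10)},Y{r4}"
    raise ValueError(f"unknown library type: {library}")


atac_mask = sketch_multiome_bases_mask("Y50,I10,Y24,Y90", "atac")
gex_mask = sketch_multiome_bases_mask("Y50,I10,Y24,Y90", "gex")
```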

auto_process_ngs.tenx.utils.has_10x_indices(sample_sheet)

Check if a sample sheet contains 10xGenomics-format indices

The Chromium SC 3’v2 indices are of the form:

SI-GA-[A-H][1-12]

e.g. ‘SI-GA-B11’ (see https://support.10xgenomics.com/permalink/27rGqWvNYYuqkgeS66sksm)

For scATAC-seq the indices are assumed to be of the form:

SI-NA-[A-H][1-12]

e.g. ‘SI-NA-G9’

For Visium data the indices are assumed to be of the form:

SI-(TT|TS)-[A-H][1-12]

e.g. ‘SI-TT-B1’

Parameters:

sample_sheet (str) – path to the sample sheet CSV file to check

Returns:

True if the sample sheet contains at least one 10xGenomics-style index, False if not.

Return type:

Boolean
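
The index forms above can be expressed as a single regular expression. This sketch is an assumption about how such a check might look, not the library's code: it reads "[1-12]" as an integer from 1 to 12, and the helper any_10x_indices works on a list of index strings and does not parse a sample sheet file.

```python
import re

# SI-GA (Chromium SC 3'v2), SI-NA (scATAC-seq), SI-TT / SI-TS (Visium),
# followed by a row letter A-H and a column number 1-12
TENX_INDEX = re.compile(r"^SI-(GA|NA|TT|TS)-[A-H](1[0-2]|[1-9])$")


def looks_like_10x_index(index):
    """Hypothetical helper: True if the string matches a 10x index name."""
    return TENX_INDEX.match(index) is not None


def any_10x_indices(indices):
    """Hypothetical helper: True if at least one index is 10x-style."""
    return any(looks_like_10x_index(index) for index in indices)


mixed = any_10x_indices(["ATCGATCG", "SI-GA-B11"])
plain = any_10x_indices(["ATCGATCG", "GGTTAACC"])
```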

auto_process_ngs.tenx.utils.has_chromium_sc_indices(sample_sheet)

Wrapper for ‘has_10x_indices’.

Maintained for backwards compatibility.

auto_process_ngs.tenx.utils.make_multi_config_template(f, reference=None, probe_set=None, fastq_dir=None, samples=None, no_bam=None, library_type='CellPlex', cellranger_version=None)

Write a template configuration file for ‘cellranger multi’

Generates a template for the ‘cellranger multi’ configuration file, which can be used with either CellPlex or fixed RNA profiling (Flex) data.

The format and parameters for different data types are described in the 10x Genomics ‘cellranger’ documentation.

Parameters:
  • f (str) – path that output template file will be written to

  • reference (str) – path to reference transcriptome

  • probe_set (str) – path to probe set CSV file

  • fastq_dir (str) – path to directory with Fastq files

  • samples (list) – list of sample names

  • no_bam (bool) – if set, used as the value of the ‘no-bam’ setting

  • library_type (str) – specify the library type of data that the configuration file will be used with; should be one of ‘CellPlex[…]’ (the default), ‘Flex’ or ‘Single Cell Immune Profiling’

  • cellranger_version (str) – optionally specify the target Cellranger version number (or None)

auto_process_ngs.tenx.utils.make_qc_summary_html(json_file, html_file)

Make HTML report for cellranger mkfastq processing stats

Parameters:
  • json_file (str) – path to JSON file output from cellranger mkfastq command

  • html_file (str) – path to output HTML file

auto_process_ngs.tenx.utils.spaceranger_info(path=None, name=None)

Retrieve information on the spaceranger software

If called without any arguments this will locate the first spaceranger executable that is available on the user’s PATH, and attempt to extract its version.

Alternatively if the path to an executable is supplied then the version will be determined from that instead.

If no version is identified then the script path is still returned, but without any version info.

If a ‘path’ is supplied then the package name will be taken from the basename; otherwise the package name can be supplied via the ‘name’ argument. If neither is supplied then the package name defaults to ‘spaceranger’.

Returns:

tuple consisting of (PATH,PACKAGE,VERSION) where PATH is the full path for the spaceranger program, PACKAGE is ‘spaceranger’, and VERSION is the package version. If any value can’t be determined then it will be returned as an empty string.

Return type:

Tuple