auto_process_ngs.tenx.cellplex

Utilities for working with 10x Genomics single cell multiplexing (CellPlex) pipelines:

  • CellrangerMultiConfigCsv

class auto_process_ngs.tenx.cellplex.CellrangerMultiConfigCsv(filen, strict=True)

Class to handle cellranger multi ‘config.csv’ files

See https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/multi#cellranger-multi

Provides the following properties:

  • sample_names: list of multiplexed sample names

  • sections: list of the sections in the config

  • reference_data_path: path to the reference dataset

  • probe_set_path: path to the probe set

  • feature_reference_path: path to the feature reference

  • feature_types: list of feature types defined in the config (converted to lower case)

  • vdj_reference_path: path to the V(D)J-compatible reference

  • gex_libraries: list of Fastq IDs associated with GEX data

  • physical_sample: physical sample name extracted from the config file name if present (otherwise None)

  • is_valid: indicates whether the file appears to be valid

Provides the following methods:

  • sample: returns information on a specific multiplexed sample

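  • gex_library: returns information on a specific GEX library

  • libraries: returns the library names associated with a specific feature type

  • library: returns information on a specific library of a given feature type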

  • fastq_dirs: returns mapping of library names to the associated Fastq directory paths (accessed as a property)

  • pretty_print_samples: returns a string with a ‘nice’ description of the multiplexed sample names

  • get_errors: returns a list of error messages (if any) indicating problems with the config.csv file

By default, data from the config.csv file is read in ‘strict’ mode: any formatting errors that are detected cause an exception to be raised. If the file is read with ‘strict’ turned off, the ‘is_valid’ property can be used to check whether the file is correctly formatted, and any errors can be retrieved via the ‘get_errors’ method.
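The strict and non-strict behaviour described above can be illustrated with a minimal, self-contained sketch. The `MiniConfig` class and its two checks are hypothetical stand-ins written for this example; they are not the implementation used by `CellrangerMultiConfigCsv`, and the real class may detect different problems and raise a different exception type.

```python
class MiniConfig:
    """Hypothetical stand-in showing the strict/non-strict pattern."""

    def __init__(self, lines, strict=True):
        self._errors = []
        section = None
        for lineno, line in enumerate(lines, start=1):
            line = line.strip()
            if not line:
                continue
            if line.startswith("["):
                if not line.endswith("]"):
                    # Assumed check: section headers must be closed
                    self._errors.append(
                        f"line {lineno}: malformed section header '{line}'")
                    continue
                section = line[1:-1]
            elif section is None:
                # Assumed check: data must belong to a section
                self._errors.append(
                    f"line {lineno}: data outside any section")
        if strict and self._errors:
            # Strict mode: fail immediately with all collected errors
            raise ValueError("; ".join(self._errors))

    @property
    def is_valid(self):
        return not self._errors

    def get_errors(self):
        return list(self._errors)


bad = ["[gene-expression", "reference,/data/refdata"]

# Non-strict: no exception, errors are inspected afterwards
lenient = MiniConfig(bad, strict=False)
if not lenient.is_valid:
    for message in lenient.get_errors():
        print(message)

# Strict (the default): the same input raises
try:
    MiniConfig(bad)
except ValueError as exc:
    print(f"strict mode failed: {exc}")
```

Non-strict reading is the better fit when the aim is to report every problem in a file to the user rather than to stop at the first sign of trouble.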

property fastq_dirs

Return mapping of library names to Fastq directories

property feature_reference_path

Return the path to the feature reference file from config.csv

property feature_types

Return list of feature types defined in config file

Feature type names are returned converted to lower case.

get_errors()

Return errors detected on reading in the config.csv file

Returns:

list of error messages that were encountered; will be empty if there were no errors.

Return type:

List

property gex_libraries

Return the library names associated with GEX data from config.csv

Libraries are listed in the ‘[libraries]’ section

gex_library(name)

Return dictionary of values associated with GEX library

Parameters:

name (str) – name of the GEX library of interest

property is_valid

Indicate whether config.csv file is valid

Returns:

True if no errors were encountered reading in the file, False if not.

Return type:

Boolean

libraries(feature_type)

Return library names associated with specified feature type

library(feature_type, name)

Return dictionary of values associated with library

Keys include:

  • ‘fastqs’ (path to Fastqs)

  • ‘lanes’ (associated lanes)

  • ‘library_id’ (physical library ID)

  • ‘feature_type’ (e.g. ‘Gene Expression’)

  • ‘subsample_rate’ (the associated subsampling rate)

Parameters:
  • feature_type (str) – feature type of the library of interest (e.g. ‘Gene Expression’)

  • name (str) – name of the library of interest
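As an illustration of how the keys listed above might relate to the columns of a ‘[libraries]’ section, the following sketch parses a small example section with the standard `csv` module. The column names (`fastq_id`, `physical_library_id`, `feature_types` and so on), the example values, and the helper functions are assumptions made for this example; they are not taken from the `CellrangerMultiConfigCsv` implementation.

```python
import csv
import io

# Example config fragment (illustrative values only)
CONFIG_TEXT = """\
[libraries]
fastq_id,fastqs,lanes,physical_library_id,feature_types,subsample_rate
PJB1_GEX,/data/fastqs,any,PJB1,Gene Expression,
PJB1_CML,/data/fastqs,any,PJB1,Multiplexing Capture,
"""


def read_libraries(text):
    """Return a dict mapping library name to a dict of its values."""
    lines = text.splitlines()
    start = lines.index("[libraries]") + 1
    body = []
    for line in lines[start:]:
        if line.startswith("["):
            break  # reached the next section
        if line.strip():
            body.append(line)
    libraries = {}
    for row in csv.DictReader(io.StringIO("\n".join(body))):
        # Assumed mapping from CSV columns to the documented keys
        libraries[row["fastq_id"]] = {
            "fastqs": row["fastqs"],
            "lanes": row["lanes"],
            "library_id": row["physical_library_id"],
            "feature_type": row["feature_types"],
            "subsample_rate": row["subsample_rate"],
        }
    return libraries


def feature_types(libraries):
    """Return the distinct feature types, converted to lower case."""
    return sorted({lib["feature_type"].lower() for lib in libraries.values()})


def libraries_for(libraries, feature_type):
    """Return names of libraries with the given feature type."""
    wanted = feature_type.lower()
    return [name for name, lib in libraries.items()
            if lib["feature_type"].lower() == wanted]


libs = read_libraries(CONFIG_TEXT)
print(feature_types(libs))
print(libraries_for(libs, "Gene Expression"))
print(libs["PJB1_GEX"]["fastqs"])
```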

property physical_sample

Return the physical sample name extracted from the config.csv file name

Physical sample name is extracted from config file names of the form:

10x_multi_config[.SAMPLE].csv

If no physical sample name is present in the file name then ‘None’ is returned.
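The naming convention above can be sketched with a regular expression. The function `physical_sample_from_name` is a hypothetical helper written for this example, not part of `auto_process_ngs`; returning `None` for file names that do not follow the convention at all is also an assumption.

```python
import os
import re

# Matches 10x_multi_config.csv and 10x_multi_config.SAMPLE.csv
CONFIG_NAME = re.compile(r"10x_multi_config(?:\.(?P<sample>.+))?\.csv")


def physical_sample_from_name(path):
    """Return SAMPLE from '10x_multi_config[.SAMPLE].csv', else None."""
    match = CONFIG_NAME.fullmatch(os.path.basename(path))
    if match is None:
        return None  # assumed behaviour for non-matching names
    return match.group("sample")


print(physical_sample_from_name("10x_multi_config.PJB1.csv"))
print(physical_sample_from_name("/analysis/10x_multi_config.csv"))
```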

pretty_print_samples()

Return string describing the multiplexed sample names

Wraps a call to ‘pretty_print_names’ function.

Returns:

pretty description of multiplexed sample names.

Return type:

String

property probe_set_path

Return the path to the probe set file from config.csv

property reference_data_path

Return the path to the reference dataset from config.csv

sample(sample_name)

Return dictionary of values associated with multiplexed sample

Keys include ‘cmo’ (list of CMO ids) and ‘description’ (description text) associated with the sample in the ‘[samples]’ section of the config.csv file.

Parameters:

sample_name (str) – name of the sample of interest
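A minimal sketch of how the contents of a ‘[samples]’ section could be turned into dictionaries with ‘cmo’ and ‘description’ keys is shown below. The column names (`sample_id`, `cmo_ids`, `description`), the use of `|` to separate multiple CMO ids, and the `read_samples` helper are assumptions for illustration; the actual parsing in `CellrangerMultiConfigCsv` may differ.

```python
import csv
import io

# Body of an example [samples] section (illustrative values only)
SAMPLES_SECTION = """\
sample_id,cmo_ids,description
PBA,CMO301,Patient A
PBB,CMO302|CMO303,Patient B
"""


def read_samples(section_text):
    """Return a dict mapping sample name to its 'cmo' list and description."""
    samples = {}
    for row in csv.DictReader(io.StringIO(section_text)):
        samples[row["sample_id"]] = {
            # Assumed: multiple CMO ids are separated by '|'
            "cmo": row["cmo_ids"].split("|"),
            "description": row["description"],
        }
    return samples


samples = read_samples(SAMPLES_SECTION)
print(list(samples))
print(samples["PBB"]["cmo"])
```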

property sample_names

Return the multiplexed sample names from config.csv

Samples are listed in the ‘[samples]’ section.

property sections

Return the list of sections in the config.csv file
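For orientation, section names in a config.csv file appear as bracketed header lines such as ‘[libraries]’. The sketch below collects them in file order; `list_sections` is a hypothetical helper, and stripping trailing commas (which spreadsheet exports sometimes add) is an assumption rather than documented behaviour of the class.

```python
def list_sections(text):
    """Return section names from '[name]' header lines, in file order."""
    sections = []
    for line in text.splitlines():
        # Assumed: tolerate trailing commas left by spreadsheet exports
        line = line.strip().rstrip(",")
        if line.startswith("[") and line.endswith("]"):
            sections.append(line[1:-1])
    return sections


example = (
    "[gene-expression]\n"
    "reference,/data/refdata\n"
    "\n"
    "[libraries],,\n"
    "fastq_id,fastqs,feature_types\n"
    "[samples]\n"
)
print(list_sections(example))
```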

property vdj_reference_path

Return the path to the V(D)J reference file from config.csv