auto_process_ngs.tenx.cellplex
Utilities for working with 10x Genomics single cell multiplexing (CellPlex) pipelines:
CellrangerMultiConfigCsv
- class auto_process_ngs.tenx.cellplex.CellrangerMultiConfigCsv(filen, strict=True)
Class to handle cellranger multi ‘config.csv’ files
Provides the following properties:
sample_names: list of multiplexed sample names
sections: list of the sections in the config
reference_data_path: path to the reference dataset
probe_set_path: path to the probe set
feature_reference_path: path to the feature reference
vdj_reference_path: path to the V(D)J-compatible reference
gex_libraries: list of Fastq IDs associated with GEX data
physical_sample: physical sample name extracted from the config file name if present (otherwise None)
is_valid: indicates whether the file appears to be valid
fastq_dirs: mapping of library names to the associated Fastq directory paths
Provides the following methods:
sample: returns information on a specific multiplexed sample
gex_library: returns information on a specific GEX library
pretty_print_samples: returns a string with a ‘nice’ description of the multiplexed sample names
get_errors: returns a list of error messages (if any) indicating problems with the config.csv file
By default, data from the config.csv file is read in ‘strict’ mode: any formatting errors that are detected will cause an exception to be raised. If the file is read with ‘strict’ turned off, then the ‘is_valid’ property can be used to check whether the file is correctly formatted, and any errors can be accessed via the ‘get_errors’ method.
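The non-strict workflow described above can be sketched as follows. This is an illustration rather than code from the package: `summarise_config_errors` is a hypothetical helper that relies only on the documented `is_valid` property and `get_errors` method, so it accepts any object exposing them.

```python
def summarise_config_errors(config):
    """Build a short report for a config read with strict=False.

    'config' is assumed to expose the documented 'is_valid' property
    and 'get_errors()' method. Hypothetical helper, not part of the
    auto_process_ngs package.
    """
    if config.is_valid:
        return "config.csv looks valid"
    # List each reported problem on its own line
    report = ["config.csv has problems:"]
    report.extend("- %s" % msg for msg in config.get_errors())
    return "\n".join(report)

# Intended use (requires auto_process_ngs to be installed):
#   from auto_process_ngs.tenx.cellplex import CellrangerMultiConfigCsv
#   config = CellrangerMultiConfigCsv("10x_multi_config.csv", strict=False)
#   print(summarise_config_errors(config))
```

Reading with `strict=False` and reporting afterwards is useful when all problems in a file should be shown together instead of stopping at the first exception.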
- property fastq_dirs
Return mapping of library names to Fastq directories
- property feature_reference_path
Return the path to the feature reference file from config.csv
- property feature_types
Return list of feature types defined in config file
Feature type names are returned converted to lower case.
- get_errors()
Return errors detected on reading in the config.csv file
- Returns:
list of error messages that were encountered; will be empty if there were no errors.
- Return type:
List
- property gex_libraries
Return the library names associated with GEX data from config.csv
Libraries are listed in the ‘[libraries]’ section
- gex_library(name)
Return dictionary of values associated with GEX library
- Parameters:
name (str) – name of the GEX library of interest
- property is_valid
Indicate whether config.csv file is valid
- Returns:
True if no errors were encountered reading in the file, False if not.
- Return type:
Boolean
- libraries(feature_type)
Return library names associated with specified feature type
- library(feature_type, name)
Return dictionary of values associated with library
Keys include:
‘fastqs’ (path to Fastqs)
‘lanes’ (associated lanes)
‘library_id’ (physical library ID)
‘feature_type’ (e.g. ‘Gene Expression’)
‘subsample_rate’ (the associated subsampling rate)
- Parameters:
feature_type (str) – feature type of the library of interest (e.g. ‘Gene Expression’)
name (str) – name of the library of interest
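To make the shape of the returned dictionary concrete, the sketch below builds comparable dictionaries from the lines of a ‘[libraries]’ section. It is not the package's implementation: `parse_libraries` and `COLUMN_TO_KEY` are hypothetical names, the column names (`fastq_id`, `physical_library_id`, `feature_types` and so on) are assumed from the 10x Genomics config.csv layout, and grouping by lower-cased feature type is an assumption made for the example.

```python
import csv

# Assumed config.csv column names mapped to the keys documented above
COLUMN_TO_KEY = {
    "fastqs": "fastqs",
    "lanes": "lanes",
    "physical_library_id": "library_id",
    "feature_types": "feature_type",
    "subsample_rate": "subsample_rate",
}

def parse_libraries(rows):
    """Return {feature type (lower case): {library name: values}}.

    'rows' is an iterable of CSV lines from a '[libraries]' section,
    starting with the header line. Columns absent from the header
    are reported as None.
    """
    libraries = {}
    for row in csv.DictReader(rows):
        values = {key: row.get(col) for col, key in COLUMN_TO_KEY.items()}
        feature_type = (values["feature_type"] or "").lower()
        libraries.setdefault(feature_type, {})[row["fastq_id"]] = values
    return libraries

# Made-up example data
rows = [
    "fastq_id,fastqs,lanes,physical_library_id,feature_types,subsample_rate",
    "PJB_GEX,/data/fastqs,any,PJB_GEX,Gene Expression,",
    "PJB_CML,/data/fastqs,any,PJB_CML,Multiplexing Capture,",
]
libraries = parse_libraries(rows)
print(sorted(libraries))
print(libraries["gene expression"]["PJB_GEX"]["feature_type"])
```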
- property physical_sample
Return the physical sample name extracted from the config.csv file name
Physical sample name is extracted from config file names of the form:
10x_multi_config[.SAMPLE].csv
If no physical sample name is present in the name then returns ‘None’.
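The naming convention above can be illustrated with a regular expression. This is a minimal sketch, assuming the pattern is matched against the base name of the file; `extract_physical_sample` is a hypothetical helper and may differ from how the class actually performs the extraction.

```python
import os
import re

# Matches '10x_multi_config.csv' and '10x_multi_config.SAMPLE.csv'
CONFIG_NAME = re.compile(r"^10x_multi_config(?:\.(?P<sample>.+))?\.csv$")

def extract_physical_sample(path):
    """Return the SAMPLE part of the config file name, or None.

    Hypothetical helper: returns None both when no sample is embedded
    in the name and when the name does not follow the convention.
    """
    match = CONFIG_NAME.match(os.path.basename(path))
    if match is None:
        return None
    return match.group("sample")

print(extract_physical_sample("/data/run/10x_multi_config.PJB1.csv"))
print(extract_physical_sample("10x_multi_config.csv"))
```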
- pretty_print_samples()
Return string describing the multiplexed sample names
Wraps a call to the ‘pretty_print_names’ function.
- Returns:
pretty description of multiplexed sample names.
- Return type:
String
- property probe_set_path
Return the path to the probe set file from config.csv
- property reference_data_path
Return the path to the reference dataset from config.csv
- sample(sample_name)
Return dictionary of values associated with multiplexed sample
Keys include ‘cmo’ (list of CMO ids) and ‘description’ (description text) associated with the sample in the ‘[samples]’ section of the config.csv file.
- Parameters:
sample_name (str) – name of the sample of interest
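As an illustration of the dictionary described above, the following sketch turns the lines of a ‘[samples]’ section into per-sample dictionaries with ‘cmo’ and ‘description’ keys. It is not taken from the package: `parse_samples` is a hypothetical helper, and the column names (`sample_id`, `cmo_ids`, `description`) and the `|` separator between CMO ids are assumed from the 10x Genomics config.csv layout.

```python
import csv

def parse_samples(rows):
    """Map each sample name to {'cmo': [...], 'description': ...}.

    'rows' is an iterable of CSV lines from a '[samples]' section,
    starting with the header line (hypothetical helper).
    """
    samples = {}
    for row in csv.DictReader(rows):
        samples[row["sample_id"]] = {
            # Multiple CMO ids are assumed to be separated by '|'
            "cmo": row["cmo_ids"].split("|"),
            # A missing description is reported as an empty string
            "description": row.get("description") or "",
        }
    return samples

# Made-up example data
rows = [
    "sample_id,cmo_ids,description",
    "PBMC1,CMO301,Control",
    "PBMC2,CMO302|CMO303,Treated",
]
samples = parse_samples(rows)
print(samples["PBMC2"]["cmo"])
```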
- property sample_names
Return the multiplexed sample names from config.csv
Samples are listed in the ‘[samples]’ section.
- property sections
Return the list of sections in the config.csv file
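For readers unfamiliar with the file layout, the sketch below shows one way a config.csv file could be split into its named sections. It assumes each section starts with a line of the form `[name]`; `split_sections` is a hypothetical helper and does not reproduce the class's own parsing or validation.

```python
def split_sections(text):
    """Return a dict mapping section name to its non-blank lines.

    Sections are kept in the order they appear in the file. Lines
    before the first '[name]' header are ignored (hypothetical helper).
    """
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # Start of a new section: strip the surrounding brackets
            current = line[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections

# Made-up example content
example = """[gene-expression]
reference,/data/refdata

[libraries]
fastq_id,fastqs,feature_types
GEX1,/data/fastqs,Gene Expression

[samples]
sample_id,cmo_ids,description
S1,CMO301,Control
"""
sections = split_sections(example)
print(list(sections))
```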
- property vdj_reference_path
Return the path to the V(D)J reference file from config.csv