auto_process_ngs.qc.apps.cellranger
Provides utility classes and functions for handling Cellranger outputs.
Provides the following classes:
CellrangerCount: handle outputs from cellranger count
CellrangerMulti: handle outputs from cellranger multi
Provides the following functions:
cellranger_count_output: get names for cellranger count output
cellranger_atac_count_output: get names for cellranger-atac count output
cellranger_arc_count_output: get names for cellranger-arc count output
cellranger_multi_output: get names for cellranger multi output
fetch_cellranger_multi_output_dirs: get list of cellranger multi output dirs
extract_path_data: extract version, reference and physical sample from path
- class auto_process_ngs.qc.apps.cellranger.CellrangerCount(cellranger_count_dir, cellranger_exe=None, version=None, reference_data=None)
Wrapper class for handling outputs from cellranger count
The CellrangerCount object gives access to various details of the outputs (such as sample name and file paths).
- property cellranger_exe
Cellranger executable
- property cmdline
Return the command line used to run ‘cellranger* count’
- property cmdline_file
Path to the ‘cellranger* count’ ‘_cmdline’ file
- property dir
Path to the directory with the cellranger count outputs
- property metrics
Return the appropriate ‘MetricsSummary’ object
- property metrics_csv
Path to the cellranger count ‘metrics.csv’ file
- property pipeline_name
Pipeline name i.e. name of the software package
- property reference_data
Reference dataset
- property sample_name
Sample name derived from the directory name
- property version
Cellranger version
- property web_summary
Path to the cellranger count ‘web_summary.html’ file
- class auto_process_ngs.qc.apps.cellranger.CellrangerMulti(cellranger_multi_dir, cellranger_exe=None, version=None, reference_data=None, sample=None, config_csv=None)
Wrapper class for handling outputs from cellranger multi
The CellrangerMulti object gives access to various details of the outputs (such as sample name and file paths).
Note that if the physical sample name is not supplied via the sample argument then the class will attempt to extract it from the configuration file name (assuming it’s of the form 10x_multi_config.<SAMPLE>.csv).
- property cellranger_exe
Cellranger executable
- property cmdline
Return the command line used to run ‘cellranger multi’
- property cmdline_file
Path to the ‘cellranger multi’ ‘_cmdline’ file
- property config
Return CellrangerMultiConfigCsv instance
- property dir
Path to the directory with the ‘cellranger multi’ outputs
- metrics(name)
Return a ‘MultiplexSummary’ object for a sample
- metrics_csv(name)
Path to the cellranger multi ‘metrics.csv’ file for a sample
- property physical_sample
Associated physical sample name
- property pipeline_name
Pipeline name i.e. name of the software package
- property probe_set
Probe set
- property reference_data
Reference dataset
- property sample_names
Sample names derived from the subdirectory names
- property version
Cellranger version
- web_summary(name)
Path to the cellranger multi ‘web_summary.html’ file for a sample
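The fallback described above, where the physical sample name is taken from a configuration file named 10x_multi_config.<SAMPLE>.csv, can be illustrated with a minimal sketch. The helper name `sample_from_config_name` and the regular expression are assumptions made for illustration; they are not part of auto_process_ngs and the class may implement the lookup differently.

```python
import os
import re

# Hypothetical pattern (an assumption, not taken from the library):
# matches file names of the form 10x_multi_config.<SAMPLE>.csv
CONFIG_NAME_PATTERN = re.compile(r"^10x_multi_config\.(?P<sample>.+)\.csv$")


def sample_from_config_name(config_csv):
    """Return the sample name embedded in the config file name, or None."""
    match = CONFIG_NAME_PATTERN.match(os.path.basename(config_csv))
    return match.group("sample") if match else None
```

Under these assumptions a path ending in 10x_multi_config.PB1.csv yields "PB1", while a file name without a sample component yields None.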
- auto_process_ngs.qc.apps.cellranger.cellranger_arc_count_output(project, sample_name=None, prefix='cellranger_count')
Generate list of ‘cellranger-arc count’ outputs
Given an AnalysisProject, the outputs from ‘cellranger-arc count’ will look like:
{PREFIX}/{SAMPLE_n}/outs/summary.csv
{PREFIX}/{SAMPLE_n}/outs/web_summary.html
for each SAMPLE_n in the project.
If a sample name is supplied then outputs are limited to those for that sample
- Parameters:
project (AnalysisProject) – project to generate output names for
sample_name (str) – sample to limit outputs to
prefix (str) – directory for outputs (defaults to “cellranger_count”)
- Returns:
cellranger-arc count outputs (without leading paths)
- Return type:
tuple
- auto_process_ngs.qc.apps.cellranger.cellranger_atac_count_output(project, sample_name=None, prefix='cellranger_count')
Generate list of ‘cellranger-atac count’ outputs
Given an AnalysisProject, the outputs from ‘cellranger-atac count’ will look like:
{PREFIX}/{SAMPLE_n}/outs/summary.csv
{PREFIX}/{SAMPLE_n}/outs/web_summary.html
for each SAMPLE_n in the project.
If a sample name is supplied then outputs are limited to those for that sample
- Parameters:
project (AnalysisProject) – project to generate output names for
sample_name (str) – sample to limit outputs to
prefix (str) – directory for outputs (defaults to “cellranger_count”)
- Returns:
cellranger-atac count outputs (without leading paths)
- Return type:
tuple
- auto_process_ngs.qc.apps.cellranger.cellranger_count_output(project, sample_name=None, prefix='cellranger_count')
Generate list of ‘cellranger count’ outputs
Given an AnalysisProject, the outputs from ‘cellranger count’ will look like:
{PREFIX}/{SAMPLE_n}/outs/metrics_summary.csv
{PREFIX}/{SAMPLE_n}/outs/web_summary.html
for each SAMPLE_n in the project.
If a sample name is supplied then outputs are limited to those for that sample
- Parameters:
project (AnalysisProject) – project to generate output names for
sample_name (str) – sample to limit outputs to
prefix (str) – directory for outputs (defaults to “cellranger_count”)
- Returns:
cellranger count outputs (without leading paths)
- Return type:
tuple
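As an illustration of the naming pattern documented for ‘cellranger count’, the following sketch builds the expected relative paths from a plain list of sample names. It does not use an AnalysisProject; the function name `expected_count_outputs` and its `sample_names` argument are hypothetical stand-ins, so treat it as a sketch of the pattern rather than the library's implementation.

```python
def expected_count_outputs(sample_names, sample_name=None,
                           prefix="cellranger_count"):
    """Build {PREFIX}/{SAMPLE_n}/outs/... names for each sample.

    `sample_names` is a hypothetical stand-in for the samples that
    would normally be taken from an AnalysisProject.
    """
    outputs = []
    for name in sample_names:
        # Optionally limit the outputs to a single sample
        if sample_name is not None and name != sample_name:
            continue
        for filename in ("metrics_summary.csv", "web_summary.html"):
            outputs.append(f"{prefix}/{name}/outs/{filename}")
    return tuple(outputs)
```

Calling it with two samples and no filter gives four relative paths, two per sample; supplying `sample_name` reduces this to the two paths for that sample.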
- auto_process_ngs.qc.apps.cellranger.cellranger_multi_output(project, config_csv, sample_name=None, prefix='cellranger_multi')
Generate list of ‘cellranger multi’ outputs
Given an AnalysisProject, the outputs from ‘cellranger multi’ will look like:
{PREFIX}/outs/multi/multiplexing_analysis/tag_calls_summary.csv
and
{PREFIX}/outs/per_sample_outs/{SAMPLE_n}/metrics_summary.csv
{PREFIX}/outs/per_sample_outs/{SAMPLE_n}/web_summary.html
for each multiplexed SAMPLE_n defined in the config.csv file (nb these are not equivalent to the ‘samples’ defined by the Fastq files in the project).
If a sample name is supplied then outputs are limited to those for that sample; if the supplied config.csv file isn’t found then no outputs will be returned.
- Parameters:
project (AnalysisProject) – project to generate output names for
config_csv (str) – path to the cellranger multi config.csv file
sample_name (str) – multiplexed sample to limit outputs to (optional)
prefix (str) – directory for outputs (optional, defaults to “cellranger_multi”)
- Returns:
cellranger multi outputs (without leading paths)
- Return type:
tuple
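The layout described for ‘cellranger multi’ can be sketched in the same way. The example below takes the multiplexed sample names directly instead of reading them from a config.csv file, and it assumes that the run-level tag_calls_summary.csv is listed even when a single sample is requested; both choices are assumptions for illustration, and `expected_multi_outputs` is a hypothetical name.

```python
def expected_multi_outputs(multiplexed_samples, sample_name=None,
                           prefix="cellranger_multi"):
    """Build expected 'cellranger multi' output names.

    `multiplexed_samples` stands in for the sample names that would
    normally be read from the config.csv file (an assumption).
    """
    # Run-level file (assumed here to be listed regardless of any filter)
    outputs = [
        f"{prefix}/outs/multi/multiplexing_analysis/tag_calls_summary.csv"
    ]
    for name in multiplexed_samples:
        # Optionally limit the per-sample outputs to one sample
        if sample_name is not None and name != sample_name:
            continue
        for filename in ("metrics_summary.csv", "web_summary.html"):
            outputs.append(
                f"{prefix}/outs/per_sample_outs/{name}/{filename}")
    return tuple(outputs)
```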
- auto_process_ngs.qc.apps.cellranger.extract_path_data(multi_output_dir, top_dir)
Get version, refdata and sample name from output path
Attempts to extract the version, reference data and physical sample name from the intermediate directory names above a cellranger multi output directory.
For example: if cellranger multi outputs are stored under the top-level directory “cellranger_multi”, then outputs from individual runs might be arranged under this directory as:
cellranger_multi/8.0.0/refdata-cellranger-gex-GRCh38-2020-A/…
In this case the version would be “8.0.0”, the reference would be “refdata-cellranger-gex-GRCh38-2020-A”, and the physical sample name would not be available.
Alternatively if the arrangement is:
cellranger_multi/9.0.0/refdata-cellranger-gex-GRCh38-2020-A/PB1/…
then the version would be “9.0.0”, the reference would be “refdata-cellranger-gex-GRCh38-2020-A”, and the physical sample name would be “PB1”.
- Parameters:
multi_output_dir (str) – path to the cellranger multi output directory
top_dir (str) – the top level directory for all cellranger multi directories
- Returns:
extracted data items as a tuple of (version, reference, sample)
- Return type:
tuple
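The two directory arrangements described above can be illustrated with a short sketch that splits the path of the output directory relative to the top-level directory. It assumes the output directory sits directly below the reference (or sample) level and that missing items are reported as None; `split_path_data` is a hypothetical name and the real function may handle other layouts differently.

```python
import os


def split_path_data(multi_output_dir, top_dir):
    """Return (version, reference, sample) from intermediate dir names.

    Items that are not present in the path are returned as None
    (an assumption made for this sketch).
    """
    rel_path = os.path.relpath(multi_output_dir, top_dir)
    parts = [p for p in rel_path.split(os.sep) if p not in ("", ".")]
    version = parts[0] if len(parts) > 0 else None
    reference = parts[1] if len(parts) > 1 else None
    sample = parts[2] if len(parts) > 2 else None
    return (version, reference, sample)
```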
- auto_process_ngs.qc.apps.cellranger.fetch_cellranger_multi_output_dirs(top_dir)
Locate output directories from cellranger multi
Recursively searches the directory structure under the supplied top-level directory and returns a list of paths to each possible “cellranger multi” output directory.
Putative output directories will contain at minimum subdirectories called “outs” and “per_sample_outs”.
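A recursive search of this kind can be sketched with os.walk. The example assumes that “per_sample_outs” sits inside “outs” (consistent with the {PREFIX}/outs/per_sample_outs/ layout used by ‘cellranger multi’) and that the search does not descend into a directory once it has been identified; `find_putative_multi_dirs` is a hypothetical name, not the library function.

```python
import os


def find_putative_multi_dirs(top_dir):
    """Return sorted paths of directories containing outs/per_sample_outs."""
    found = []
    for dirpath, dirnames, _filenames in os.walk(top_dir):
        per_sample = os.path.join(dirpath, "outs", "per_sample_outs")
        if "outs" in dirnames and os.path.isdir(per_sample):
            found.append(dirpath)
            # Do not search inside an identified output directory
            dirnames[:] = []
    return sorted(found)
```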
- Parameters:
top_dir (str) – path to directory to search under
- Returns:
list of paths to putative “cellranger multi” output directories.
- Return type: