auto_process_ngs.qc.apps.cellranger

Provides utility classes and functions for handling Cellranger outputs.

Provides the following classes:

  • CellrangerCount: handle outputs from cellranger count

  • CellrangerMulti: handle outputs from cellranger multi

Provides the following functions:

  • cellranger_count_output: get names for cellranger count output

  • cellranger_atac_count_output: get names for cellranger-atac count output

  • cellranger_arc_count_output: get names for cellranger-arc count output

  • cellranger_multi_output: get names for cellranger multi output

  • fetch_cellranger_multi_output_dirs: get list of cellranger multi output dirs

  • extract_path_data: extract version, reference and physical sample from path

class auto_process_ngs.qc.apps.cellranger.CellrangerCount(cellranger_count_dir, cellranger_exe=None, version=None, reference_data=None)

Wrapper class for handling outputs from cellranger count

The CellrangerCount object gives access to various details of the outputs (such as sample name and file paths).

property cellranger_exe

Cellranger executable

property cmdline

Return the command line used to run ‘cellranger* count’

property cmdline_file

Path to the ‘cellranger* count’ ‘_cmdline’ file

property dir

Path to the directory with the cellranger count outputs

property metrics

Return the appropriate ‘MetricsSummary’ object

property metrics_csv

Path to the cellranger count ‘metrics.csv’ file

property pipeline_name

Pipeline name i.e. name of the software package

property reference_data

Reference dataset

property sample_name

Sample name derived from the directory name

property version

Cellranger version

property web_summary

Path to the cellranger count ‘web_summary.html’ file

class auto_process_ngs.qc.apps.cellranger.CellrangerMulti(cellranger_multi_dir, cellranger_exe=None, version=None, reference_data=None, sample=None, config_csv=None)

Wrapper class for handling outputs from cellranger multi

The CellrangerMulti object gives access to various details of the outputs (such as sample name and file paths).

Note that if the physical sample name is not supplied via the sample argument then the class will attempt to extract it from the configuration file name (assuming it’s of the form 10x_multi_config.<SAMPLE>.csv).
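The fallback described above can be illustrated with a minimal sketch. The helper name sample_from_config_name and the regular expression are assumptions made for illustration and are not part of this module; only the 10x_multi_config.<SAMPLE>.csv naming pattern comes from the description.

```python
import os
import re


def sample_from_config_name(config_csv):
    """Hypothetical helper (not part of auto_process_ngs).

    Returns the <SAMPLE> part of a file named
    '10x_multi_config.<SAMPLE>.csv', or None if the name
    doesn't follow that pattern.
    """
    name = os.path.basename(config_csv)
    match = re.fullmatch(r"10x_multi_config\.(?P<sample>.+)\.csv", name)
    return match.group("sample") if match else None


# A name with an embedded sample gives that sample; a bare
# '10x_multi_config.csv' has nothing to extract and gives None
pb1 = sample_from_config_name("/data/project/10x_multi_config.PB1.csv")
missing = sample_from_config_name("/data/project/10x_multi_config.csv")
```

In this sketch a config file without an embedded sample name yields None, which mirrors the case where the physical sample cannot be determined.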

property cellranger_exe

Cellranger executable

property cmdline

Return the command line used to run ‘cellranger multi’

property cmdline_file

Path to the ‘cellranger multi’ ‘_cmdline’ file

property config

Return CellrangerMultiConfigCsv instance

property dir

Path to the directory with the ‘cellranger multi’ outputs

metrics(name)

Return a ‘MultiplexSummary’ object for a sample

metrics_csv(name)

Path to the cellranger multi ‘metrics.csv’ file for a sample

property physical_sample

Associated physical sample name

property pipeline_name

Pipeline name i.e. name of the software package

property probe_set

Probe set

property reference_data

Reference dataset

property sample_names

Sample names derived from the subdirectory names

property version

Cellranger version

web_summary(name)

Path to the cellranger multi ‘web_summary.html’ file for a sample

auto_process_ngs.qc.apps.cellranger.cellranger_arc_count_output(project, sample_name=None, prefix='cellranger_count')

Generate list of ‘cellranger-arc count’ outputs

Given an AnalysisProject, the outputs from ‘cellranger-arc count’ will look like:

  • {PREFIX}/{SAMPLE_n}/outs/summary.csv

  • {PREFIX}/{SAMPLE_n}/outs/web_summary.html

for each SAMPLE_n in the project.

If a sample name is supplied then outputs are limited to those for that sample.

Parameters:
  • project (AnalysisProject) – project to generate output names for

  • sample_name (str) – sample to limit outputs to

  • prefix (str) – directory for outputs (defaults to “cellranger_count”)

Returns:

cellranger-arc count outputs (without leading paths)

Return type:

tuple

auto_process_ngs.qc.apps.cellranger.cellranger_atac_count_output(project, sample_name=None, prefix='cellranger_count')

Generate list of ‘cellranger-atac count’ outputs

Given an AnalysisProject, the outputs from ‘cellranger-atac count’ will look like:

  • {PREFIX}/{SAMPLE_n}/outs/summary.csv

  • {PREFIX}/{SAMPLE_n}/outs/web_summary.html

for each SAMPLE_n in the project.

If a sample name is supplied then outputs are limited to those for that sample.

Parameters:
  • project (AnalysisProject) – project to generate output names for

  • sample_name (str) – sample to limit outputs to

  • prefix (str) – directory for outputs (defaults to “cellranger_count”)

Returns:

cellranger-atac count outputs (without leading paths)

Return type:

tuple

auto_process_ngs.qc.apps.cellranger.cellranger_count_output(project, sample_name=None, prefix='cellranger_count')

Generate list of ‘cellranger count’ outputs

Given an AnalysisProject, the outputs from ‘cellranger count’ will look like:

  • {PREFIX}/{SAMPLE_n}/outs/metrics_summary.csv

  • {PREFIX}/{SAMPLE_n}/outs/web_summary.html

for each SAMPLE_n in the project.

If a sample name is supplied then outputs are limited to those for that sample.

Parameters:
  • project (AnalysisProject) – project to generate output names for

  • sample_name (str) – sample to limit outputs to

  • prefix (str) – directory for outputs (defaults to “cellranger_count”)

Returns:

cellranger count outputs (without leading paths)

Return type:

tuple
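As an illustration of the naming scheme, the sketch below builds the same relative paths from a plain list of sample names. It is not the module's implementation: the function name count_output_names and the metrics_file argument are hypothetical, and a real AnalysisProject is replaced by a simple list so that the example is self-contained.

```python
import os


def count_output_names(sample_names, sample_name=None,
                       prefix="cellranger_count",
                       metrics_file="metrics_summary.csv"):
    """Hypothetical sketch of the output naming scheme.

    'sample_names' stands in for the samples of an
    AnalysisProject; 'metrics_file' is an assumed parameter
    so the same sketch can cover 'summary.csv' as well.
    """
    outputs = []
    for sample in sample_names:
        # Optionally limit the outputs to a single sample
        if sample_name is not None and sample != sample_name:
            continue
        outs_dir = os.path.join(prefix, sample, "outs")
        outputs.append(os.path.join(outs_dir, metrics_file))
        outputs.append(os.path.join(outs_dir, "web_summary.html"))
    return tuple(outputs)


# Two samples give four relative paths; restricting to "PB2"
# gives only the two paths for that sample
all_outputs = count_output_names(["PB1", "PB2"])
pb2_outputs = count_output_names(["PB1", "PB2"], sample_name="PB2")
```

Passing metrics_file="summary.csv" to this sketch reproduces the layout documented for the ‘cellranger-atac count’ and ‘cellranger-arc count’ variants.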

auto_process_ngs.qc.apps.cellranger.cellranger_multi_output(project, config_csv, sample_name=None, prefix='cellranger_multi')

Generate list of ‘cellranger multi’ outputs

Given an AnalysisProject, the outputs from ‘cellranger multi’ will look like:

  • {PREFIX}/outs/multi/multiplexing_analysis/tag_calls_summary.csv

and

  • {PREFIX}/outs/per_sample_outs/{SAMPLE_n}/metrics_summary.csv

  • {PREFIX}/outs/per_sample_outs/{SAMPLE_n}/web_summary.html

for each multiplexed SAMPLE_n defined in the config.csv file (nb these are not equivalent to the ‘samples’ defined by the Fastq files in the project).

If a sample name is supplied then outputs are limited to those for that sample; if the supplied config.csv file isn’t found then no outputs will be returned.

Parameters:
  • project (AnalysisProject) – project to generate output names for

  • config_csv (str) – path to the cellranger multi config.csv file

  • sample_name (str) – multiplexed sample to limit outputs to (optional)

  • prefix (str) – directory for outputs (optional, defaults to “cellranger_multi”)

Returns:

cellranger multi outputs (without leading paths)

Return type:

tuple
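The ‘cellranger multi’ layout can be sketched in the same way. The function multi_output_names below is hypothetical and takes the multiplexed sample names directly, on the assumption that they have already been read from the config.csv file; passing None stands in for a config file that could not be found. Whether the real function still lists tag_calls_summary.csv when a single sample is requested is not stated above, so including it here is an assumption of the sketch.

```python
import os


def multi_output_names(multiplexed_samples, sample_name=None,
                       prefix="cellranger_multi"):
    """Hypothetical sketch of the 'cellranger multi' naming scheme.

    'multiplexed_samples' is assumed to hold the sample names
    from the config.csv file, or None if the file was missing.
    """
    if multiplexed_samples is None:
        # No config file: no outputs can be predicted
        return ()
    # Assumption: the tag calls summary is always listed
    outputs = [os.path.join(prefix, "outs", "multi",
                            "multiplexing_analysis",
                            "tag_calls_summary.csv")]
    for sample in multiplexed_samples:
        if sample_name is not None and sample != sample_name:
            continue
        sample_dir = os.path.join(prefix, "outs",
                                  "per_sample_outs", sample)
        outputs.append(os.path.join(sample_dir, "metrics_summary.csv"))
        outputs.append(os.path.join(sample_dir, "web_summary.html"))
    return tuple(outputs)


multi_outputs = multi_output_names(["PBA", "PBB"], sample_name="PBA")
```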

auto_process_ngs.qc.apps.cellranger.extract_path_data(multi_output_dir, top_dir)

Get version, refdata and sample name from output path

Attempts to extract the version, reference data and physical sample name from the intermediate directory names above a cellranger multi output directory.

For example: if cellranger multi outputs are stored under the top-level directory “cellranger_multi”, then outputs from individual runs might be arranged under this directory as:

cellranger_multi/8.0.0/refdata-cellranger-gex-GRCh38-2020-A/…

In this case the version would be “8.0.0”, the reference would be “refdata-cellranger-gex-GRCh38-2020-A”, and the physical sample name would not be available.

Alternatively if the arrangement is:

cellranger_multi/9.0.0/refdata-cellranger-gex-GRCh38-2020-A/PB1/…

then the version would be “9.0.0”, the reference would be “refdata-cellranger-gex-GRCh38-2020-A”, and the physical sample name would be “PB1”.

Parameters:
  • multi_output_dir (str) – path to the cellranger multi output directory

  • top_dir (str) – the top level directory for all cellranger multi directories

Returns:

extracted data items as a tuple of (version, reference, sample)

Return type:

tuple
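The two arrangements described above can be reproduced with a short sketch. The function split_path_data is hypothetical and is not the module's implementation. It assumes that the components of the supplied path, taken relative to top_dir, are exactly the version, reference and (optionally) sample directories in that order.

```python
import os


def split_path_data(multi_output_dir, top_dir):
    """Hypothetical sketch: split a path into path data.

    Assumes the path relative to 'top_dir' has the form
    <version>[/<reference>[/<sample>]]; missing levels
    are returned as None.
    """
    rel_path = os.path.relpath(multi_output_dir, top_dir)
    if rel_path == os.curdir or rel_path.startswith(os.pardir):
        # Same directory as top_dir, or not underneath it
        return (None, None, None)
    parts = rel_path.split(os.sep)
    # Pad with None so that exactly three items are returned
    parts = (parts + [None, None, None])[:3]
    return tuple(parts)


ref = "refdata-cellranger-gex-GRCh38-2020-A"
without_sample = split_path_data(
    os.path.join("cellranger_multi", "8.0.0", ref),
    "cellranger_multi")
with_sample = split_path_data(
    os.path.join("cellranger_multi", "9.0.0", ref, "PB1"),
    "cellranger_multi")
```

Under these assumptions the first call yields the version and reference with no sample, and the second also yields “PB1” as the physical sample name.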

auto_process_ngs.qc.apps.cellranger.fetch_cellranger_multi_output_dirs(top_dir)

Locate output directories from cellranger multi

Recursively searches the directory structure under the supplied top-level directory and returns a list of paths to each possible “cellranger multi” output directory.

Putative output directories will contain at minimum subdirectories called “outs” and “per_sample_outs”.

Parameters:

top_dir (str) – path to directory to search under

Returns:

list of paths to putative “cellranger multi” output directories.

Return type:

list
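A recursive search of this kind can be sketched with os.walk. The function find_multi_output_dirs below is hypothetical and is not the module's implementation. It assumes that “per_sample_outs” sits inside “outs”, as in the output layout shown for cellranger_multi_output, and that the search does not need to descend into a directory once it has been identified as an output directory.

```python
import os


def find_multi_output_dirs(top_dir):
    """Hypothetical sketch: locate putative output directories.

    A directory qualifies if it contains 'outs/per_sample_outs'
    (assumed nesting); returns a sorted list of paths.
    """
    found = []
    for dirpath, dirnames, _filenames in os.walk(top_dir):
        marker = os.path.join(dirpath, "outs", "per_sample_outs")
        if os.path.isdir(marker):
            found.append(dirpath)
            # Don't search inside a directory already matched
            dirnames[:] = []
    return sorted(found)
```

For example, with a tree containing cellranger_multi/9.0.0/<reference>/PB1/outs/per_sample_outs, calling the sketch on cellranger_multi would return the path ending in PB1.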