auto_process_ngs.qc.modules.cellranger_multi

Implements the ‘cellranger_multi’ QC module:

  • CellrangerMulti: core QCModule class

  • GetCellrangerMultiConfigs: pipeline task to acquire multi config files

  • RunCellrangerMulti: pipeline task to run ‘cellranger multi’

  • expected_outputs: helper function for handling ‘cellranger multi’ outputs

Also imports the following pipeline tasks:

  • Get10xPackage

  • DetermineRequired10xPackage

class auto_process_ngs.qc.modules.cellranger_multi.CellrangerMulti

Class for handling the ‘cellranger_multi’ QC module

classmethod add_to_pipeline(p, project_name, project, qc_dir, qc_module_name, cellranger_exe=None, cellranger_out_dir=None, cellranger_jobmode=None, cellranger_maxjobs=None, cellranger_mempercore=None, cellranger_jobinterval=None, cellranger_localcores=None, cellranger_localmem=None, cellranger_required_version=None, required_tasks=None, cellranger_runner=None, envmodules=None, working_dir=None)

Adds tasks for ‘cellranger_multi’ module to pipeline

Parameters:
  • p (Pipeline) – pipeline to extend

  • project_name (str) – name to associate with project for reporting tasks

  • project (AnalysisProject) – project to run the 10x cellranger pipeline within

  • qc_dir (str) – directory for QC outputs (defaults to subdirectory ‘qc’ of project directory)

  • qc_module_name (str) – QC module being used

  • cellranger_exe (str) – optional, explicitly specify the cellranger executable to use (default: cellranger executable is determined automatically)

  • cellranger_jobmode (str) – specify the job mode to pass to cellranger (default: “local”)

  • cellranger_maxjobs (int) – specify the maximum number of jobs to pass to cellranger (default: None)

  • cellranger_mempercore (int) – specify the memory per core (in GB) to pass to cellranger (default: None)

  • cellranger_jobinterval (int) – specify the interval between launching jobs (in ms) to pass to cellranger (default: None)

  • cellranger_localcores (int) – maximum number of cores cellranger can request in jobmode ‘local’ (default: None)

  • cellranger_localmem (int) – maximum memory cellranger can request in jobmode ‘local’ (default: None)

  • cellranger_required_version (str) – string specifying the required Cellranger version (default: None)

  • required_tasks (list) – list of tasks that the cellranger pipeline should wait for

  • cellranger_runner (JobRunner) – runner to use for running ‘cellranger multi’

  • envmodules (list) – environment module names to load for running Cellranger

  • working_dir (str) – explicitly specify path to working directory

classmethod collect_qc_outputs(qc_dir)

Collect information on Cellranger multi outputs

Returns an AttributeDictionary with the following attributes:

  • name: set to ‘cellranger_multi’

  • software: dictionary of software and versions

  • references: list of associated reference datasets

  • probe_sets: list of associated probe sets

  • fastqs: list of associated Fastq names

  • multiplexed_samples: list of associated multiplexed sample names

  • pipelines: list of tuples defining 10x pipelines in the form (name,version,reference)

  • samples_by_pipeline: dictionary with lists of multiplexed sample names associated with each 10x pipeline tuple

  • config_files: list of associated config files (‘10x_multi_config[.<SAMPLE>].csv’)

  • output_files: list of associated output files

  • tags: list of associated output classes

Parameters:

qc_dir (QCDir) – QC directory to examine
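The relationship between the ‘pipelines’ and ‘samples_by_pipeline’ attributes can be illustrated with a small, self-contained sketch. It is not the module’s implementation: types.SimpleNamespace stands in for AttributeDictionary, and the sample names, version and reference name in ‘records’ are hypothetical values chosen only to show the shape of the data.

```python
from collections import defaultdict
from types import SimpleNamespace

# Hypothetical (multiplexed sample, 10x pipeline tuple) records; in practice
# these would be derived from the 'cellranger multi' outputs in the QC dir.
records = [
    ("PBA", ("cellranger", "7.1.0", "refdata-gex-GRCh38-2020-A")),
    ("PBB", ("cellranger", "7.1.0", "refdata-gex-GRCh38-2020-A")),
]


def summarise(records):
    """Group multiplexed sample names by (name, version, reference) tuple."""
    samples_by_pipeline = defaultdict(list)
    for sample, pipeline in records:
        samples_by_pipeline[pipeline].append(sample)
    # SimpleNamespace is a stand-in for AttributeDictionary
    return SimpleNamespace(
        name="cellranger_multi",
        multiplexed_samples=sorted({s for s, _ in records}),
        pipelines=sorted(samples_by_pipeline),
        samples_by_pipeline={k: sorted(v) for k, v in samples_by_pipeline.items()},
    )


outputs = summarise(records)
print(outputs.pipelines)
print(outputs.samples_by_pipeline[outputs.pipelines[0]])
```

Keying the dictionary on the full pipeline tuple keeps samples processed with different versions or references in separate groups.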

classmethod verify(params, qc_outputs)

Verify ‘cellranger_multi’ QC module against outputs

Returns one of 3 values:

  • True: outputs verified ok

  • False: outputs failed to verify

  • None: verification not possible

Parameters:
  • params (AttributeDictionary) – values of parameters used as inputs

  • qc_outputs (AttributeDictionary) – QC outputs returned from the ‘collect_qc_outputs’ method
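A minimal sketch of the three-valued contract, assuming (hypothetically) that verification compares the sample names expected from the inputs against the multiplexed sample names reported by collect_qc_outputs. The function name ‘verify_sketch’ and its arguments are illustrative only; the real method takes ‘params’ and ‘qc_outputs’.

```python
from typing import Iterable, Optional


def verify_sketch(samples: Iterable[str],
                  multiplexed_samples: Iterable[str]) -> Optional[bool]:
    """Hypothetical three-valued check modelled on the 'verify' contract.

    Returns None when there is nothing to verify against (no expected
    samples), True when every expected sample has outputs, False otherwise.
    """
    expected = set(samples)
    if not expected:
        # Verification not possible
        return None
    return expected.issubset(set(multiplexed_samples))
```

Because both None and False are falsy, callers should test for ‘is None’ before treating a result as a failure.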

class auto_process_ngs.qc.modules.cellranger_multi.GetCellrangerMultiConfigs(_name, *args, **kws)

Locate ‘config.csv’ files for cellranger multi

init(project, qc_dir)

Initialise the GetCellrangerMultiConfigs task.

Parameters:
  • project (AnalysisProject) – project to run QC for

  • qc_dir (str) – top-level QC directory to put ‘config.csv’ files in
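The ‘10x_multi_config[.<SAMPLE>].csv’ naming pattern listed under collect_qc_outputs suggests how such files might be located. The following is a hypothetical helper, ‘find_multi_configs’, assuming the files sit directly in a single directory; where GetCellrangerMultiConfigs actually looks is not specified here.

```python
import os
import re

# Matches '10x_multi_config.csv' and '10x_multi_config.<SAMPLE>.csv'
CONFIG_RE = re.compile(r"^10x_multi_config(?:\.(?P<sample>.+))?\.csv$")


def find_multi_configs(dirn):
    """Hypothetical helper: map sample name (or None) to config file path."""
    configs = {}
    for name in sorted(os.listdir(dirn)):
        m = CONFIG_RE.match(name)
        if m:
            # 'sample' is None for the project-level '10x_multi_config.csv'
            configs[m.group("sample")] = os.path.join(dirn, name)
    return configs
```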

setup()

Set up commands to be performed by the task

class auto_process_ngs.qc.modules.cellranger_multi.RunCellrangerMulti(_name, *args, **kws)

Run ‘cellranger multi’

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

init(project, config_csvs, samples, reference_data_path, probe_set_path, out_dir, qc_dir=None, cellranger_exe=None, cellranger_version=None, cellranger_jobmode='local', cellranger_maxjobs=None, cellranger_mempercore=None, cellranger_jobinterval=None, cellranger_localcores=None, cellranger_localmem=None, cellranger_required_version=None, working_dir=None)

Initialise the RunCellrangerMulti task.

Parameters:
  • project (AnalysisProject) – project to run QC for

  • config_csvs (list) – list of paths to ‘cellranger multi’ configuration files

  • samples (list) – list of sample names from the config.csv file

  • reference_data_path (str) – path to the cellranger-compatible reference dataset from the config.csv file

  • probe_set_path (str) – path to the probe set reference dataset from the config.csv file

  • out_dir (str) – top-level directory to copy all final ‘multi’ outputs into. Outputs won’t be copied if no value is supplied

  • qc_dir (str) – top-level QC directory to put ‘count’ QC outputs (e.g. metrics CSV and summary HTML files) into. Outputs won’t be copied if no value is supplied

  • cellranger_exe (str) – the path to the Cellranger software package to use (e.g. ‘cellranger’, ‘cellranger-atac’, ‘spaceranger’)

  • cellranger_version (str) – the version string for the Cellranger package

  • cellranger_jobmode (str) – specify the job mode to pass to cellranger (default: “local”)

  • cellranger_maxjobs (int) – specify the maximum number of jobs to pass to cellranger (default: None)

  • cellranger_mempercore (int) – specify the memory per core (in GB) to pass to cellranger (default: None)

  • cellranger_jobinterval (int) – specify the interval between launching jobs (in ms) to pass to cellranger (default: None)

  • cellranger_localcores (int) – maximum number of cores cellranger can request in jobmode ‘local’ (defaults to number of slots set in runner)

  • cellranger_localmem (int) – maximum memory cellranger can request in jobmode ‘local’ (default: None)

  • cellranger_required_version (str) – string specifying the required Cellranger version (default: None)

  • working_dir (str) – explicitly specify path to working directory (default: None)
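The job-control parameters above correspond to ‘cellranger multi’ command-line options (--jobmode, --localcores, --localmem, --mempercore, --maxjobs, --jobinterval). The sketch below shows one way they might be assembled into a command line; ‘build_multi_command’ and its ‘nslots’ argument are hypothetical, and the task’s actual command construction may differ.

```python
from typing import List, Optional


def build_multi_command(multi_id: str, config_csv: str,
                        cellranger_exe: str = "cellranger",
                        jobmode: str = "local",
                        maxjobs: Optional[int] = None,
                        mempercore: Optional[int] = None,
                        jobinterval: Optional[int] = None,
                        localcores: Optional[int] = None,
                        localmem: Optional[int] = None,
                        nslots: Optional[int] = None) -> List[str]:
    """Hypothetical helper: assemble a 'cellranger multi' command line."""
    cmd = [cellranger_exe, "multi",
           f"--id={multi_id}",
           f"--csv={config_csv}",
           f"--jobmode={jobmode}"]
    # Cluster-related options are only added when explicitly set
    for flag, value in (("--maxjobs", maxjobs),
                        ("--mempercore", mempercore),
                        ("--jobinterval", jobinterval)):
        if value is not None:
            cmd.append(f"{flag}={value}")
    if jobmode == "local":
        # Assumed fallback: use the runner's slot count if localcores is unset
        if localcores is None:
            localcores = nslots
        if localcores is not None:
            cmd.append(f"--localcores={localcores}")
        if localmem is not None:
            cmd.append(f"--localmem={localmem}")
    return cmd
```

Returning a list rather than a single string avoids shell quoting issues if the command is later passed to subprocess.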

setup()

Set up commands to be performed by the task

auto_process_ngs.qc.modules.cellranger_multi.expected_outputs(config_csv, multi_id=None, prefix=None)

Generate expected output file paths from 10x multi config

Parameters:
  • config_csv (str) – path to the 10x multi config file to generate the output file names for

  • multi_id (str) – optional, the ID of the multi run (supplied via the --id argument), used if the config file doesn’t define multiplexed samples

  • prefix (str) – optional path to prepend to the expected file paths

Returns:

list of paths to expected output files.

Return type:

List
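A rough sketch of the idea behind expected_outputs, assuming the usual ‘cellranger multi’ layout in which per-sample results appear under ‘outs/per_sample_outs/<SAMPLE>/’. The function name and the two file names returned for each sample (web_summary.html and metrics_summary.csv) are assumptions for illustration, not the list the real function produces.

```python
import csv
import os
from typing import List, Optional


def expected_outputs_sketch(config_csv: str, multi_id: Optional[str] = None,
                            prefix: Optional[str] = None) -> List[str]:
    """Hypothetical: predict per-sample output paths from a multi config."""
    # Collect sample IDs from the [samples] section of the config
    samples = []
    section = None
    header = None
    with open(config_csv, newline="") as fp:
        for row in csv.reader(fp):
            if not row or not row[0].strip():
                continue
            first = row[0].strip()
            if first.startswith("[") and first.endswith("]"):
                section = first[1:-1].lower()
                header = None
                continue
            if section == "samples":
                if header is None:
                    header = [c.strip() for c in row]
                    continue
                samples.append(row[header.index("sample_id")].strip())
    # No multiplexed samples: fall back to the run ID, as documented above
    if not samples and multi_id:
        samples = [multi_id]
    outputs = []
    for sample in samples:
        # Assumed file names; the real function may return a different set
        for name in ("web_summary.html", "metrics_summary.csv"):
            path = os.path.join("outs", "per_sample_outs", sample, name)
            if prefix:
                path = os.path.join(prefix, path)
            outputs.append(path)
    return outputs
```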