Processing 10x Genomics single cell data

Background

10xGenomics provides a range of platforms for various types of single cell samples including:

Single cell and single nuclei RNA-seq
Single cell ATAC-seq
Single cell multiome gene expression and ATAC data
Single cell immune profiling data
Cellplex (cell multiplexing)
Flex (fixed RNA profiling)

The auto_process pipeline has integrated support for Fastq generation and QC (in the make_fastqs and run_qc commands) for these data as outlined in the following sections.

Requirements

The appropriate 10x Genomics software pipeline must already be installed and available to auto_process.py:

10x Genomics data	Software required
Single cell/single nuclei RNA-seq	CellRanger
Single cell ATAC-seq	CellRanger-ATAC
Single cell multiome	CellRanger-ARC
Single cell immune profiling	Cellranger
CellPlex (cell multiplexing)	CellRanger
Flex (fixed RNA profiling)	CellRanger

In addition:

Fastq generation for all data requires bcl2fastq;
QC for single cell multiome GEX also uses CellRanger;
QC for single cell multiome ATAC also uses Cellranger-ATAC.

Fastq generation

General

Fastq generation for 10x Genomics single cell data should be performed using the make_fastqs command with the appropriate 10x_* protocol specified via the --protocol option:

Single cell platform	Protocol
10x Genomics Chromium single cell ATAC-seq data	`10x_atac`
10x Genomics Chromium 3’ and 5’ single cell GEX data	`10x_chromium_sc`
10x Genomics single cell multiome ATAC-seq data (pooled with GEX data)	`10x_multiome_atac`
10x Genomics single cell multiome GEX data (pooled with ATAC data)	`10x_multiome_gex`

Note

By default adapter trimming is automatically disabled for all 10x_* protocols, by removing any adapter sequences specified in the sample sheet.

Sample sheets for 10x Genomics data can specify either Illumina index sequences (recommended) or 10x Genomics-style index IDs (now deprecated by 10x Genomics); generally the 10x_* protocols can operate with either.

Note

The 10x_multiome protocol (which can be used for “unpooled” 10x single cell multiome data) still requires 10x Genomics indexes. However it is recommended that even for unpooled data the specific protocol is used (i.e. either 10x_multiome_gex or 10x_multiome_atac) as outlined in the following section.

Choosing Fastq generation protocol for single cell multiome data

There are three Fastq generation protocols for single cell multiome data; which should be used will depend on the specific configuration of the sequencing run:

Run only has either the GEX or the ATAC component of the single cell multiome experiment (unpooled): use the appropriate 10x_multiome_* protocol (i.e. 10x_multiome_gex or 10x_multiome_atac) according to the type of data in the run.

For example:
auto_process.py make_fastqs --protocol 10x_multiome_gex
Run has both GEX and ATAC components of the single cell multiome experiment in different lanes (pooled): use the --lanes option to assign the appropriate Fastq generation protocol to each of the lane subsets.

For example:
auto_process.py make_fastqs \
--lanes=1:10x_multiome_atac \
--lanes=2:10x_multiome_gex

Note

While the 10x_multiome protocol is still supported for unpooled data with 10x Genomics index IDs, it is now deprecated and should not be used.

Analysis project setup and QC

Once Fastqs have been successfully generated, the SC_platform and Library metadata items should be set to the appropriate values for the 10x Genomics single cell project(s) in the projects.info control file.

The following values are valid options for 10x Genomics single cell data within auto-process-ngs:

Assays	Platform	Library type	Extensions
`10x Chromium 3' (v4 GEM-X) sc GEX`	`10x Chromium 3'`	`scRNA-seq`	`CSP`, `CRISPR`, `Antibody Capture`, `Feature Barcode`
`10x Chromium 3' (v4 GEM-X) OCM sc GEX`, `10x Chromium 3' (v4 GEM-X) OCM sn GEX`	`10x Chromium 3' OCM`	`scRNA-seq`	`CSP`, `CRISPR`, `Antibody Capture`, `Feature Barcode`
	`10x Chromium 3' OCM`	`snRNA-seq`	`CSP`, `CRISPR`, `Antibody Capture`, `Feature Barcode`
`10x Chromium 3' (v4 GEM-X) sn GEX`	`10x Chromium 3'`	`snRNA-seq`
`10x Chromium 5' (v3 GEM-X) sc GEX`, `10x Chromium 5' (v3 GEM-X) sn GEX`	`10x Chromium 5'`	`Immune Profiling`	`CSP`, `TCR`, `BCR`, `VDJ`, `CRISPR`, `BEAM`, `Antibody Capture`, `Feature Barcode`, `VDJ-T`, `VDJ-B`
`10x Chromium 5' (v3 GEM-X) OCM sc GEX`, `10x Chromium 5' (v3 GEM-X) OCM sn GEX`	`10x Chromium 5' OCM`	`Immune Profiling`	`CSP`, `TCR`, `BCR`, `VDJ`, `CRISPR`, `BEAM`, `Antibody Capture`, `Feature Barcode`, `VDJ-T`, `VDJ-B`
`10x Chromium Flex (v1 GEM-X) sc GEX`, `10x Chromium Flex (v1 GEM-X) sn GEX`	`10x Chromium Flex`	`scRNA Profiling`	`PEX`
	`10x Chromium Flex`	`snRNA Profiling`	`PEX`
`10x Chromium Flex Apex (v2 GEM-X) sc GEX`, `10x Chromium Flex Apex (v2 GEM-X) sn GEX`	`10x Chromium Flex Apex`	`scRNA Profiling`	`CSP`, `Antibody Capture`, `Feature Barcode`
	`10x Chromium Flex Apex`	`snRNA Profiling`	`CSP`, `Antibody Capture`, `Feature Barcode`
`10x Chromium 3' (v3.1 Next GEM ST) CellPlex sc GEX`, `10x Chromium 3' (v3.1 Next GEM HT) CellPlex sc GEX`, `10x Chromium 3' (v3.1 Next GEM ST) CellPlex sn GEX`, `10x Chromium 3' (v3.1 Next GEM HT) CellPlex sn GEX`	`10x Chromium 3' CellPlex`	`scRNA-seq`
	`10x Chromium 3' CellPlex`	`snRNA-seq`
`10x Chromium Epi ATAC (v2) sn ATAC`	`10x Single Cell ATAC`	`snATAC-seq`
`10x Chromium Epi Multiome ATAC (v1) sn ATAC`	`10x Single Cell Multiome`	`ATAC`
`10x Chromium Epi Multiome ATAC (v1) sn GEX`	`10x Single Cell Multiome`	`GEX`

Library types and extensions

Extensions are optional additions to the base library type which give additional information about the type of experiment that was performed.

One or more extensions can be specified as part of the library type, by appending them using a plus symbol (+). For example:

scRNA-seq+CSP indicates single cell RNA-seq (gene expression) performed with cell surface proteins;
Immune Profiling+CSP+VDJ indicates single cell immune profiling with both cell surface proteins and V(D)J.

Running the setup_analysis_dirs command will automatically transfer these values into the single cell project metadata on creation.

Additionally for certain types of data, setup_analysis_dirs will also create template control files for use in subsequent QC runs:

Single cell multiome: a template 10x_multiome_libraries.info file, which should be renamed and populated in order to link each ATAC (or GEX) sample to the complementary GEX (or ATAC) sample.

CellPlex and Flex: a template 10x_multi_config.csv file, which should be copied, renamed and appropriately edited for each physical sample in the dataset.

Single Cell immune profiling: a template 10x_multi_config.csv file, which should be copied, renamed and appropriately edited for each physical sample in the dataset.

For the 10x_multi_config.csv the extensions will determine the form of the template (so that only the necessary elements are included).

The run_qc command will then determine the appropriate QC protocol to use based on the metadata values.

Troubleshooting

Single-library analyses fail for low read counts

It has been observed that when the Fastq files produced by the mkfastq command have very low read counts then the single-library analyses may fail, with cellranger count reporting an error of the form e.g.:

Could not auto-detect Single Cell 3' chemistry. Fraction of barcodes
on whitelist was at best 0.23%, while we expected at least 10.00% for
one of the chemistries.

There is currently no workaround for this issue.

Single-library analyses fail to detect chemistry automatically

By default cellranger count attempts to determine the chemistry used automatically, however this may fail if a low number of reads map to the reference genome and give an error of the form:

The chemistry was unable to be automatically determined. This can
happen if not enough reads originate from the given reference. Please
verify your choice of reference or explicitly specify the chemistry
via the --chemistry argument.

If the reference data being used is correct then use the --chemistry option to specify the appropriate assay configuration - see https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/count

Appendices

Manual QC steps for single cell immune profiling data

Currently a full automated QC protocol is not available for Chromium 5’ single cell immune profiling: specifically, there is no provision for running Cellranger’s multi pipeline for each sample, or for automatically integrating the resulting outputs into the QC report.

It is possible to run the multi pipeline manually for each sample, using the sample-specific 10x_multi_config.<SAMPLE>.csv files.

For example, a script of the form:

#!/usr/bin/bash
#$ -N cellranger_multi_PJB01
#$ -V
#$ -cwd
#$ -j y
#$ -pe smp.pe 16
#$ -l mem256
mkdir -p cellranger_multi && cd cellranger_multi
/PATH/TO/cellranger multi \
--id PJB01 --csv PATH/TO/10x_multi_config.PJB01.csv \
--jobmode=local \
--localcores=16 \
--localmem=128 \
--maxjobs=24 \
--jobinterval=100

could be used to submit a Cellranger multi job for the PJB01 sample, with the outputs being created in a subdirectory cellranger_multi/PJB01 in the current directory.

To include the outputs in the QC report, copy the relevant files (specifically the web_summary.html files for each sample) into the QC directory and then create an extra_outputs.tsv which references these (as described in Including external (non-pipeline) outputs).

For example:

cellranger_multi/PJB01/web_summary.html    CellRanger multi output for PJB01

Rerunning run_qc will force update of the QC report which should then also link in these additional reports.