Reporting analysis directory contents using auto_process report

Generates reports about the contents of an analysis directory.

General invocation of the command is:

auto_process.py report *reporting_option* [ANALYSIS_DIR]

where reporting_option determines the data that are reported and the format that is used.

The following options are available:

Reporting option

Description

--logging

Single line summary of all projects in the analysis directory

--projects

One tab-delimited line per project

--summary

Longer format report suitable for bioinformaticians

These options are described in more detail below.

--logging report

Produces a single line summary of the projects in the analysis directory.

For example for a run with a single project:

Paired end: 'AB': Abby Brown, Mouse RNA-seq (PI: Carl Dover) (4 samples)

For runs with multiple projects, the details of the additional projects are appended with semi-colons as separators.

--projects report

Outputs a tab-delimited line for each project in the analysis directory, which could be inserted into a spreadsheet.

For example for a run with a single project:

MISEQ_150729#88  88   GTCF       Abby Brown   Carl Dover  RNA-seq   Mouse   MISEQ   4    yes     AB1-4

--summary report

Outputs a longer format report which summarises all the data and projects in the analysis directory.

For example:

MISEQ run #88 datestamped 150729
================================
Run name : 150729_M00789_0088_000000000-ABCD1
Reference: MISEQ_150729#88
Platform : MISEQ
Sequencer: MiSeq
Directory: /runs/2015/miseq/150729_M00789_0088_000000000-ABCD1_analysis
Endedness: Paired end
Bcl2fastq: bcl2fastq 2.17.1.14

1 project:
- 'AB': Abby Brown Mouse RNA-seq 4 samples (PI Carl Dover)

Additional notes/comments:
- AB: 1% PhiX spike in

Customising data that are reported: fields and templates

The data that are reported can be customised by using the --fields option of the report command.

The available fields are:

Field name

Associated value

run_id

Run ID (e.g. MISEQ_150729#88)

run_reference_id

Run reference ID (e.g. NOVASEQ6000_230419#73_SP)

run_number

Facility run number (e.g. 88)

analysis_number

Analysis number (e.g. 2)

sequencer_platform

Sequencer platform for the run (e.g. MISEQ)

sequencer_model

Sequencer model (e.g. MiSeq)

platform

Alias for sequencer_platform

datestamp

Datestamp for the run (e.g. 150729)

source

Source of the sequencing data

data_source

Alias for source

analysis_dir

Full path to the analysis directory

path

Alias for analysis_dir

project_name

Name of the project

project

Alias for project_name

user

User associated with project

PI

Principle investigator associated with the project

pi

Alias for PI

application

Application associated with the project (e.g. RNA-seq)

library_type

Alias for application

single_cell_platform

Name of the single-cell platform (e.g. ICELL8), or empty field if not a single-cell project

flow_cell_mode

The flow cell mode stored in the run metadata

organism

Organism(s) associated with the project

no_of_samples

Number of samples in the project (for 10x Genomics CellPlex data this will be the number of multiplexed samples)

#samples

Alias for no_of_samples

no_of_cells

Number of cells in the project, for single-cell projects

#cells

Alias for no_of_cells

paired_end

yes if the run was paired-end, no if it was single-end

sample_names

Comma-separated list of sample names associated with the project (for 10x Genomics CellPlex data this will be the names of the multiplexed samples)

samples

Alias for sample_names

null

Writes an empty field

Composite fields can be specified by joining two or more fields with the + character (for example, organism+library_type); the resulting value will be the (non-null) values of the individual fields separated by spaces.

To specify an alternative separator for a composite field, prefix the field with [...]: where ... is the desired delimiter (for example, [_]:project+run_id will use an underscore).

Commonly used sets of fields can be made into “templates”, which can be defined in the reporting_templates section of the auto_process.ini configuration file and then specified using the --templates option of the report command.

Note

Custom fields are only available for the --projects reporting mode.

Writing reports to a file

By default reports are written to stdout; use the --file option to send the report to a file instead. The destination can be a local file, or a remote file specified as [[USER@]HOST:]PATH.