auto_process
commands
Note
This documentation has been auto-generated from the command help
auto_process.py
implements the following commands:
info
usage: auto_process.py info [-h] [--version] [--debug] [ANALYSIS_DIR]
Print information about the analysis associated with ANALYSIS_DIR.
positional arguments:
ANALYSIS_DIR auto_process analysis directory (optional: defaults to the
current directory)
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--debug Turn on debugging output
setup
usage: auto_process.py setup [-h] [--version] -r RUN_NUMBER [-s SAMPLE_SHEET]
[-n ANALYSIS_NUMBER] [-f FILE]
[--fastq-dir UNALIGNED_DIR]
[--analysis-dir ANALYSIS_DIR] [--debug]
RUN_DIR
Set up automatic processing of Illumina sequencing data from RUN_DIR.
positional arguments:
RUN_DIR directory with the output from an Illumina sequencer
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
-r RUN_NUMBER, --run-number RUN_NUMBER
Set facility run number (required)
-s SAMPLE_SHEET, --samplesheet SAMPLE_SHEET, --sample-sheet SAMPLE_SHEET
Copy sample sheet file from name and location
SAMPLE_SHEET (default is to look for SampleSheet.csv
inside RUN_DIR). SAMPLE_SHEET can be a local or remote
file, or a URL
-n ANALYSIS_NUMBER, --analysis-number ANALYSIS_NUMBER
Set analysis number (e.g. if reprocessing a run); will
be appended to analysis directory name if
'--analysis-dir' not supplied
-f FILE, --file FILE Additional file(s) to copy into new analysis directory
(e.g. ICELL8 well list). FILE can be a local or remote
file, or a URL
--fastq-dir UNALIGNED_DIR
Import fastq.gz files from UNALIGNED_DIR (which should
be a subdirectory of RUN_DIR with the same structure as
that of the 'Unaligned' or 'bcl2fastq2' output directory
produced by CASAVA/bcl2fastq)
--analysis-dir ANALYSIS_DIR
Make new directory called ANALYSIS_DIR (otherwise
default is '<RUN_DIR>_analysis[<ANALYSIS_NUMBER>]')
--debug Turn on debugging output
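The default directory name described for '--analysis-dir' can be illustrated with a short Python sketch. It is not taken from the auto_process source: the function name is hypothetical, and it assumes that only the final component of RUN_DIR is used and that ANALYSIS_NUMBER is appended with no separator, as the pattern '<RUN_DIR>_analysis[<ANALYSIS_NUMBER>]' suggests.

```python
import os


def default_analysis_dir(run_dir, analysis_number=None):
    """Build '<RUN_DIR>_analysis[<ANALYSIS_NUMBER>]' (hypothetical helper)."""
    # Assumption: only the last path component of RUN_DIR is used,
    # and a trailing slash on RUN_DIR is ignored
    name = os.path.basename(os.path.normpath(run_dir))
    # Assumption: the analysis number is appended directly, with no separator
    suffix = "" if analysis_number is None else str(analysis_number)
    return f"{name}_analysis{suffix}"


print(default_analysis_dir("/data/runs/run1"))      # run1_analysis
print(default_analysis_dir("/data/runs/run1/", 2))  # run1_analysis2
```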
make_fastqs
usage: auto_process.py make_fastqs [-h] [--version] [--no-save] [--debug]
[--id NAME] [--force-copy]
[--protocol {standard,mirna,icell8,icell8_atac,10x_chromium_sc,10x_atac,10x_visium,10x_multiome,10x_multiome_atac,10x_multiome_gex,parse_evercode}]
[--sample-sheet SAMPLE_SHEET]
[--lanes LANES[:OPTIONS]]
[--output-dir OUT_DIR]
[--platform PLATFORM]
[--use-bases-mask BASES_MASK]
[--bcl-converter CONVERTER]
[--no-lane-splitting]
[--use-lane-splitting]
[--find-adapters-with-sliding-window]
[--create-empty-fastqs]
[--no-create-empty-fastqs]
[--create-fastq-for-index-reads]
[--nprocessors NPROCESSORS]
[--runner RUNNER]
[--adapter ADAPTER_SEQUENCE]
[--adapter-read2 ADAPTER_SEQUENCE_READ2]
[--minimum-trimmed-read-length MINIMUM_TRIMMED_READ_LENGTH]
[--mask-short-adapter-reads MASK_SHORT_ADAPTER_READS]
[--no-adapter-trimming]
[--well-list ICELL8_WELL_LIST]
[--swap-i1-and-i2]
[--reverse-complement {i1,i2,both}]
[--10x_jobmode CELLRANGER_JOBMODE]
[--10x_localcores CELLRANGER_LOCALCORES]
[--10x_localmem CELLRANGER_LOCALMEM]
[--10x_maxjobs CELLRANGER_MAXJOBS]
[--10x_mempercore CELLRANGER_MEMPERCORE]
[--10x_jobinterval CELLRANGER_JOBINTERVAL]
[--ignore-dual-index]
[--rc-i2-override RC_I2_OVERRIDE]
[--stats-file STATS_FILE]
[--per-lane-stats-file PER_LANE_STATS_FILE]
[--no-stats]
[--barcode-analysis-dir BARCODE_ANALYSIS_DIR]
[--no-barcode-analysis] [-j NJOBS]
[-c NCORES] [-b NBATCHES] [--verbose]
[--work-dir WORKING_DIR]
[--require-bcl2fastq-version BCL2FASTQ_VERSION]
[ANALYSIS_DIR]
Generate fastq files from raw bcl files produced by Illumina sequencer.
positional arguments:
ANALYSIS_DIR auto_process analysis directory (optional: defaults to
the current directory)
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--no-save Don't save parameter changes to the auto_process.info
file
--debug Turn on debugging output
--id NAME identifier for output files
Primary data management:
--force-copy force primary data to be copied (by default only data
on a remote system will be copied; data on a local
system will be symlinked)
General Fastq generation:
--protocol {standard,mirna,icell8,icell8_atac,10x_chromium_sc,10x_atac,10x_visium,10x_multiome,10x_multiome_atac,10x_multiome_gex,parse_evercode}
specify Fastq generation protocol depending on the
data being processed (default: 'standard')
--sample-sheet SAMPLE_SHEET
use an alternative sample sheet to the default
'custom_SampleSheet.csv' created on setup.
--lanes LANES[:OPTIONS]
define a set of lanes to group for processing. LANES
can be a single lane (e.g. '1'), a list ('1,2,3,7'), a
range ('1-3'), or a combination ('1-3,7'). Specified
lanes are processed together in a group, using OPTIONS
(if supplied). OPTIONS takes the form
'[PROTOCOL:][KEY=VALUE:[KEY=VALUE]...]' (for example
--lanes=1-4:standard:trim_adapters=no)
--output-dir OUT_DIR set the directory for the output Fastqs (default:
'bcl2fastq')
--platform PLATFORM explicitly specify the sequencing platform. Only use
this if the platform cannot be identified from the
instrument name
--use-bases-mask BASES_MASK
explicitly set the bases-mask string to indicate how
each cycle should be used in the BCL to Fastq
conversion (overrides default). Set to 'auto' to
determine automatically
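The LANES[:OPTIONS] syntax accepted by '--lanes' can be sketched in Python as follows. This is an illustration of the documented format only, not the parser used by auto_process; the function names are hypothetical, and the sketch assumes that any field without '=' is the PROTOCOL.

```python
def parse_lanes(spec):
    """Expand a LANES string such as '1-3,7' into a sorted list of lanes."""
    lanes = set()
    for part in spec.split(","):
        if "-" in part:
            # A range such as '1-3' includes both end points
            start, end = part.split("-")
            lanes.update(range(int(start), int(end) + 1))
        else:
            lanes.add(int(part))
    return sorted(lanes)


def parse_lanes_option(value):
    """Split 'LANES[:OPTIONS]' into (lanes, protocol, settings)."""
    fields = value.split(":")
    lanes = parse_lanes(fields[0])
    protocol = None
    settings = {}
    for field in fields[1:]:
        if "=" in field:
            # KEY=VALUE pairs are collected into a dictionary
            key, val = field.split("=", 1)
            settings[key] = val
        else:
            # Assumption: a field without '=' names the protocol
            protocol = field
    return lanes, protocol, settings


print(parse_lanes("1-3,7"))
# [1, 2, 3, 7]
print(parse_lanes_option("1-4:standard:trim_adapters=no"))
# ([1, 2, 3, 4], 'standard', {'trim_adapters': 'no'})
```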
Bcl conversion options:
--bcl-converter CONVERTER
explicitly set BCL conversion software to use for
non-10xGenomics/non-ICELL8 runs (either 'bcl2fastq' or
'bcl-convert'; can also include a version specifier
e.g. 'bcl2fastq>=2.0'). Default: bcl2fastq>=2.20 (may
be overridden by platform-specific settings)
--no-lane-splitting don't split the output FASTQ files by lane. Default:
off (may be overridden by platform-specific settings);
turn off using --use-lane-splitting
--use-lane-splitting split the output FASTQ files by lane. Default: on (but
may be overridden by platform-specific settings); turn
off using --no-lane-splitting
--find-adapters-with-sliding-window
use sliding window algorithm to identify adapters for
trimming
--create-empty-fastqs
create empty files as placeholders for missing FASTQs
from demultiplexing step. Default: off (but may be
overridden by platform-specific settings); turn off
using --no-create-empty-fastqs. NB Fastq generation
must have finished without errors for this option to be
applied
--no-create-empty-fastqs
don't create empty files as placeholders for missing
FASTQs from demultiplexing step. Default: on (but may
be overridden by platform-specific settings); turn off
using --create-empty-fastqs.
--create-fastq-for-index-reads
also create FASTQs for index reads
--nprocessors NPROCESSORS
explicitly specify number of processors/cores to use
(default taken from job runner)
--runner RUNNER explicitly specify runner definition (e.g.
'GEJobRunner(-j y)')
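A converter specification such as 'bcl2fastq>=2.20' combines a package name with an optional version requirement. The Python sketch below shows one way such a string could be split and checked against an installed version. The set of comparison operators and the function names are assumptions made for illustration; the grammar actually accepted by '--bcl-converter' is not spelled out in this help text.

```python
import operator
import re

# Assumed set of comparison operators (not confirmed by the help text)
OPERATORS = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    "=": operator.eq, ">": operator.gt, "<": operator.lt,
}

SPEC = re.compile(
    r"^(?P<name>[A-Za-z][\w-]*?)"
    r"(?:(?P<op>>=|<=|==|=|>|<)(?P<version>\d+(?:\.\d+)*))?$"
)


def parse_converter(spec):
    """Split e.g. 'bcl2fastq>=2.20' into (name, operator, version)."""
    match = SPEC.match(spec)
    if match is None:
        raise ValueError(f"unrecognised converter specification: {spec!r}")
    return match.group("name"), match.group("op"), match.group("version")


def version_tuple(version):
    """Convert '2.20.0' into (2, 20, 0) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def satisfies(installed_version, spec):
    """Check an installed version string against a converter spec."""
    _, op, required = parse_converter(spec)
    if op is None:
        return True  # no version requirement was given
    return OPERATORS[op](version_tuple(installed_version),
                         version_tuple(required))


print(parse_converter("bcl2fastq>=2.20"))  # ('bcl2fastq', '>=', '2.20')
print(parse_converter("bcl-convert"))      # ('bcl-convert', None, None)
```

Versions are compared as tuples of integers, so '2.20' is treated as newer than '2.3'; a plain string comparison would order them the other way round.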
Adapter trimming and masking:
--adapter ADAPTER_SEQUENCE
sequence of adapter to be trimmed. Specify multiple
adapters by separating them with plus sign (+). Only
used for read 1 if --adapter-read2 is also specified
(default: use adapter sequence from sample sheet)
--adapter-read2 ADAPTER_SEQUENCE_READ2
sequence of adapter to be trimmed in read 2. Specify
multiple adapters by separating them with plus sign
(+) (default: use adapter sequence from sample sheet)
--minimum-trimmed-read-length MINIMUM_TRIMMED_READ_LENGTH
Minimum read length after adapter trimming. bcl2fastq
trims the adapter from the read down to this value; if
there is more adapter match below this length then
those bases are masked not trimmed (i.e. replaced by N
rather than removed) (default: 35)
--mask-short-adapter-reads MASK_SHORT_ADAPTER_READS
minimum length of unmasked bases that a read can be
after adapter trimming; reads with fewer ACGT bases
will be completely masked with Ns (default: 22)
--no-adapter-trimming
turn off adapter trimming even if adapter sequences
are supplied
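The interaction between '--minimum-trimmed-read-length' and '--mask-short-adapter-reads' can be easier to follow with a worked example. The Python sketch below models only the behaviour described in the two option descriptions above; it is not the bcl2fastq implementation, and the function name and the 'adapter_start' argument (the position at which the adapter match begins) are hypothetical.

```python
def trim_and_mask(read, adapter_start,
                  min_trimmed_length=35, mask_short_length=22):
    """Model adapter trimming and masking for a single read.

    'adapter_start' is the index where the adapter begins in the
    read, or None if no adapter was found.
    """
    if adapter_start is None:
        return read
    if adapter_start >= min_trimmed_length:
        # Enough sequence remains: the adapter is simply removed
        trimmed = read[:adapter_start]
    else:
        # Trim down to the minimum length, then mask (rather than
        # remove) the adapter bases that fall below that length
        keep = min(len(read), min_trimmed_length)
        trimmed = read[:adapter_start] + "N" * (keep - adapter_start)
    # Reads left with too few unmasked bases are masked completely
    unmasked = sum(base in "ACGT" for base in trimmed)
    if unmasked < mask_short_length:
        trimmed = "N" * len(trimmed)
    return trimmed


read = "ACGT" * 15                      # 60 bases
print(len(trim_and_mask(read, 40)))     # 40: adapter trimmed off
print(trim_and_mask(read, 30)[-5:])     # NNNNN: 5 adapter bases masked
print(trim_and_mask(read, 10))          # 35 Ns: fully masked
```

With the defaults of 35 and 22, an adapter starting at position 30 leaves 30 real bases plus 5 masked ones, while an adapter starting at position 10 leaves only 10 real bases, which is below the threshold of 22, so the whole read is masked.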
ICELL8 options (ICELL8 data only):
--well-list ICELL8_WELL_LIST
specify ICELL8 well list file
--swap-i1-and-i2 swap supplied I1 and I2 Fastqs when matching ATAC
barcodes against well list
--reverse-complement {i1,i2,both}
can be 'i1','i2', or 'both'; reverse complement the
specified indices from the well list when matching
ATAC barcodes against well list
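The effect of '--reverse-complement' on a pair of well list indices can be sketched in Python as below. This is an illustrative model only: the function names are hypothetical, and the way auto_process applies the transformation internally is not described in this help text.

```python
# Translation table for complementing DNA bases (N is left unchanged)
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def transform_indices(i1, i2, mode=None):
    """Reverse complement 'i1', 'i2' or 'both' indices (hypothetical helper)."""
    if mode in ("i1", "both"):
        i1 = reverse_complement(i1)
    if mode in ("i2", "both"):
        i2 = reverse_complement(i2)
    return i1, i2


print(transform_indices("AACG", "TTGA", "i2"))    # ('AACG', 'TCAA')
print(transform_indices("AACG", "TTGA", "both"))  # ('CGTT', 'TCAA')
```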
10x Genomics data options (Cellranger*/Spaceranger):
--10x_jobmode CELLRANGER_JOBMODE
job mode to run cellranger in (default: 'local')
--10x_localcores CELLRANGER_LOCALCORES
maximum cores cellranger can request at one time for
jobmode 'local' (ignored for other jobmodes) (default:
1)
--10x_localmem CELLRANGER_LOCALMEM
maximum total memory cellranger can request at one
time for jobmode 'local' (ignored for other jobmodes)
(in Gbs; default: 5)
--10x_maxjobs CELLRANGER_MAXJOBS
maximum number of concurrent jobs to run; NB only used
if jobmode is not 'local' (default: 24)
--10x_mempercore CELLRANGER_MEMPERCORE
memory assumed per core (in Gbs; default: 5); NB only
used if jobmode is not 'local'
--10x_jobinterval CELLRANGER_JOBINTERVAL
how often jobs are submitted (in ms; default: 100);
only used if jobmode is not 'local'
--ignore-dual-index on a dual-indexed flowcell where the second index was
not used for the 10x sample, ignore it
10x Genomics Spaceranger options:
--rc-i2-override RC_I2_OVERRIDE
(Spaceranger only) explicitly indicate whether bases
in I2 read were emitted as reverse complement by the
sequencing workflow: set to 'true' for the Reverse
Complement Workflow (Workflow B) / NovaSeq Reagent Kit
v1.5 or greater, 'false' for the Forward Strand
Workflow (Workflow A) / older NovaSeq Reagent Kits. If
unset then workflow will be determined automatically
(recommended)
Statistics generation:
--stats-file STATS_FILE
specify output file for fastq statistics
--per-lane-stats-file PER_LANE_STATS_FILE
specify output file for per-lane statistics
--no-stats don't generate statistics file; use
'update_fastq_stats' command to (re)generate
statistics
Barcode analysis:
--barcode-analysis-dir BARCODE_ANALYSIS_DIR
specify subdirectory where barcode analysis will be
performed and outputs will be written
--no-barcode-analysis
don't perform barcode analysis; use 'analyse_barcodes'
command to run barcode analysis separately
Job control options:
-j NJOBS, --maxjobs NJOBS
maximum number of jobs to run concurrently (default:
12)
-c NCORES, --maxcores NCORES
maximum number of cores available for running jobs
(default: no limit)
-b NBATCHES, --maxbatches NBATCHES
enable dynamic batching of pipeline jobs with maximum
number of batches set to NBATCHES (default: no
batching)
Advanced/debugging options:
--verbose run pipeline in 'verbose' mode
--work-dir WORKING_DIR
specify the working directory for the pipeline
operations
Deprecated options:
--require-bcl2fastq-version BCL2FASTQ_VERSION
deprecated: explicitly specify version of bcl2fastq
software to use (e.g. '=1.8.4' or '>=2.0') (use
--bcl-converter instead)
analyse_barcodes
usage: auto_process.py analyse_barcodes [-h] [--version]
[--unaligned-dir UNALIGNED_DIR]
[--lanes LANES]
[--mismatches MISMATCHES]
[--cutoff CUTOFF]
[--sample-sheet SAMPLE_SHEET]
[--id NAME]
[--barcode-analysis-dir BARCODE_ANALYSIS_DIR]
[--force] [--runner RUNNER] [--debug]
[ANALYSIS_DIR]
Analyse barcode sequences for Fastq files in specified lanes in ANALYSIS_DIR,
and report the most common barcodes found across all reads from each lane.
positional arguments:
ANALYSIS_DIR auto_process analysis directory (optional: defaults to
the current directory)
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--unaligned-dir UNALIGNED_DIR
explicitly set the (sub)directory with bcl-to-fastq
outputs
--lanes LANES specify which lanes to analyse barcodes for (default
is to do analysis for all lanes).
--mismatches MISMATCHES
maximum number of mismatches to use when grouping
similar barcodes (default is to determine
automatically from the bases mask)
--cutoff CUTOFF exclude barcodes with a smaller fraction of associated
reads than CUTOFF, e.g. '0.01' excludes barcodes with
< 1% of reads (default is 0.01%)
--sample-sheet SAMPLE_SHEET
use an alternative sample sheet to the default
'custom_SampleSheet.csv' created on setup.
--id NAME specify an identifier to be written into the default
output barcode analysis directory name (e.g.
'barcode_analysis_NAME') and report title
--barcode-analysis-dir BARCODE_ANALYSIS_DIR
specify subdirectory where barcode analysis will be
performed and outputs will be written
--force discard and regenerate counts (by default existing
counts will be used)
--runner RUNNER explicitly specify runner definition (e.g.
'GEJobRunner(-j y)')
--debug Turn on debugging output
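The '--cutoff' option is expressed as a fraction of reads, so '0.01' means 1%. The Python sketch below illustrates the filtering rule with made-up counts. It assumes the fraction is taken relative to the total of the counts supplied, and the function name is hypothetical; the barcode grouping controlled by '--mismatches' is not modelled.

```python
def filter_barcodes(counts, cutoff):
    """Drop barcodes whose share of reads is smaller than 'cutoff'.

    'counts' maps barcode sequence to number of reads; 'cutoff' is a
    fraction between 0 and 1 (e.g. 0.01 for 1%).
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {barcode: n for barcode, n in counts.items()
            if n / total >= cutoff}


# Made-up example counts for a single lane (1000 reads in total)
counts = {"AAAA": 990, "CCCC": 9, "GGGG": 1}
print(filter_barcodes(counts, 0.01))    # {'AAAA': 990}
print(filter_barcodes(counts, 0.0001))  # all three barcodes kept
```

If the documented default of 0.01% is read as a fraction, it corresponds to a CUTOFF value of 0.0001.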
setup_analysis_dirs
usage: auto_process.py setup_analysis_dirs [-h] [--version]
[--ignore-missing-metadata]
[--unaligned-dir UNALIGNED_DIR]
[--undetermined UNDETERMINED]
[--short-fastq-names]
[--link-to-fastqs] [--id NAME]
[--debug]
[ANALYSIS_DIR]
Create analysis subdirectories for projects defined in projects.info file in
ANALYSIS_DIR.
positional arguments:
ANALYSIS_DIR auto_process analysis directory (optional: defaults to
the current directory)
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--ignore-missing-metadata
force creation of project directories even if metadata
is not set (default is to fail if metadata is missing)
--unaligned-dir UNALIGNED_DIR
explicitly specify the subdirectory with output Fastqs
--undetermined UNDETERMINED
explicitly specify name for project directory with
'undetermined' fastqs
--short-fastq-names shorten fastq file names when copying or linking from
project directory (default is to keep long names from
bcl2fastq)
--link-to-fastqs create symbolic links to original fastqs from project
directory (default is to make hard links)
--id NAME identifier to append to project names
--debug Turn on debugging output
run_qc
usage: auto_process.py run_qc [-h] [--version] [--projects PROJECT_PATTERN]
[--qc_dir QC_DIR] [--fastq_dir FASTQ_DIR]
[--fastq_subset SUBSET] [-t NTHREADS]
[--cellranger CELLRANGER_EXE]
[--10x_chemistry {ARC-v1,SC3Pv1,SC3Pv2,SC3Pv3,SC5P-PE,SC5P-R2,auto,fiveprime,threeprime}]
[--10x_force_cells N_CELLS]
[--10x_extra_projects PROJECT_DIRS]
[--10x_transcriptome ORGANISM=REFERENCE]
[--10x_premrna_reference ORGANISM=REFERENCE]
[--report HTML_FILE] [--enable-conda {yes,no}]
[--conda-env-dir CONDA_ENV_DIR] [-c NCORES]
[-j NJOBS] [-b NBATCHES] [--verbose]
[--work-dir WORKING_DIR] [--runner RUNNER]
[--debug]
[ANALYSIS_DIR]
Run QC procedures for sequencing projects in ANALYSIS_DIR.
positional arguments:
ANALYSIS_DIR auto_process analysis directory (optional: defaults to
the current directory)
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--projects PROJECT_PATTERN
simple wildcard-based pattern specifying a subset of
projects and samples to run the QC on. PROJECT_PATTERN
should be of the form 'pname[/sname]', where 'pname'
specifies a project (or set of projects) and 'sname'
optionally specifies a sample (or set of samples).
--qc_dir QC_DIR explicitly specify QC output directory (nb if supplied
then the same QC_DIR will be used for each project.
Non-absolute paths are assumed to be relative to the
project directory). Default: 'qc'
--fastq_dir FASTQ_DIR
explicitly specify subdirectory of DIR with Fastq
files to run the QC on.
QC options:
--fastq_subset SUBSET
specify size of subset of total reads to use for
fastq_screen, BAM file generation etc (default 100000,
set to 0 to use all reads)
-t NTHREADS, --threads NTHREADS
number of threads to use for QC script (default: taken
from job runner)
Cellranger/10xGenomics options:
--cellranger CELLRANGER_EXE
explicitly specify path to Cellranger executable to
use for single library analysis (NB will be used for
all projects)
--10x_chemistry {ARC-v1,SC3Pv1,SC3Pv2,SC3Pv3,SC5P-PE,SC5P-R2,auto,fiveprime,threeprime}
assay configuration for 10xGenomics scRNA-seq; if set
to 'auto' (the default) then cellranger will attempt
to determine this automatically
--10x_force_cells N_CELLS
force number of cells for 10xGenomics scRNA-seq and
scATAC-seq, overriding automatic cell detection
algorithms (default is to use built-in cell detection)
--10x_extra_projects PROJECT_DIRS
specify additional projects to include samples from in
single library analyses, as comma-separated list
--10x_transcriptome ORGANISM=REFERENCE
specify cellranger transcriptome reference datasets to
associate with organisms (overrides references defined
in config file)
--10x_premrna_reference ORGANISM=REFERENCE
specify cellranger pre-mRNA reference datasets to
associate with organisms (overrides references defined
in config file)
Output and reporting:
--report HTML_FILE file name for output HTML QC report (default:
<QC_DIR>_report.html)
Conda dependency resolution:
--enable-conda {yes,no}
use conda to resolve task dependencies; can be 'yes'
or 'no' (default: no)
--conda-env-dir CONDA_ENV_DIR
specify directory for conda environments (default:
temporary directory)
Job control options:
-c NCORES, --maxcores NCORES
maximum number of cores available for running jobs
(default: no limit)
-j NJOBS, --maxjobs NJOBS
maximum number of jobs to run concurrently (default:
12)
-b NBATCHES, --maxbatches NBATCHES
enable dynamic batching of pipeline jobs with maximum
number of batches set to NBATCHES (default: no
batching)
Advanced/debugging options:
--verbose run pipeline in 'verbose' mode
--work-dir WORKING_DIR
specify the working directory for the pipeline
operations
--runner RUNNER explicitly specify runner definition (e.g.
'GEJobRunner(-j y)')
--debug Turn on debugging output
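The 'pname[/sname]' form of PROJECT_PATTERN can be illustrated with Python's standard glob-style matching. This sketch assumes that the "simple wildcard-based pattern" behaves like shell globbing and that a missing sample part matches every sample; the function name and the project and sample names are invented for the example, and the matching code used by auto_process may differ.

```python
from fnmatch import fnmatchcase


def matches_project_pattern(pattern, project, sample):
    """Test a project/sample pair against a 'pname[/sname]' pattern."""
    pname, _, sname = pattern.partition("/")
    if not sname:
        # Assumption: omitting the sample part selects all samples
        sname = "*"
    return fnmatchcase(project, pname) and fnmatchcase(sample, sname)


# Invented project and sample names
print(matches_project_pattern("PJB*", "PJB_RNAseq", "S1"))      # True
print(matches_project_pattern("PJB*/S1*", "PJB_RNAseq", "S2"))  # False
print(matches_project_pattern("*/S2", "PJB_RNAseq", "S2"))      # True
```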
publish_qc
usage: auto_process.py publish_qc [-h] [--version] [--qc_dir QC_DIR]
[--use-hierarchy {yes,no}] [--url BASE_URL]
[--projects PROJECT_PATTERN]
[--ignore-missing-qc]
[--exclude-zip-files {yes,no}]
[--regenerate-reports] [--force]
[--suppress-warnings] [--legacy]
[--runner RUNNER] [--debug]
[ANALYSIS_DIR]
Copy QC reports from ANALYSIS_DIR to local or remote directory (e.g. web
server). By default existing QC reports will be copied without further
checking; if no report is found then QC results will be verified and a report
generated first.
positional arguments:
ANALYSIS_DIR auto_process analysis directory (optional: defaults to
the current directory)
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
Destination options:
--qc_dir QC_DIR specify target directory to copy QC reports to. QC_DIR
can be a local directory, or a remote location in the
form '[[user@]host:]directory'. Overrides the default
settings.
--use-hierarchy {yes,no}
use YEAR/PLATFORM hierarchy under QC_DIR; can be 'yes'
or 'no' (default: no)
--url BASE_URL specify the 'base' URL for accessing the published
reports. Overrides the default settings
Projects and data options:
--projects PROJECT_PATTERN
simple wildcard-based pattern specifying a subset of
projects and samples to publish the QC for.
PROJECT_PATTERN can specify a single project, or a set
of projects.
--ignore-missing-qc skip projects where QC results are missing or can't be
verified, or where reports can't be generated.
--exclude-zip-files {yes,no}
exclude ZIP archives from publication; can be 'yes' or
'no' (default: no)
QC reporting options:
--regenerate-reports attempt to regenerate existing QC reports
--force force generation of QC reports for all projects even
if verification has failed
--suppress-warnings don't include warning messages in (re)generated QC
reports or top level index even if there are missing
metrics in individual QC reports (NB won't be applied
for pre-existing reports; combine with
--regenerate-reports and --force to update all reports)
--legacy legacy mode: include links to MultiQC, 'cellranger
count' and ICELL8 reports in the top-level index page
Advanced/debugging options:
--runner RUNNER explicitly specify runner definition (e.g.
'GEJobRunner(-j y)')
--debug Turn on debugging output
archive
usage: auto_process.py archive [-h] [--version] [--archive_dir ARCHIVE_DIR]
[--platform PLATFORM] [--year YEAR]
[--group GROUP] [--chmod PERMISSIONS] [--final]
[--force] [--runner RUNNER] [--dry-run]
[--debug]
[ANALYSIS_DIR]
Copy sequencing analysis data directory ANALYSIS_DIR to 'archive' destination.
positional arguments:
ANALYSIS_DIR auto_process analysis directory (optional: defaults to
the current directory)
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--archive_dir ARCHIVE_DIR
specify top-level archive directory to copy data
under. ARCHIVE_DIR can be a local directory, or a
remote location in the form '[[user@]host:]directory'.
Overrides the default settings.
--platform PLATFORM specify the platform e.g. 'hiseq', 'miseq' etc
(overrides automatically determined platform, if any).
Use 'other' for cases where the platform is unknown.
--year YEAR specify the year e.g. '2014' (default is the current
year)
--group GROUP specify the name of group for the archived files
(default: None)
--chmod PERMISSIONS specify permissions for the archived files.
PERMISSIONS should be a string recognised by the
'chmod' command (e.g. 'o-rwX') (default: None)
--final copy data to final archive location (default is to
copy to staging area)
--force attempt to complete archiving operations ignoring any
errors (e.g. key metadata items not set, unable to set
group etc)
--runner RUNNER explicitly specify runner definition (e.g.
'GEJobRunner(-j y)')
--dry-run Dry run i.e. report what would be done but don't
perform any actions
--debug Turn on debugging output
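Destinations such as ARCHIVE_DIR use the form '[[user@]host:]directory'. The Python sketch below shows how such a string breaks down into its parts. The function name and the example host are hypothetical, and the sketch assumes that a local directory name never contains a colon.

```python
def split_location(location):
    """Split '[[user@]host:]directory' into (user, host, directory)."""
    head, separator, directory = location.partition(":")
    if not separator:
        # No colon: the whole string is a local directory
        return None, None, location
    if "@" in head:
        user, host = head.split("@", 1)
    else:
        user, host = None, head
    return user, host, directory


print(split_location("/mnt/archive"))
# (None, None, '/mnt/archive')
print(split_location("archive.example.org:/mnt/archive"))
# (None, 'archive.example.org', '/mnt/archive')
print(split_location("alice@archive.example.org:/mnt/archive"))
# ('alice', 'archive.example.org', '/mnt/archive')
```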
report
usage: auto_process.py report [-h] [--version]
[--logging | --summary | --projects]
[--fields FIELDS] [--template TEMPLATE]
[--file OUT_FILE] [--debug]
[ANALYSIS_DIR]
Report information on analysis in ANALYSIS_DIR.
positional arguments:
ANALYSIS_DIR auto_process analysis directory (optional: defaults to
the current directory)
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--logging print short report suitable for logging file
--summary print full report suitable for bioinformaticians
--projects print tab-delimited line (one per project) suitable for
injection into a spreadsheet
--fields FIELDS fields to report
--template TEMPLATE name of template with fields to report (templates
should be defined in the config file)
--file OUT_FILE write report to OUT_FILE; destination can be a local
file, or a remote file specified as [[USER@]HOST:]PATH
(default is to write to stdout)
--debug Turn on debugging output
samplesheet
usage: auto_process.py samplesheet [-h] [--version]
[--use SAMPLE_SHEET | --set-project [LANES:][COL=PATTERN:]NEW_PROJECT | --set-sample-id [LANES:][COL=PATTERN:]NEW_ID | --set-sample-name NEW_NAME | -i SAMPLE_SHEET | -e | -p]
[--debug]
[ANALYSIS_DIR]
Query and manipulate sample sheets
positional arguments:
ANALYSIS_DIR auto_process analysis directory (optional: defaults to
the current directory)
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--use SAMPLE_SHEET update the default sample sheet file to SAMPLE_SHEET
(must be a file on the local file system)
--set-project [LANES:][COL=PATTERN:]NEW_PROJECT
update the sample project field. Optional LANES
specifies one or more lanes (e.g. '1', '1,2,3', '1-3',
'1,3-5') to update; optional COL=PATTERN specifies a
glob-style pattern to match to an arbitrary column
(e.g. 'Sample_Name=ITS*'); NEW_PROJECT is the new
project name
--set-sample-id [LANES:][COL=PATTERN:]NEW_ID
update the sample ID field. Optional LANES specifies
one or more lanes (e.g. '1', '1,2,3', '1-3', '1,3-5')
to update; optional COL=PATTERN specifies a glob-style
pattern to match to an arbitrary column (e.g.
'Sample_Name=ITS*'); NEW_ID can be either
'SAMPLE_NAME' or an arbitrary string
--set-sample-name NEW_NAME
update the sample name field. Optional LANES specifies
one or more lanes (e.g. '1', '1,2,3', '1-3', '1,3-5')
to update; optional COL=PATTERN specifies a glob-style
pattern to match to an arbitrary column (e.g.
'Sample_Name=ITS*'); NEW_NAME can be either
'SAMPLE_ID' or an arbitrary string
-i SAMPLE_SHEET, --import SAMPLE_SHEET
replace existing sample sheet file with version copied
from the specified location; SAMPLE_SHEET can be a
local or remote file, or a URL
-e, --edit bring up sample sheet file in an editor to make
changes manually
-p, --predict show predicted outputs from sample sheet
Advanced options:
--debug Turn on debugging output
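The '[LANES:][COL=PATTERN:]NEW_PROJECT' argument of '--set-project' selects sample sheet lines by lane and by a glob-style match on one column. The Python sketch below models that selection on a list of dictionaries. It is an illustration rather than the auto_process implementation: the function names are hypothetical, and the column names 'Lane' and 'Sample_Project' are assumed to follow the usual Illumina sample sheet layout.

```python
from fnmatch import fnmatchcase


def expand_lanes(spec):
    """Expand e.g. '1,3-5' into the set {1, 3, 4, 5}."""
    lanes = set()
    for part in spec.split(","):
        start, _, end = part.partition("-")
        lanes.update(range(int(start), int(end or start) + 1))
    return lanes


def parse_update_spec(spec):
    """Split '[LANES:][COL=PATTERN:]NEW_VALUE' into its components."""
    *filters, new_value = spec.split(":")
    lanes = column = pattern = None
    for item in filters:
        if "=" in item:
            column, pattern = item.split("=", 1)
        else:
            lanes = expand_lanes(item)
    return lanes, column, pattern, new_value


def set_project(rows, spec):
    """Update 'Sample_Project' on the rows selected by 'spec'."""
    lanes, column, pattern, new_project = parse_update_spec(spec)
    for row in rows:
        if lanes is not None and int(row["Lane"]) not in lanes:
            continue  # lane not selected
        if column is not None and not fnmatchcase(row[column], pattern):
            continue  # column value does not match the pattern
        row["Sample_Project"] = new_project
    return rows


# Invented sample sheet lines
rows = [
    {"Lane": "1", "Sample_Name": "ITS1", "Sample_Project": "Old"},
    {"Lane": "2", "Sample_Name": "ITS2", "Sample_Project": "Old"},
    {"Lane": "1", "Sample_Name": "XY1", "Sample_Project": "Old"},
]
set_project(rows, "1:Sample_Name=ITS*:NewProject")
print([row["Sample_Project"] for row in rows])
# ['NewProject', 'Old', 'Old']
```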
update
usage: auto_process.py update [-h] [--version] [--debug] [ANALYSIS_DIR]
Update paths and metadata across ANALYSIS_DIR and its projects and QC outputs
when directory has been moved or copied, or project metadata has been updated.
positional arguments:
ANALYSIS_DIR existing auto_process analysis directory to update (optional:
defaults to the current directory)
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--debug Turn on debugging output
merge_fastq_dirs
usage: auto_process.py merge_fastq_dirs [-h] [--version]
[--primary-unaligned-dir UNALIGNED_DIR]
[--output-dir OUTPUT_DIR] [--dry-run]
[--debug]
[ANALYSIS_DIR]
Automatically merge fastq directories from multiple bcl-to-fastq runs within
ANALYSIS_DIR. Use this command if 'make_fastqs' step was run multiple times to
process subsets of lanes.
positional arguments:
ANALYSIS_DIR auto_process analysis directory (optional: defaults to
the current directory)
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--primary-unaligned-dir UNALIGNED_DIR
merge fastqs from additional bcl-to-fastq directories
into UNALIGNED_DIR. Original data will be moved out of
the way first. Defaults to 'bcl2fastq'.
--output-dir OUTPUT_DIR
merge fastqs into OUTPUT_DIR (relative to
ANALYSIS_DIR). Defaults to UNALIGNED_DIR.
--dry-run Dry run i.e. report what would be done but don't
perform any actions
--debug Turn on debugging output
update_fastq_stats
usage: auto_process.py update_fastq_stats [-h] [--version]
[--unaligned-dir UNALIGNED_DIR]
[--sample-sheet SAMPLE_SHEET]
[--id NAME]
[--stats-file STATS_FILE]
[--per-lane-stats-file PER_LANE_STATS_FILE]
[-a] [--force]
[--nprocessors NPROCESSORS]
[--runner RUNNER] [--debug]
[ANALYSIS_DIR]
(Re)generate statistics for fastq files produced from 'make_fastqs'.
positional arguments:
ANALYSIS_DIR auto_process analysis directory (optional: defaults to
the current directory)
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--unaligned-dir UNALIGNED_DIR
explicitly set the (sub)directory with bcl-to-fastq
outputs
--sample-sheet SAMPLE_SHEET
explicitly specify the sample sheet to use (defaults
to the sample sheet stored in the analysis directory
parameters)
--id NAME specify an identifier to be written into the output
statistics file name (e.g. 'statistics.NAME.info')
--stats-file STATS_FILE
specify output file for fastq statistics
--per-lane-stats-file PER_LANE_STATS_FILE
specify output file for per-lane statistics
-a, --add add new data from UNALIGNED_DIR to existing statistics
--force force statistics to be regenerated even if existing
statistics files are newer than fastqs
--nprocessors NPROCESSORS
explicitly specify number of processors/cores to use
(default taken from job runner)
--runner RUNNER explicitly specify runner definition (e.g.
'GEJobRunner(-j y)')
--debug Turn on debugging output
import_project
usage: auto_process.py import_project [-h] [--version] [--debug]
[--comment COMMENT]
[ANALYSIS_DIR] PROJECT_DIR
Copy a project directory PROJECT_DIR from another analysis directory into
ANALYSIS_DIR, update metadata appropriately, and regenerate QC reports.
positional arguments:
ANALYSIS_DIR auto_process analysis directory (optional: defaults to
the current directory)
PROJECT_DIR path to project directory to import
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--debug Turn on debugging output
--comment COMMENT specify comment text to be appended to the stored
comments associated with the project
config
usage: auto_process.py config [-h] [--version] [--debug]
[--init | --set KEY_VALUE | --add NEW_SECTION]
[--raw] [--show]
Query and change global configuration. Run without options to display the
configuration settings.
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--debug Turn on debugging output
Creation and edit options:
--init Create a new default configuration file based on the
sample template.
--set KEY_VALUE Set the value of a parameter. KEY_VALUE should be of the
form '<param>=<value>' (<param> should be of the form
'SECTION[:SUBSECTION].NAME'). Multiple --set options can
be specified.
--add NEW_SECTION Add a new section called NEW_SECTION to the config (to
add e.g. a new platform, use 'platform:NAME'). Multiple
--add options can be specified.
Display options:
--raw Show the 'raw' configuration (i.e. only parameters and
values explicitly defined in the config before defaults
are loaded)
Deprecated/defunct options:
--show Show the values of parameters and settings (does nothing;
use 'config' with no options to display settings)
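The KEY_VALUE argument of '--set' has the form 'SECTION[:SUBSECTION].NAME=value'. The Python sketch below shows how such a string decomposes. The function name is hypothetical, and 'example_param' and the 'nextseq' platform are placeholders rather than real configuration parameters.

```python
def parse_config_setting(key_value):
    """Split 'SECTION[:SUBSECTION].NAME=value' into its four parts."""
    # Everything after the first '=' is the value
    param, _, value = key_value.partition("=")
    # The parameter name follows the last '.' in the key
    section_part, _, name = param.rpartition(".")
    # An optional subsection follows ':' in the section part
    section, _, subsection = section_part.partition(":")
    return section, subsection or None, name, value


# Placeholder parameter names, for illustration only
print(parse_config_setting("general.example_param=4"))
# ('general', None, 'example_param', '4')
print(parse_config_setting("platform:nextseq.example_param=yes"))
# ('platform', 'nextseq', 'example_param', 'yes')
```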
params
usage: auto_process.py params [-h] [--version] [--set KEY_VALUE] [--debug]
[ANALYSIS_DIR]
Query and change processing parameters and settings for ANALYSIS_DIR.
positional arguments:
ANALYSIS_DIR auto_process analysis directory (optional: defaults to the
current directory)
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--set KEY_VALUE Set the value of a parameter. KEY_VALUE should be of the
form '<param>=<value>'. Multiple --set options can be
specified.
--debug Turn on debugging output
metadata
usage: auto_process.py metadata [-h] [--version] [--set KEY_VALUE] [--update]
[--debug]
[ANALYSIS_DIR]
Query and change metadata associated with ANALYSIS_DIR.
positional arguments:
ANALYSIS_DIR auto_process analysis directory (optional: defaults to the
current directory)
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--set KEY_VALUE Set the value of a metadata item. KEY_VALUE should be of
the form '<param>=<value>'. Multiple --set options can be
specified.
--update Automatically update metadata items where possible (e.g.
for older analyses which have old or missing metadata
files)
--debug Turn on debugging output
readme
usage: auto_process.py readme [-h] [--version] [--init] [-V] [-e] [-m MESSAGE]
[--debug]
[ANALYSIS_DIR]
Add or amend a README file in the analysis directory ANALYSIS_DIR.
positional arguments:
ANALYSIS_DIR auto_process analysis directory (optional: defaults to
the current directory)
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--init create a new README file
-V, --view display the contents of the README file
-e, --edit bring up README file in an editor to make changes
-m MESSAGE, --message MESSAGE
append MESSAGE text to the README file
--debug Turn on debugging output
clone
usage: auto_process.py clone [-h] [--version] [--copy-fastqs]
[--exclude-projects] [--debug]
[ANALYSIS_DIR] CLONE_DIR
Make a copy of an existing analysis directory ANALYSIS_DIR in a new directory
CLONE_DIR.
positional arguments:
ANALYSIS_DIR existing auto_process analysis directory to clone
(optional: defaults to the current directory)
CLONE_DIR path to cloned directory
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--copy-fastqs Copy fastq.gz files from ANALYSIS_DIR into CLONE_DIR
(default is to make a link to the bcl-to-fastq directory)
--exclude-projects Exclude (i.e. don't copy) project directories from
ANALYSIS_DIR
--debug Turn on debugging output