Utilities
Note
This documentation has been auto-generated from the command help.
In addition to the main auto_process.py command, a number of utilities
are available:
analyse_barcodes.py
usage:
analyse_barcodes.py FASTQ [FASTQ...]
analyse_barcodes.py DIR
analyse_barcodes.py -c COUNTS_FILE [COUNTS_FILE...]
Collate and report counts and statistics for Fastq index sequences (aka
barcodes). If multiple Fastq files are supplied then sequences will be pooled
before being analysed. If a single directory is supplied then this will be
assumed to be an output directory from bcl2fastq and files will be processed
on a per-lane basis. If the -c option is supplied then the input must be one
or more files of barcode counts generated previously using the -o option.
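The pooling behaviour described above can be illustrated with a minimal Python sketch. It is not the utility's own code: the function name count_barcodes, the example headers, and the assumption that the index sequence is the last colon-separated field of each read header are all illustrative.

```python
from collections import Counter

def count_barcodes(header_lines):
    """Count index sequences taken from the end of Fastq read headers."""
    counts = Counter()
    for header in header_lines:
        # Assumption: the header ends with '<read>:<filter>:<control>:<index>'
        barcode = header.strip().split(":")[-1]
        counts[barcode] += 1
    return counts

# Hypothetical headers from two Fastq files
fastq_a = ["@M1:1:FC:1:1101:100:200 1:N:0:ACGTAC",
           "@M1:1:FC:1:1101:101:201 1:N:0:ACGTAC"]
fastq_b = ["@M1:1:FC:2:1101:102:202 1:N:0:TTGCAA"]

# Pooling: counts from each file are added together before analysis
pooled = count_barcodes(fastq_a) + count_barcodes(fastq_b)
print(pooled.most_common())
```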
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
Input and output options:
-c, --counts input is one or more counts files generated by
previous runs using the '-o/--output' option
-o COUNTS_FILE_OUT, --output COUNTS_FILE_OUT
output all counts to tab-delimited file
COUNTS_FILE_OUT. This can be used again in another run
by specifying the '-c' option
Reporting options:
-l LANES, --lanes LANES
restrict analysis to the specified lane numbers
(default is to process all lanes). Multiple lanes can
be specified using ranges (e.g. '2-3'), comma-
separated list ('5,7') or a mixture ('2-3,5,7')
-m MISMATCHES, --mismatches MISMATCHES
maximum number of mismatches to use when grouping
similar barcodes (will be determined automatically if
samplesheet is supplied, otherwise defaults to 0)
--cutoff CUTOFF exclude barcodes/barcode groups from reporting with a
smaller fraction of associated reads than CUTOFF, e.g.
'0.01' excludes barcodes with < 1.0% of reads
(default: 0.001)
-s SAMPLE_SHEET, --sample-sheet SAMPLE_SHEET
report best matches against barcodes in SAMPLE_SHEET
-r REPORT_FILE, --report REPORT_FILE
write report to REPORT_FILE (otherwise write to
stdout)
-x XLS_FILE, --xls XLS_FILE
write XLS version of report to XLS_FILE
-f HTML_FILE, --html HTML_FILE
write HTML version of report to HTML_FILE
-t TITLE, --title TITLE
title for HTML report (default: 'Barcodes Report')
-n, --no-report suppress reporting (overrides --report)
Advanced options:
--minimum_read_fraction FRACTION
weed out individual barcodes from initial analysis
which have a smaller fraction of reads than FRACTION,
e.g. '0.001' removes barcodes with < 0.1% of reads;
speeds up analysis at the expense of accuracy as
reported counts will be approximate (default: 1.0e-6)
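As an illustration of the lane syntax accepted by the -l/--lanes option ('2-3', '5,7' or '2-3,5,7'), the following Python sketch expands such a specification into a list of lane numbers. The function name parse_lanes is hypothetical and the code is only an approximation of how such a string could be interpreted, not the utility's own parser.

```python
def parse_lanes(spec):
    """Expand a lane specification such as '2-3,5,7' into a sorted list."""
    lanes = set()
    for part in spec.split(","):
        if "-" in part:
            # A range 'A-B' includes both end points
            start, end = part.split("-")
            lanes.update(range(int(start), int(end) + 1))
        else:
            lanes.add(int(part))
    return sorted(lanes)

print(parse_lanes("2-3,5,7"))  # -> [2, 3, 5, 7]
```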
assign_barcodes.py
usage: assign_barcodes.py [-h] [-n N] INPUT.fq OUTPUT.fq
Extract arbitrary sequence fragments from reads in INPUT.fq FASTQ file and
assign these as the index (barcode) sequences in the read headers in
OUTPUT.fq.
positional arguments:
INPUT.fq Input FASTQ file
OUTPUT.fq Output FASTQ file
optional arguments:
-h, --help show this help message and exit
-n N remove first N bases from each read and assign these as barcode
index sequence (default: 5)
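The effect of the -n option on a single read can be sketched in Python as follows. This is an illustration only: the function name assign_barcode is hypothetical, and it assumes that the index sequence is whatever follows the final ':' in the read header and that the quality string is trimmed along with the sequence.

```python
def assign_barcode(header, sequence, quality, n=5):
    """Move the first n bases of a read into the index field of its header."""
    barcode = sequence[:n]
    # Assumption: everything after the final ':' in the header is the index
    prefix = header.rsplit(":", 1)[0]
    # Trim sequence and quality by the same amount so they stay in step
    return f"{prefix}:{barcode}", sequence[n:], quality[n:]

header, seq, qual = assign_barcode(
    "@M1:1:FC:1:1101:100:200 1:N:0:", "ACGTAGGGTTTCCC", "IIIIIFFFFFFFFF")
print(header)  # -> @M1:1:FC:1:1101:100:200 1:N:0:ACGTA
print(seq)     # -> GGGTTTCCC
```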
audit_projects.py
usage: audit_projects.py [-h] [--pi PI_NAME] [--unassigned] [DIR [DIR ...]]
Summarise the disk usage for runs that have been processed using auto_process.
The supplied DIRs are directories holding one or more top-level analysis
directories corresponding to different runs. The program reports total disk
usage for projects assigned to each PI across all DIRs.
positional arguments:
DIR directory to search for analysis directories for auditing
optional arguments:
-h, --help show this help message and exit
--pi PI_NAME list data for PI(s) matching PI_NAME (can use glob-style
patterns)
--unassigned list data for projects where PI is not assigned
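To show what glob-style matching of PI names means in practice, here is a small Python sketch using the standard fnmatch module. The data structure, the sizes, and the function name usage_by_pi are invented for the example and do not reflect how the utility stores or gathers its data.

```python
from fnmatch import fnmatchcase

# Hypothetical disk usage in bytes, keyed by (PI name, project name)
usage = {
    ("Smith", "RNAseq1"): 1200,
    ("Smythe", "ChIP2"): 800,
    ("Jones", "ATAC3"): 500,
    ("Smith", "WGS4"): 300,
}

def usage_by_pi(usage, pattern="*"):
    """Total the usage per PI for PI names matching a glob-style pattern."""
    totals = {}
    for (pi, _project), size in usage.items():
        if fnmatchcase(pi, pattern):
            totals[pi] = totals.get(pi, 0) + size
    return totals

print(usage_by_pi(usage, "Sm*"))  # -> {'Smith': 1500, 'Smythe': 800}
```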
build_index.py
usage: build_index.py [-h] [-v] [-o OUT_DIR] [--ebwt_base NAME]
[--bt2_base NAME] [--overhang N] [-V VERSION]
[-r RUNNER]
ALIGNER FASTA [ANNOTATION]
Generate indexes for aligners
positional arguments:
ALIGNER aligner to build index for (one of 'bowtie',
'bowtie2', 'star')
FASTA FASTA file with sequence
ANNOTATION annotation file (for use with STAR)
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
-o OUT_DIR output directory for indexes
Bowtie-specific options:
--ebwt_base NAME specify basename for output .ebwt files (defaults to
FASTA file basename)
Bowtie2-specific options:
--bt2_base NAME specify basename for output .bt2 files (defaults to
FASTA file basename)
STAR-specific options:
--overhang N set value for STAR --sjdbOverhang option (default:
100)
Advanced options:
-V VERSION, --aligner-version VERSION
specify the version of the aligner to target (only
works if conda dependency resolution is configured)
-r RUNNER, --runner RUNNER
explicitly specify runner definition for building the
index. RUNNER must be a valid job runner specification
e.g. 'GEJobRunner(-pe smp.pe 8)' (default: use
appropriate runner from configuration)
concat_fastqs.py
usage: concat_fastqs.py [-h] [--version] [-v] FASTQ [FASTQ ...] FASTQ_OUT
Concatenate reads from one or more input Fastq files into a single new file
FASTQ_OUT
positional arguments:
FASTQ Input FASTQ to concatenate
FASTQ_OUT Output FASTQ with concatenated reads
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
-v, --verbose verbose output
barcode_splitter.py
usage:
barcode_splitter.py [OPTIONS] FASTQ [FASTQ ...]
barcode_splitter.py [OPTIONS] FASTQ_R1,FASTQ_R2 [FASTQ_R1,FASTQ_R2 ...]
barcode_splitter.py [OPTIONS] DIR
Split reads from one or more input Fastq files into new Fastqs based on
matching supplied barcodes.
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
-b INDEX_SEQ, --barcode INDEX_SEQ
specify index sequence to filter using
-m N_MISMATCHES, --mismatches N_MISMATCHES
maximum number of differing bases to allow for two
index sequences to count as a match. Default is zero
(i.e. exact matches only)
-n BASE_NAME, --name BASE_NAME
basename to use for output files
-o OUT_DIR, --output-dir OUT_DIR
specify directory for output split Fastqs
-u UNALIGNED_DIR, --unaligned UNALIGNED_DIR
specify subdirectory with outputs from bcl2fastq
-l LANE, --lane LANE specify lane to collect and split Fastqs for
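The -m/--mismatches option counts the number of differing bases between two index sequences. A minimal Python sketch of that comparison is given below; the function names are hypothetical, and the requirement that both sequences have the same length is an assumption made for the example.

```python
def count_mismatches(seq1, seq2):
    """Return the number of positions at which two index sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be the same length")
    return sum(a != b for a, b in zip(seq1, seq2))

def is_match(index_seq, barcode, max_mismatches=0):
    """True if index_seq matches barcode within the allowed mismatches."""
    return count_mismatches(index_seq, barcode) <= max_mismatches

print(is_match("ACGTAC", "ACGTAA"))     # -> False (exact matches only)
print(is_match("ACGTAC", "ACGTAA", 1))  # -> True  (one mismatch allowed)
```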
download_fastqs.py
usage: download_fastqs.py [-h] URL [DIR]
Download checksum file and fastqs from URL into current directory (or
directory DIR, if specified), and verify the downloaded files against the
checksum file.
positional arguments:
URL URL with checksum file and fastqs
DIR directory to put downloaded fastqs into (defaults to current
directory)
optional arguments:
-h, --help show this help message and exit
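The verification step can be pictured with the Python sketch below, which recomputes MD5 sums and compares them with expected values. The layout of the checksum file is an assumption here (one '<md5> <filename>' entry per line, as written by the md5sum program), and the function names are hypothetical.

```python
import hashlib
import os

def md5sum(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()

def verify_checksums(checksum_lines, directory="."):
    """Return the names of files whose MD5 sum differs from the expected one."""
    failed = []
    for line in checksum_lines:
        if not line.strip():
            continue  # skip blank lines
        # Assumed line format: '<md5>  <filename>'
        expected, name = line.split(None, 1)
        name = name.strip()
        if md5sum(os.path.join(directory, name)) != expected:
            failed.append(name)
    return failed
```

An empty list returned from verify_checksums would indicate that every listed file matched its expected checksum.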
demultiplex_icell8_atac.py
usage: demultiplex_icell8_atac.py [-h] [-v] [-o OUTDIR] [-b N] [-n N]
[-m {samples,barcodes}] [--unassigned NAME]
[--swap-i1-i2]
[--reverse-complement {i1,i2,both}] [-u]
[--no-demultiplexing]
WELL_LIST FASTQ_R1 FASTQ_R2 FASTQ_I1
FASTQ_I2
Assign reads from ICELL8 ATAC R1/R2/I1/I2 Fastq set to barcodes and samples in
a well list file
positional arguments:
WELL_LIST Well list file
FASTQ_R1 FASTQ R1
FASTQ_R2 FASTQ R2
FASTQ_I1 FASTQ I1
FASTQ_I2 FASTQ I2
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
-o OUTDIR, --output-dir OUTDIR
path to demultiplexed output
-b N, --batch_size N batch size for splitting index read Fastqs (default:
50000000)
-n N, --nprocessors N
number of processors to use
-m {samples,barcodes}, --mode {samples,barcodes}
demultiplex reads by sample (default) or by barcode
--unassigned NAME basename for output Fastqs with reads which cannot be
assigned to any sample or barcode (default:
'Unassigned')
--swap-i1-i2 swap supplied I1 and I2 Fastqs
--reverse-complement {i1,i2,both}
reverse complement one or both of the indices from the
well list file
-u, --update-read-headers
update read headers in the output Fastqs to include
the matching index sequence (i.e. barcode) from the
well list file
--no-demultiplexing don't generate demultiplexed Fastqs (only the stats)
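For reference, reverse complementing an index sequence (as requested by the --reverse-complement option) can be sketched in Python as below. The function names and the example sequences are illustrative; only the option values 'i1', 'i2' and 'both' are taken from the help text.

```python
# Translation table mapping each base to its complement
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def adjust_indices(i1, i2, mode=None):
    """Reverse complement one or both indices; mode is 'i1', 'i2' or 'both'."""
    if mode in ("i1", "both"):
        i1 = reverse_complement(i1)
    if mode in ("i2", "both"):
        i2 = reverse_complement(i2)
    return i1, i2

print(adjust_indices("AACGT", "GGATC", "both"))  # -> ('ACGTT', 'GATCC')
```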
fastq_statistics.py
usage: fastq_statistics.py [-h] [-v] [--unaligned UNALIGNED_DIR]
[--sample-sheet SAMPLE_SHEET] [-o STATS_FILE]
[-p PER_LANE_STATS_FILE]
[-s PER_LANE_SAMPLE_STATS_FILE]
[-f FULL_STATS_FILE] [-u] [-n N] [--debug]
[--force]
ILLUMINA_RUN_DIR
Generate statistics for FASTQ files in ILLUMINA_RUN_DIR (top-level directory
of a processed Illumina run)
positional arguments:
ILLUMINA_RUN_DIR input Illumina run directory
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
--unaligned UNALIGNED_DIR
specify an alternative name for the 'Unaligned'
directory containing the fastq.gz files
--sample-sheet SAMPLE_SHEET
specify a sample sheet file to get additional
information from
-o STATS_FILE, --output STATS_FILE
name of output file for per-file statistics (default
is 'statistics.info')
-p PER_LANE_STATS_FILE, --per-lane-stats PER_LANE_STATS_FILE
name of output file for per-lane statistics (default
is 'per_lane_statistics.info')
-s PER_LANE_SAMPLE_STATS_FILE, --per-lane-sample-stats PER_LANE_SAMPLE_STATS_FILE
name of output file for per-lane sample statistics
(default is 'per_lane_sample_stats.info')
-f FULL_STATS_FILE, --full-stats FULL_STATS_FILE
name of output file for full statistics (default is
'statistics_full.info')
-u, --update update existing full statistics file with stats for
additional files
-n N, --nprocessors N
spread work across N processors/cores (default is 1)
--debug turn on debugging output
Deprecated/defunct options:
--force does nothing: retained for backwards compatibility
icell8_contamination_filter.py
usage: icell8_contamination_filter.py [-h] [-o OUT_DIR] [-m MAMMALIAN_CONF]
[-c CONTAMINANTS_CONF]
[-a {bowtie,bowtie2}] [-n THREADS]
FQ_R1 FQ_R2
positional arguments:
FQ_R1 R1 FASTQ file
FQ_R2 Matching R2 FASTQ file
optional arguments:
-h, --help show this help message and exit
-o OUT_DIR, --outdir OUT_DIR
directory to write output FASTQ files to (default:
current directory)
-m MAMMALIAN_CONF, --mammalian MAMMALIAN_CONF
fastq_screen 'conf' file with the mammalian genome
indices
-c CONTAMINANTS_CONF, --contaminant CONTAMINANTS_CONF
fastq_screen 'conf' file with the contaminant genome
indices
-a {bowtie,bowtie2}, --aligner {bowtie,bowtie2}
aligner to use with fastq_screen (default: don't
specify an aligner)
-n THREADS, --threads THREADS
number of threads to run fastq_screen with (default:
1)
icell8_report.py
usage: icell8_report.py [-h] [-s STATS_FILE] [-o OUT_FILE] [-n NAME] [DIR]
positional arguments:
DIR directory with ICell8 processing outputs
optional arguments:
-h, --help show this help message and exit
-s STATS_FILE, --stats_file STATS_FILE
ICell8 stats file (default:
DIR/stats/icell8_stats.tsv)
-o OUT_FILE, --out_file OUT_FILE
Output HTML file (default: 'icell8_processing.html')
-n NAME, --name NAME specify a string to append to the zip archive name and
prefix
icell8_stats.py
usage: icell8_stats.py [-h] [-w WELL_LIST_FILE] [-u] [-f STATS_FILE] [-a]
[-s SUFFIX] [-n NPROCESSORS] [-m MAX_BATCH_SIZE]
[-T DIR]
[FASTQ_R1 FASTQ_R2 [FASTQ_R1 FASTQ_R2 ...]]
positional arguments:
FASTQ_R1 FASTQ_R2 FASTQ file pairs
optional arguments:
-h, --help show this help message and exit
-w WELL_LIST_FILE, --well-list WELL_LIST_FILE
iCell8 'well list' file
-u, --unassigned include 'unassigned' reads
-f STATS_FILE, --stats-file STATS_FILE
output statistics file
-a, --append append to statistics file
-s SUFFIX, --suffix SUFFIX
suffix to attach to column names
-n NPROCESSORS, --nprocessors NPROCESSORS
number of processors/cores available for statistics
generation (default: 1)
-m MAX_BATCH_SIZE, --max-batch-size MAX_BATCH_SIZE
maximum number of reads per batch when dividing Fastqs
(multicore only; default: 100000000)
-T DIR, --temporary-directory DIR
use DIR for temporaries, not $TMPDIR or /tmp
manage_fastqs.py
usage:
manage_fastqs.py DIR
manage_fastqs.py DIR PROJECT
manage_fastqs.py DIR PROJECT copy [[user@]host:]DEST
manage_fastqs.py DIR PROJECT md5
manage_fastqs.py DIR PROJECT zip
Fastq management utility. If only DIR is supplied then list the projects; if
PROJECT is supplied then list the fastqs; 'copy' command copies fastqs for the
specified PROJECT to DEST on a local or remote server; 'md5' command generates
checksums for the fastqs; 'zip' command creates a zip file with the fastq
files.
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
--filter PATTERN filter file names for reporting and copying based on
PATTERN
--fastq_dir FASTQ_DIR
explicitly specify the subdirectory of DIR that holds
the Fastq files
--max_zip_size MAX_ZIP_SIZE
for 'zip' command, defines the maximum size for the
output zip file; multiple zip files will be created if
the data exceeds this limit (default is create a
single zip file with no size limit)
--link hard link files instead of copying
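One way to picture the effect of --max_zip_size is as grouping files so that each archive stays within the limit. The Python sketch below uses a simple greedy grouping; this strategy, the function name group_by_size, and the unitless sizes are assumptions for illustration and may differ from what the utility actually does.

```python
def group_by_size(files, max_size=None):
    """Split (name, size) pairs into groups whose total size is within max_size.

    With max_size=None everything goes into a single group. A single file
    larger than max_size still gets a group of its own.
    """
    if max_size is None:
        return [[name for name, _ in files]]
    groups, current, total = [], [], 0
    for name, size in files:
        # Start a new group if adding this file would exceed the limit
        if current and total + size > max_size:
            groups.append(current)
            current, total = [], 0
        current.append(name)
        total += size
    if current:
        groups.append(current)
    return groups

# Hypothetical file names and sizes
fastqs = [("a.fastq.gz", 4), ("b.fastq.gz", 3), ("c.fastq.gz", 5), ("d.fastq.gz", 2)]
print(group_by_size(fastqs, max_size=8))
# -> [['a.fastq.gz', 'b.fastq.gz'], ['c.fastq.gz', 'd.fastq.gz']]
```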
process_icell8.py
usage: process_icell8.py [-h] [-u UNALIGNED_DIR] [-p NAME] [-o OUTDIR]
[-m MAMMALIAN_CONF] [-c CONTAMINANTS_CONF] [-q]
[-a {bowtie,bowtie2}] [-r STAGE=RUNNER] [-n STAGE=N]
[-s BATCH_SIZE] [-j MAX_JOBS]
[--no-contaminant-filter] [--no-cleanup] [--force]
[-v] [--no-quality-filter] [--threads THREADS]
WELL_LIST [FASTQ_R1 FASTQ_R2 [FASTQ_R1 FASTQ_R2 ...]]
Perform initial QC on FASTQs from Wafergen ICell8: assign to barcodes, filter
on barcode & UMI quality, trim reads, perform contaminant filtering and split
by barcode.
positional arguments:
WELL_LIST Well list file
FASTQ_R1 FASTQ_R2 FASTQ file pairs
optional arguments:
-h, --help show this help message and exit
-u UNALIGNED_DIR, --unaligned UNALIGNED_DIR
process FASTQs from 'unaligned' dir with output from
bcl2fastq (NB cannot be used with -p option)
-p NAME, --project NAME
process FASTQS from project directory NAME (NB if -o
not specified then this will also be used as the
output directory; cannot be used with -u option)
-o OUTDIR, --outdir OUTDIR
directory to write outputs to (default: 'CWD/icell8',
or project dir if -p is specified)
-m MAMMALIAN_CONF, --mammalian MAMMALIAN_CONF
fastq_screen 'conf' file with the 'mammalian' genome
indices (default: None)
-c CONTAMINANTS_CONF, --contaminants CONTAMINANTS_CONF
fastq_screen 'conf' file with the 'contaminant' genome
indices (default: None)
-q, --quality-filter filter out read pairs with low quality barcode and UMI
sequences (not recommended for NextSeq data)
-a {bowtie,bowtie2}, --aligner {bowtie,bowtie2}
aligner to use with fastq_screen (default: don't
specify the aligner)
-r STAGE=RUNNER, --runner STAGE=RUNNER
explicitly specify runner definitions for running
pipeline jobs at each stage. STAGE can be one of
'default','contaminant_filter','qc','statistics','report'.
If STAGE is not specified then it is assumed to be
'default'. RUNNER must be a valid job runner
specification e.g. 'GEJobRunner(-j y)'. Multiple
--runner arguments can be specified (default:
'SimpleJobRunner(join_logs=True)')
-n STAGE=N, --nprocessors STAGE=N
specify number of processors to use at each stage.
STAGE can be one of 'default','contaminant_filter',
'qc','statistics','report'. If STAGE is not specified
then it is assumed to be 'default'. Multiple
--nprocessors arguments can be specified (default: 1)
-s BATCH_SIZE, --size BATCH_SIZE
number of reads per batch when splitting FASTQ files
for processing (default: 5000000)
-j MAX_JOBS, --max-jobs MAX_JOBS
maximum number of concurrent jobs to run (default:
12)
--no-contaminant-filter
don't perform contaminant filter step (default is to
do contaminant filtering)
--no-cleanup don't remove intermediate Fastq files (default is to
delete intermediate Fastqs once no longer needed)
--force force overwrite of existing outputs
-v, --verbose produce verbose output for diagnostics
--no-quality-filter deprecated: kept for backwards compatibility only as
barcode/UMI quality checks are now disabled by default
--threads THREADS deprecated (use -n/--nprocessors option instead):
number of threads to use with multicore tasks (e.g.
'contaminant_filter')
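The STAGE=RUNNER and STAGE=N arguments described above can be interpreted along the lines of the following Python sketch. It is illustrative only: the function names are hypothetical, and the assumption that stages without an explicit setting fall back to the 'default' value is not confirmed by the help text. The stage name is checked explicitly because runner specifications such as 'SimpleJobRunner(join_logs=True)' themselves contain an '=' character.

```python
STAGES = ("default", "contaminant_filter", "qc", "statistics", "report")

def parse_stage_args(args, fallback):
    """Build a {stage: value} mapping from 'STAGE=VALUE' or bare 'VALUE' items."""
    values = {"default": fallback}
    for arg in args:
        stage, sep, value = arg.partition("=")
        if sep and stage in STAGES:
            values[stage] = value
        else:
            # No recognised stage prefix: treat the whole argument as 'default'
            values["default"] = arg
    return values

def value_for(values, stage):
    """Look up a stage, falling back to 'default' (an assumption)."""
    return values.get(stage, values["default"])

nprocs = parse_stage_args(["4", "qc=8"], "1")
print(value_for(nprocs, "qc"))      # -> 8
print(value_for(nprocs, "report"))  # -> 4
```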
run_qc.py
usage: run_qc.py [-h] [--version] [--info] [-n NAME] [-o OUT_DIR]
[--qc_dir QC_DIR] [-f FILENAME] [-u] [--organism ORGANISM]
[--library-type LIBRARY] [--single-cell-platform PLATFORM]
[-p PROTOCOL] [--fastq_subset SUBSET] [-t NTHREADS]
[--star-index INDEX] [--gtf GTF]
[--cellranger CELLRANGER_EXE]
[--cellranger-reference REFERENCE]
[--10x_chemistry {ARC-v1,SC3Pv1,SC3Pv2,SC3Pv3,SC5P-PE,SC5P-R2,auto,fiveprime,threeprime}]
[--10x_force_cells N_CELLS] [--enable-conda {yes,no}]
[--conda-env-dir CONDA_ENV_DIR] [--local] [-c N] [-m M]
[-j N] [-b NBATCHES] [-r RUNNER] [-s N] [--ignore-metadata]
[--split-fastqs-by-lane] [--use-legacy-screen-names {yes,no}]
[--no-multiqc] [--verbose] [--work-dir WORKING_DIR]
[--no-cleanup] [--fastq_screen_subset SUBSET] [--force]
[--multiqc]
DIR | FASTQ [FASTQ ...] [DIR | FASTQ [FASTQ ...] ...]
Run the QC pipeline standalone on an arbitrary set of Fastq files.
positional arguments:
DIR | FASTQ [ FASTQ ... ]
directory or list of Fastq files to run the QC on
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--info display information on protocols, organisms and other
settings (then exit)
Output and reporting:
-n NAME, --name NAME name for the project (used in report title)
-o OUT_DIR, --out_dir OUT_DIR
top-level directory for reports and QC output
subdirectory (default: current working directory)
--qc_dir QC_DIR explicitly specify QC output directory. NB if a
relative path is supplied then it's assumed to be a
subdirectory of OUT_DIR (default: <OUT_DIR>/qc)
-f FILENAME, --filename FILENAME
file name for output QC report (default:
<OUT_DIR>/<QC_DIR_NAME>_report.html)
-u, --update force QC pipeline to run even if output QC directory
already exists in <OUT_DIR> (default: stop if output
QC directory already exists)
Metadata:
--organism ORGANISM explicitly specify organism (e.g. 'human', 'mouse').
Multiple organisms should be separated by commas (e.g.
'human,mouse'). HINT use the --info option to list the
defined organisms
--library-type LIBRARY
explicitly specify library type (e.g. 'RNA-seq',
'ChIP-seq')
--single-cell-platform PLATFORM
explicitly specify the single cell platform (e.g.
'10xGenomics Chromium 3'v3')
QC options:
-p PROTOCOL, --protocol PROTOCOL
explicitly specify the QC protocol to use; can be one
of 'standardSE', 'standardPE', '10x_scRNAseq',
'10x_snRNAseq', '10x_scATAC', '10x_Multiome_GEX',
'10x_Multiome_ATAC', '10x_CellPlex', '10x_Flex',
'10x_ImmuneProfiling', '10x_Visium',
'10x_Visium_FFPE', '10x_Visium_FFPE_PEX',
'ParseEvercode', 'singlecell', 'ICELL8_scATAC'. If not
set then protocol will be determined automatically
based on directory contents and metadata.
--fastq_subset SUBSET
specify size of subset of reads to use for
FastQScreen, strandedness, coverage etc
(default 100000, set to 0 to use all reads)
-t NTHREADS, --threads NTHREADS
number of threads to use for QC script (default: taken
from job runner)
Reference data:
--star-index INDEX specify the path to the STAR genome index to use when
mapping reads for metrics such as strandedness etc
(overrides the organism-specific indexes defined in
the config file)
--gtf GTF specify the path to the GTF annotation file to use for
metrics such as 'qualimap rnaseq' (overrides the
organism-specific GTF files defined in the config
file)
Cellranger/10xGenomics options:
--cellranger CELLRANGER_EXE
explicitly specify path to Cellranger executable to
use for single library analysis
--cellranger-reference REFERENCE
specify the path to the reference dataset to use when
running single library analysis (overrides the
organism-specific references defined in the config
file)
--10x_chemistry {ARC-v1,SC3Pv1,SC3Pv2,SC3Pv3,SC5P-PE,SC5P-R2,auto,fiveprime,threeprime}
assay configuration for 10xGenomics scRNA-seq; if set
to 'auto' (the default) then cellranger will attempt
to determine this automatically
--10x_force_cells N_CELLS
force number of cells for 10xGenomics scRNA-seq and
scATAC-seq, overriding automatic cell detection
algorithms (default is to use built-in cell detection)
Conda dependency resolution:
--enable-conda {yes,no}
use conda to resolve task dependencies; can be 'yes'
or 'no' (default: no)
--conda-env-dir CONDA_ENV_DIR
specify directory for conda environments (default:
temporary directory)
Job control options:
--local run the QC on the local system (overrides any runners
defined in the configuration or on the command line)
-c N, --maxcores N maximum number of cores available for QC jobs when
using --local (default no limit, change in settings
file)
-m M, --maxmem M maximum total memory jobs can request at once when
using --local (in Gbs; default: unlimited)
-j N, --maxjobs N explicitly specify maximum number of concurrent QC
jobs to run (default 12, change in settings file;
ignored when using --local)
-b NBATCHES, --maxbatches NBATCHES
enable dynamic batching of pipeline jobs with maximum
number of batches set to NBATCHES (default: no
batching)
Advanced options:
-r RUNNER, --runner RUNNER
explicitly specify runner definition for running QC
components. RUNNER must be a valid job runner
specification e.g. 'GEJobRunner(-j y)' (default: use
runners set in configuration)
-s N, --batch_size N batch QC commands with N commands per job (default: no
batching)
--ignore-metadata ignore information from project metadata file even if
one is located (default is to use project metadata)
--split-fastqs-by-lane
run QC on copies of input Fastqs where reads have been
split according to lane (default is to run QC on
original Fastqs)
--use-legacy-screen-names {yes,no}
use 'legacy' naming convention for FastqScreen output
files; can be 'yes' or 'no' (default: no)
--no-multiqc turn off generation of MultiQC report
Debugging options:
--verbose run pipeline in 'verbose' mode
--work-dir WORKING_DIR
specify the working directory for the pipeline
operations
--no-cleanup don't remove the temporary project directory on
completion (by default the temporary directory is
deleted)
Deprecated/redundant options:
--fastq_screen_subset SUBSET
redundant: use the --fastq_subset option instead
--force redundant: HTML report generation will always be
attempted (even when pipeline fails)
--multiqc redundant: MultiQC report is generated by default (use
--no-multiqc to disable)
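The rule given for --qc_dir (a relative path is treated as a subdirectory of OUT_DIR, with <OUT_DIR>/qc as the default) can be expressed as a short Python sketch. The function name resolve_qc_dir and the example paths are hypothetical, and this is an illustration of the documented rule rather than the pipeline's own code.

```python
import os

def resolve_qc_dir(out_dir, qc_dir=None):
    """Resolve the QC output directory relative to the top-level OUT_DIR."""
    if qc_dir is None:
        qc_dir = "qc"  # documented default: <OUT_DIR>/qc
    if not os.path.isabs(qc_dir):
        # Relative paths are taken to be subdirectories of OUT_DIR
        qc_dir = os.path.join(out_dir, qc_dir)
    return os.path.normpath(qc_dir)

print(resolve_qc_dir("/data/project"))
print(resolve_qc_dir("/data/project", "qc_trimmed"))
print(resolve_qc_dir("/data/project", "/scratch/qc"))
```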
split_icell8_fastqs.py
usage: split_icell8_fastqs.py [-h] [-w WELL_LIST_FILE]
[-m {barcodes,batch,none}] [-s BATCH_SIZE]
[-b BASENAME] [-o OUT_DIR] [-d] [-q] [-c]
[FASTQ_R1 FASTQ_R2 [FASTQ_R1 FASTQ_R2 ...]]
positional arguments:
FASTQ_R1 FASTQ_R2 FASTQ file pairs
optional arguments:
-h, --help show this help message and exit
-w WELL_LIST_FILE, --well-list WELL_LIST_FILE
iCell8 'well list' file
-m {barcodes,batch,none}, --mode {barcodes,batch,none}
how to split the input FASTQs: 'barcodes' (one FASTQ
pair per barcode), 'batch' (one or more FASTQ pairs
with fixed number of reads not exceeding BATCH_SIZE),
or 'none' (output all reads to a single FASTQ pair)
(default: 'barcodes')
-s BATCH_SIZE, --size BATCH_SIZE
number of reads per batch in 'batch' mode (default:
5000000)
-b BASENAME, --basename BASENAME
basename for output FASTQ files (default: 'icell8')
-o OUT_DIR, --outdir OUT_DIR
directory to write output FASTQ files to (default:
current directory)
-d, --discard-unknown-barcodes
discard reads with barcodes which don't match any of
those in the WELL_LIST_FILE (default: keep all reads)
-q, --quality-filter filter reads by barcode and UMI quality (default:
don't filter reads on quality)
-c, --compress output compressed .gz FASTQ files
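The 'batch' splitting mode (groups of reads not exceeding BATCH_SIZE) corresponds to a standard chunking pattern, sketched here in Python. The function name batches is hypothetical and plain integers stand in for read pairs; the sketch shows the idea only, not the utility's own code.

```python
from itertools import islice

def batches(reads, batch_size):
    """Yield successive lists containing at most batch_size reads each."""
    iterator = iter(reads)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# 12 stand-in reads split with a batch size of 5
print([len(b) for b in batches(range(12), 5)])  # -> [5, 5, 2]
```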
transfer_data.py
usage: transfer_data.py [-h] [--version] [--subdir {random_bin,run_id}]
[--zip_fastqs] [--max_zip_size MAX_ZIP_SIZE]
[--no_fastqs] [--readme README_TEMPLATE]
[--weburl WEBURL] [--include_downloader]
[--include_qc_report] [--include_10x_outputs] [--link]
[--filter FILTER_PATTERN] [--runner RUNNER]
DEST PROJECT
Transfer copies of Fastq data from an analysis project to an arbitrary
destination for sharing with other people
positional arguments:
DEST destination to copy Fastqs to; can be the name of a
destination defined in the configuration file, or an
arbitrary location of the form '[[USER@]HOST:]DIR' (no
destinations currently defined)
PROJECT path to project directory (or to a Fastqs subdirectory
in a project) to copy Fastqs from
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--subdir {random_bin,run_id}
subdirectory naming scheme: 'random_bin' locates a
random pre-existing empty subdirectory under the
target directory; 'run_id' creates a new subdirectory
'PLATFORM_DATESTAMP.RUN_ID-PROJECT'. If this option is
not set then no subdirectory will be used
--zip_fastqs put Fastqs into a ZIP file
--max_zip_size MAX_ZIP_SIZE
when using '--zip_fastqs' option, defines the maximum
size for the output zip file; multiple zip files will
be created if the data exceeds this limit (default is
create a single zip file with no size limit)
--no_fastqs don't copy Fastqs (other artefacts will be copied, if
specified)
--readme README_TEMPLATE
template file to generate README file from; can be
full path to a template file, or the name of a file in
the 'templates' directory
--weburl WEBURL base URL for webserver (sets the value of the WEBURL
variable in the template README)
--include_downloader copy the 'download_fastqs.py' utility to the final
location
--include_qc_report copy the zipped QC reports to the final location
--include_10x_outputs
copy outputs from 10xGenomics pipelines (e.g.
'cellranger count') to the final location
--link hard link files instead of copying
--filter FILTER_PATTERN
filter Fastq file names based on PATTERN
--runner RUNNER specify the job runner to use for executing the
checksumming, Fastq copy and tar gzipping operations
(defaults to job runner defined for copying in config
file [SimpleJobRunner(join_logs=True)])
update_project_metadata.py
usage: update_project_metadata.py [-h] [-i] [-u UPDATE] DIR PROJECT
positional arguments:
DIR analysis directory to update metadata for
PROJECT project within the analysis directory to update
metadata for
optional arguments:
-h, --help show this help message and exit
-i, --init initialise metadata file for the selected project (nb
can only be applied to one project at a time)
-u UPDATE, --update UPDATE
update the metadata in the selected project by
specifying key=value pairs e.g. user='Peter Briggs'
(nb can only be applied to one project at a time)