Utilities

Note

This documentation has been auto-generated from the command help.

In addition to the main auto_process.py command, a number of utilities are available:

analyse_barcodes.py

usage:
    analyse_barcodes.py FASTQ [FASTQ...]
    analyse_barcodes.py DIR
    analyse_barcodes.py -c COUNTS_FILE [COUNTS_FILE...]

Collate and report counts and statistics for Fastq index sequences (aka
barcodes). If multiple Fastq files are supplied then sequences will be pooled
before being analysed. If a single directory is supplied then this will be
assumed to be an output directory from bcl2fastq and files will be processed
on a per-lane basis. If the -c option is supplied then the input must be one
or more files of barcode counts generated previously using the -o option.

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit

Input and output options:
  -c, --counts          input is one or more counts files generated by
                        previous runs using the '-o/--output' option
  -o COUNTS_FILE_OUT, --output COUNTS_FILE_OUT
                        output all counts to tab-delimited file
                        COUNTS_FILE_OUT. This can be used again in another run
                        by specifying the '-c' option

Reporting options:
  -l LANES, --lanes LANES
                        restrict analysis to the specified lane numbers
                        (default is to process all lanes). Multiple lanes can
                        be specified using ranges (e.g. '2-3'), comma-
                        separated list ('5,7') or a mixture ('2-3,5,7')
  -m MISMATCHES, --mismatches MISMATCHES
                        maximum number of mismatches to use when grouping
                        similar barcodes (will be determined automatically if
                        samplesheet is supplied, otherwise defaults to 0)
  --cutoff CUTOFF       exclude barcodes/barcode groups from reporting with a
                        smaller fraction of associated reads than CUTOFF, e.g.
                        '0.01' excludes barcodes with < 1.0% of reads
                        (default: 0.001)
  -s SAMPLE_SHEET, --sample-sheet SAMPLE_SHEET
                        report best matches against barcodes in SAMPLE_SHEET
  -r REPORT_FILE, --report REPORT_FILE
                        write report to REPORT_FILE (otherwise write to
                        stdout)
  -x XLS_FILE, --xls XLS_FILE
                        write XLS version of report to XLS_FILE
  -f HTML_FILE, --html HTML_FILE
                        write HTML version of report to HTML_FILE
  -t TITLE, --title TITLE
                        title for HTML report (default: 'Barcodes Report')
  -n, --no-report       suppress reporting (overrides --report)

Advanced options:
  --minimum_read_fraction FRACTION
                        weed out individual barcodes from initial analysis
                        which have a smaller fraction of reads than FRACTION,
                        e.g. '0.001' removes barcodes with < 0.1% of reads;
                        speeds up analysis at the expense of accuracy as
                        reported counts will be approximate (default: 1.0e-6)
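
The lane specification accepted by -l/--lanes can be illustrated with a short Python sketch. This is not the parser used by analyse_barcodes.py; the function name parse_lanes is hypothetical, and the sketch only assumes the syntax described above (ranges, comma-separated lists, or a mixture).

```python
def parse_lanes(spec):
    """Expand a lane specification such as '2-3,5,7' into a sorted list."""
    lanes = set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        if "-" in item:
            # A range such as '2-3' is inclusive at both ends
            start, end = (int(x) for x in item.split("-", 1))
            if start > end:
                raise ValueError(f"invalid lane range: {item!r}")
            lanes.update(range(start, end + 1))
        else:
            lanes.add(int(item))
    return sorted(lanes)


print(parse_lanes("2-3,5,7"))  # [2, 3, 5, 7]
```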

assign_barcodes.py

usage: assign_barcodes.py [-h] [-n N] INPUT.fq OUTPUT.fq

Extract arbitrary sequence fragments from reads in INPUT.fq FASTQ file and
assign these as the index (barcode) sequences in the read headers in
OUTPUT.fq.

positional arguments:
  INPUT.fq    Input FASTQ file
  OUTPUT.fq   Output FASTQ file

optional arguments:
  -h, --help  show this help message and exit
  -n N        remove first N bases from each read and assign these as barcode
              index sequence (default: 5)
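
As a rough illustration of what the -n option does, the sketch below moves the first N bases of a read into the header. It is an assumption-laden example, not the utility's implementation: the function name assign_barcode is hypothetical, and it assumes an Illumina-style header in which the index sequence is the final colon-separated field.

```python
def assign_barcode(record, n=5):
    """Move the first n bases of a read into the header as its index.

    'record' is a (header, sequence, separator, quality) tuple for one
    FASTQ read. Assumes the index is the last ':'-separated header field.
    """
    header, seq, plus, qual = record
    prefix, sep, _old_index = header.rpartition(":")
    if not sep:
        raise ValueError(f"no index field found in header: {header!r}")
    barcode = seq[:n]
    # Trim the quality string too, so it stays the same length as the read
    return (f"{prefix}:{barcode}", seq[n:], plus, qual[n:])


read = ("@M:1:FC:1:1:1:1 1:N:0:NNNNN", "ACGTAGGGTTT", "+", "IIIIIHHHHHH")
print(assign_barcode(read, n=5))
```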

audit_projects.py

usage: audit_projects.py [-h] [--pi PI_NAME] [--unassigned] [DIR [DIR ...]]

Summarise the disk usage for runs that have been processed using auto_process.
The supplied DIRs are directories holding one or more top-level analysis
directories corresponding to different runs. The program reports total disk
usage for projects assigned to each PI across all DIRs.

positional arguments:
  DIR           directory to search for analysis directories for auditing

optional arguments:
  -h, --help    show this help message and exit
  --pi PI_NAME  list data for PI(s) matching PI_NAME (can use glob-style
                patterns)
  --unassigned  list data for projects where PI is not assigned
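
Glob-style matching of the kind accepted by --pi can be sketched with Python's standard fnmatch module. Whether audit_projects.py uses fnmatch internally, and whether its matching is case-sensitive, is not stated in the help; the function match_pis and the example names are hypothetical.

```python
from fnmatch import fnmatchcase


def match_pis(pi_names, pattern):
    """Return the PI names that match a glob-style pattern."""
    # fnmatchcase is case-sensitive on every platform
    return [name for name in pi_names if fnmatchcase(name, pattern)]


names = ["Peter Briggs", "Ann Other", "Petra Example"]
print(match_pis(names, "Pet*"))  # ['Peter Briggs', 'Petra Example']
```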

build_index.py

usage: build_index.py [-h] [-v] [-o OUT_DIR] [--ebwt_base NAME]
                      [--bt2_base NAME] [--overhang N] [-V VERSION]
                      [-r RUNNER]
                      ALIGNER FASTA [ANNOTATION]

Generate indexes for aligners

positional arguments:
  ALIGNER               aligner to build index for (one of 'bowtie',
                        'bowtie2', 'star')
  FASTA                 FASTA file with sequence
  ANNOTATION            annotation file (for use with STAR)

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -o OUT_DIR            output directory for indexes

Bowtie-specific options:
  --ebwt_base NAME      specify basename for output .ebwt files (defaults to
                        FASTA file basename)

Bowtie2-specific options:
  --bt2_base NAME       specify basename for output .bt2 files (defaults to
                        FASTA file basename)

STAR-specific options:
  --overhang N          set value for STAR --sjdbOverhang option (default:
                        100)

Advanced options:
  -V VERSION, --aligner-version VERSION
                        specify the version of the aligner to target (only
                        works if conda dependency resolution is configured)
  -r RUNNER, --runner RUNNER
                        explicitly specify runner definition for building the
                        index. RUNNER must be a valid job runner specification
                        e.g. 'GEJobRunner(-pe smp.pe 8)' (default: use
                        appropriate runner from configuration)

concat_fastqs.py

usage: concat_fastqs.py [-h] [--version] [-v] FASTQ [FASTQ ...] FASTQ_OUT

Concatenate reads from one or more input Fastq files into a single new file
FASTQ_OUT

positional arguments:
  FASTQ          Input FASTQ to concatenate
  FASTQ_OUT      Output FASTQ with concatenated reads

optional arguments:
  -h, --help     show this help message and exit
  --version      show program's version number and exit
  -v, --verbose  verbose output

barcode_splitter.py

usage:
    barcode_splitter.py [OPTIONS] FASTQ [FASTQ ...]
    barcode_splitter.py [OPTIONS] FASTQ_R1,FASTQ_R2 [FASTQ_R1,FASTQ_R2 ...]
    barcode_splitter.py [OPTIONS] DIR

Split reads from one or more input Fastq files into new Fastqs based on
matching supplied barcodes.

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -b INDEX_SEQ, --barcode INDEX_SEQ
                        specify index sequence to filter using
  -m N_MISMATCHES, --mismatches N_MISMATCHES
                        maximum number of differing bases to allow for two
                        index sequences to count as a match. Default is zero
                        (i.e. exact matches only)
  -n BASE_NAME, --name BASE_NAME
                        basename to use for output files
  -o OUT_DIR, --output-dir OUT_DIR
                        specify directory for output split Fastqs
  -u UNALIGNED_DIR, --unaligned UNALIGNED_DIR
                        specify subdirectory with outputs from bcl2fastq
  -l LANE, --lane LANE  specify lane to collect and split Fastqs for
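
The -m/--mismatches criterion (the number of differing bases between two index sequences) can be sketched as follows. This is a minimal illustration under the assumption that both sequences have the same length; the function names are hypothetical and the utility's own matching code may differ.

```python
def count_mismatches(seq1, seq2):
    """Count the positions at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("index sequences must be the same length")
    return sum(a != b for a, b in zip(seq1, seq2))


def barcodes_match(seq1, seq2, max_mismatches=0):
    """True if the sequences differ in at most max_mismatches bases."""
    return count_mismatches(seq1, seq2) <= max_mismatches


print(barcodes_match("ATGCAT", "ATGCAA"))                    # False
print(barcodes_match("ATGCAT", "ATGCAA", max_mismatches=1))  # True
```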

download_fastqs.py

usage: download_fastqs.py [-h] URL [DIR]

Download checksum file and fastqs from URL into current directory (or
directory DIR, if specified), and verify the downloaded files against the
checksum file.

positional arguments:
  URL         URL with checksum file and fastqs
  DIR         directory to put downloaded fastqs into (defaults to current
              directory)

optional arguments:
  -h, --help  show this help message and exit
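
The verification step can be pictured with the sketch below. The help text does not say which checksum algorithm or file layout is used, so this example assumes MD5 sums in the common 'md5sum' layout (checksum, whitespace, file name); all function names are hypothetical.

```python
import hashlib
import io


def md5sum_stream(stream, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a binary stream, reading in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def parse_checksums(text):
    """Parse 'md5sum'-style lines into a {file name: checksum} dictionary."""
    checksums = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        md5, name = line.split(None, 1)
        # md5sum marks files read in binary mode with a leading '*'
        checksums[name.strip().lstrip("*")] = md5
    return checksums


def verify(stream, name, checksums):
    """True if the stream's MD5 sum matches the entry recorded for 'name'."""
    return checksums.get(name) == md5sum_stream(stream)


# The MD5 sum of empty input is d41d8cd98f00b204e9800998ecf8427e
sums = parse_checksums("d41d8cd98f00b204e9800998ecf8427e  empty.fastq\n")
print(verify(io.BytesIO(b""), "empty.fastq", sums))  # True
```

Reading in chunks keeps memory use flat for large Fastq files; in real use the stream would come from open(path, "rb").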

demultiplex_icell8_atac.py

usage: demultiplex_icell8_atac.py [-h] [-v] [-o OUTDIR] [-b N] [-n N]
                                  [-m {samples,barcodes}] [--unassigned NAME]
                                  [--swap-i1-i2]
                                  [--reverse-complement {i1,i2,both}] [-u]
                                  [--no-demultiplexing]
                                  WELL_LIST FASTQ_R1 FASTQ_R2 FASTQ_I1
                                  FASTQ_I2

Assign reads from ICELL8 ATAC R1/R2/I1/I2 Fastq set to barcodes and samples in
a well list file

positional arguments:
  WELL_LIST             Well list file
  FASTQ_R1              FASTQ R1
  FASTQ_R2              FASTQ R2
  FASTQ_I1              FASTQ I1
  FASTQ_I2              FASTQ I2

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -o OUTDIR, --output-dir OUTDIR
                        path to demultiplexed output
  -b N, --batch_size N  batch size for splitting index read Fastqs (default:
                        50000000)
  -n N, --nprocessors N
                        number of processors to use
  -m {samples,barcodes}, --mode {samples,barcodes}
                        demultiplex reads by sample (default) or by barcode
  --unassigned NAME     basename for output Fastqs with reads which cannot be
                        assigned to any sample or barcode (default:
                        'Unassigned')
  --swap-i1-i2          swap supplied I1 and I2 Fastqs
  --reverse-complement {i1,i2,both}
                        reverse complement one or both of the indices from the
                        well list file
  -u, --update-read-headers
                        update read headers in the output Fastqs to include
                        the matching index sequence (i.e. barcode) from the
                        well list file
  --no-demultiplexing   don't generate demultiplexed Fastqs (only the stats)
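
For reference, reverse complementing an index sequence (as requested by --reverse-complement) amounts to the operation sketched below. The function name is hypothetical and this is not taken from the utility's source.

```python
# Translation table mapping each base to its complement ('N' is unchanged)
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))  # GCAT
```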

fastq_statistics.py

usage: fastq_statistics.py [-h] [-v] [--unaligned UNALIGNED_DIR]
                           [--sample-sheet SAMPLE_SHEET] [-o STATS_FILE]
                           [-p PER_LANE_STATS_FILE]
                           [-s PER_LANE_SAMPLE_STATS_FILE]
                           [-f FULL_STATS_FILE] [-u] [-n N] [--debug]
                           [--force]
                           ILLUMINA_RUN_DIR

Generate statistics for FASTQ files in ILLUMINA_RUN_DIR (top-level directory
of a processed Illumina run)

positional arguments:
  ILLUMINA_RUN_DIR      input Illumina run directory

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  --unaligned UNALIGNED_DIR
                        specify an alternative name for the 'Unaligned'
                        directory containing the fastq.gz files
  --sample-sheet SAMPLE_SHEET
                        specify a sample sheet file to get additional
                        information from
  -o STATS_FILE, --output STATS_FILE
                        name of output file for per-file statistics (default
                        is 'statistics.info')
  -p PER_LANE_STATS_FILE, --per-lane-stats PER_LANE_STATS_FILE
                        name of output file for per-lane statistics (default
                        is 'per_lane_statistics.info')
  -s PER_LANE_SAMPLE_STATS_FILE, --per-lane-sample-stats PER_LANE_SAMPLE_STATS_FILE
                        name of output file for per-lane sample statistics
                        (default is 'per_lane_sample_stats.info')
  -f FULL_STATS_FILE, --full-stats FULL_STATS_FILE
                        name of output file for full statistics (default is
                        'statistics_full.info')
  -u, --update          update existing full statistics file with stats for
                        additional files
  -n N, --nprocessors N
                        spread work across N processors/cores (default is 1)
  --debug               turn on debugging output

Deprecated/defunct options:
  --force               does nothing: retained for backwards compatibility

icell8_contamination_filter.py

usage: icell8_contamination_filter.py [-h] [-o OUT_DIR] [-m MAMMALIAN_CONF]
                                      [-c CONTAMINANTS_CONF]
                                      [-a {bowtie,bowtie2}] [-n THREADS]
                                      FQ_R1 FQ_R2

positional arguments:
  FQ_R1                 R1 FASTQ file
  FQ_R2                 Matching R2 FASTQ file

optional arguments:
  -h, --help            show this help message and exit
  -o OUT_DIR, --outdir OUT_DIR
                        directory to write output FASTQ files to (default:
                        current directory)
  -m MAMMALIAN_CONF, --mammalian MAMMALIAN_CONF
                        fastq_screen 'conf' file with the mammalian genome
                        indices
  -c CONTAMINANTS_CONF, --contaminant CONTAMINANTS_CONF
                        fastq_screen 'conf' file with the contaminant genome
                        indices
  -a {bowtie,bowtie2}, --aligner {bowtie,bowtie2}
                        aligner to use with fastq_screen (default: don't
                        specify an aligner)
  -n THREADS, --threads THREADS
                        number of threads to run fastq_screen with (default:
                        1)

icell8_report.py

usage: icell8_report.py [-h] [-s STATS_FILE] [-o OUT_FILE] [-n NAME] [DIR]

positional arguments:
  DIR                   directory with ICell8 processing outputs

optional arguments:
  -h, --help            show this help message and exit
  -s STATS_FILE, --stats_file STATS_FILE
                        ICell8 stats file (default:
                        DIR/stats/icell8_stats.tsv)
  -o OUT_FILE, --out_file OUT_FILE
                        Output HTML file (default: 'icell8_processing.html')
  -n NAME, --name NAME  specify a string to append to the zip archive name and
                        prefix

icell8_stats.py

usage: icell8_stats.py [-h] [-w WELL_LIST_FILE] [-u] [-f STATS_FILE] [-a]
                       [-s SUFFIX] [-n NPROCESSORS] [-m MAX_BATCH_SIZE]
                       [-T DIR]
                       [FASTQ_R1 FASTQ_R2 [FASTQ_R1 FASTQ_R2 ...]]

positional arguments:
  FASTQ_R1 FASTQ_R2     FASTQ file pairs

optional arguments:
  -h, --help            show this help message and exit
  -w WELL_LIST_FILE, --well-list WELL_LIST_FILE
                        iCell8 'well list' file
  -u, --unassigned      include 'unassigned' reads
  -f STATS_FILE, --stats-file STATS_FILE
                        output statistics file
  -a, --append          append to statistics file
  -s SUFFIX, --suffix SUFFIX
                        suffix to attach to column names
  -n NPROCESSORS, --nprocessors NPROCESSORS
                        number of processors/cores available for statistics
                        generation (default: 1)
  -m MAX_BATCH_SIZE, --max-batch-size MAX_BATCH_SIZE
                        maximum number of reads per batch when dividing Fastqs
                        (multicore only; default: 100000000)
  -T DIR, --temporary-directory DIR
                        use DIR for temporaries, not $TMPDIR or /tmp

manage_fastqs.py

usage:
    manage_fastqs.py DIR
    manage_fastqs.py DIR PROJECT
    manage_fastqs.py DIR PROJECT copy [[user@]host:]DEST
    manage_fastqs.py DIR PROJECT md5
    manage_fastqs.py DIR PROJECT zip

Fastq management utility. If only DIR is supplied then list the projects; if
PROJECT is supplied then list the fastqs; 'copy' command copies fastqs for the
specified PROJECT to DEST on a local or remote server; 'md5' command generates
checksums for the fastqs; 'zip' command creates a zip file with the fastq
files.

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  --filter PATTERN      filter file names for reporting and copying based on
                        PATTERN
  --fastq_dir FASTQ_DIR
                        explicitly specify subdirectory of DIR with Fastq
                        files to run the QC on
  --max_zip_size MAX_ZIP_SIZE
                        for 'zip' command, defines the maximum size for the
                        output zip file; multiple zip files will be created if
                        the data exceeds this limit (default is create a
                        single zip file with no size limit)
  --link                hard link files instead of copying
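
One way to picture the effect of --max_zip_size is a greedy grouping of files into archives, sketched below. How manage_fastqs.py actually divides files between zip archives is not documented here, so the strategy, the function name and the example sizes are all assumptions.

```python
def group_files(files, max_size):
    """Group (name, size) pairs so that no group exceeds max_size.

    Files are taken in order. A single file larger than max_size is
    placed in a group of its own.
    """
    groups = []
    current = []
    current_size = 0
    for name, size in files:
        if current and current_size + size > max_size:
            # Adding this file would exceed the limit: start a new group
            groups.append(current)
            current = []
            current_size = 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return groups


files = [("a_R1.fastq.gz", 4), ("a_R2.fastq.gz", 4), ("b_R1.fastq.gz", 4)]
print(group_files(files, max_size=8))
```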

process_icell8.py

usage: process_icell8.py [-h] [-u UNALIGNED_DIR] [-p NAME] [-o OUTDIR]
                         [-m MAMMALIAN_CONF] [-c CONTAMINANTS_CONF] [-q]
                         [-a {bowtie,bowtie2}] [-r STAGE=RUNNER] [-n STAGE=N]
                         [-s BATCH_SIZE] [-j MAX_JOBS]
                         [--no-contaminant-filter] [--no-cleanup] [--force]
                         [-v] [--no-quality-filter] [--threads THREADS]
                         WELL_LIST [FASTQ_R1 FASTQ_R2 [FASTQ_R1 FASTQ_R2 ...]]

Perform initial QC on FASTQs from Wafergen ICell8: assign to barcodes, filter
on barcode & UMI quality, trim reads, perform contaminant filtering and split
by barcode.

positional arguments:
  WELL_LIST             Well list file
  FASTQ_R1 FASTQ_R2     FASTQ file pairs

optional arguments:
  -h, --help            show this help message and exit
  -u UNALIGNED_DIR, --unaligned UNALIGNED_DIR
                        process FASTQs from 'unaligned' dir with output from
                        bcl2fastq (NB cannot be used with -p option)
  -p NAME, --project NAME
                        process FASTQS from project directory NAME (NB if -o
                        not specified then this will also be used as the
                        output directory; cannot be used with -u option)
  -o OUTDIR, --outdir OUTDIR
                        directory to write outputs to (default: 'CWD/icell8',
                        or project dir if -p is specified)
  -m MAMMALIAN_CONF, --mammalian MAMMALIAN_CONF
                        fastq_screen 'conf' file with the 'mammalian' genome
                        indices (default: None)
  -c CONTAMINANTS_CONF, --contaminants CONTAMINANTS_CONF
                        fastq_screen 'conf' file with the 'contaminant' genome
                        indices (default: None)
  -q, --quality-filter  filter out read pairs with low quality barcode and UMI
                        sequences (not recommended for NextSeq data)
  -a {bowtie,bowtie2}, --aligner {bowtie,bowtie2}
                        aligner to use with fastq_screen (default: don't
                        specify the aligner)
  -r STAGE=RUNNER, --runner STAGE=RUNNER
                        explicitly specify runner definitions for running
                        pipeline jobs at each stage. STAGE can be one of
                        'default', 'contaminant_filter', 'qc', 'statistics',
                        'report'.
                        If STAGE is not specified then it is assumed to be
                        'default'. RUNNER must be a valid job runner
                        specification e.g. 'GEJobRunner(-j y)'. Multiple
                        --runner arguments can be specified (default:
                        'SimpleJobRunner(join_logs=True)')
  -n STAGE=N, --nprocessors STAGE=N
                        specify number of processors to use at each stage.
                        STAGE can be one of 'default', 'contaminant_filter',
                        'qc', 'statistics', 'report'. If STAGE is not specified
                        then it is assumed to be 'default'. Multiple
                        --nprocessors arguments can be specified (default: 1)
  -s BATCH_SIZE, --size BATCH_SIZE
                        number of reads per batch when splitting FASTQ files
                        for processing (default: 5000000)
  -j MAX_JOBS, --max-jobs MAX_JOBS
                        maximum number of concurrent jobs to run (default:
                        12)
  --no-contaminant-filter
                        don't perform contaminant filter step (default is to
                        do contaminant filtering)
  --no-cleanup          don't remove intermediate Fastq files (default is to
                        delete intermediate Fastqs once no longer needed)
  --force               force overwrite of existing outputs
  -v, --verbose         produce verbose output for diagnostics
  --no-quality-filter   deprecated: kept for backwards compatibility only as
                        barcode/UMI quality checks are now disabled by default
  --threads THREADS     deprecated (use -n/--nprocessors option instead):
                        number of threads to use with multicore tasks (e.g.
                        'contaminant_filter')
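
The STAGE=VALUE form used by --runner and --nprocessors can be sketched as below. This is an illustrative parser, not the one in process_icell8.py, and the names STAGES and parse_stage_option are hypothetical. It only treats the text before the first '=' as a stage when it is a known stage name, because runner specifications such as 'SimpleJobRunner(join_logs=True)' contain an '=' of their own.

```python
STAGES = ("default", "contaminant_filter", "qc", "statistics", "report")


def parse_stage_option(value):
    """Split 'STAGE=VALUE' into (stage, value).

    If no recognised stage name precedes the first '=', the whole
    string is treated as the value for the 'default' stage.
    """
    stage, sep, rest = value.partition("=")
    if sep and stage in STAGES:
        return stage, rest
    return "default", value


print(parse_stage_option("qc=4"))
print(parse_stage_option("SimpleJobRunner(join_logs=True)"))
```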

run_qc.py

usage: run_qc.py [-h] [--version] [--info] [-n NAME] [-o OUT_DIR]
                 [--qc_dir QC_DIR] [-f FILENAME] [-u] [--organism ORGANISM]
                 [--library-type LIBRARY] [--single-cell-platform PLATFORM]
                 [-p PROTOCOL] [--fastq_subset SUBSET] [-t NTHREADS]
                 [--star-index INDEX] [--gtf GTF]
                 [--cellranger CELLRANGER_EXE]
                 [--cellranger-reference REFERENCE]
                 [--10x_chemistry {ARC-v1,SC3Pv1,SC3Pv2,SC3Pv3,SC5P-PE,SC5P-R2,auto,fiveprime,threeprime}]
                 [--10x_force_cells N_CELLS] [--enable-conda {yes,no}]
                 [--conda-env-dir CONDA_ENV_DIR] [--local] [-c N] [-m M]
                 [-j N] [-b NBATCHES] [-r RUNNER] [-s N] [--ignore-metadata]
                 [--split-fastqs-by-lane] [--use-legacy-screen-names {yes,no}]
                 [--no-multiqc] [--verbose] [--work-dir WORKING_DIR]
                 [--no-cleanup] [--fastq_screen_subset SUBSET] [--force]
                 [--multiqc]
                 DIR | FASTQ [FASTQ ...] [DIR | FASTQ [FASTQ ...] ...]

Run the QC pipeline standalone on an arbitrary set of Fastq files.

positional arguments:
  DIR | FASTQ [ FASTQ ... ]
                        directory or list of Fastq files to run the QC on

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --info                display information on protocols, organisms and other
                        settings (then exit)

Output and reporting:
  -n NAME, --name NAME  name for the project (used in report title)
  -o OUT_DIR, --out_dir OUT_DIR
                        top-level directory for reports and QC output
                        subdirectory (default: current working directory)
  --qc_dir QC_DIR       explicitly specify QC output directory. NB if a
                        relative path is supplied then it's assumed to be a
                        subdirectory of OUT_DIR (default: <OUT_DIR>/qc)
  -f FILENAME, --filename FILENAME
                        file name for output QC report (default:
                        <OUT_DIR>/<QC_DIR_NAME>_report.html)
  -u, --update          force QC pipeline to run even if output QC directory
                        already exists in <OUT_DIR> (default: stop if output
                        QC directory already exists)

Metadata:
  --organism ORGANISM   explicitly specify organism (e.g. 'human', 'mouse').
                        Multiple organisms should be separated by commas (e.g.
                        'human,mouse'). HINT use the --info option to list the
                        defined organisms
  --library-type LIBRARY
                        explicitly specify library type (e.g. 'RNA-seq',
                        'ChIP-seq')
  --single-cell-platform PLATFORM
                        explicitly specify the single cell platform (e.g.
                        '10xGenomics Chromium 3'v3')

QC options:
  -p PROTOCOL, --protocol PROTOCOL
                        explicitly specify the QC protocol to use; can be one
                        of 'standardSE', 'standardPE', '10x_scRNAseq',
                        '10x_snRNAseq', '10x_scATAC', '10x_Multiome_GEX',
                        '10x_Multiome_ATAC', '10x_CellPlex', '10x_Flex',
                        '10x_ImmuneProfiling', '10x_Visium',
                        '10x_Visium_FFPE', '10x_Visium_FFPE_PEX',
                        'ParseEvercode', 'singlecell', 'ICELL8_scATAC'. If not
                        set then protocol will be determined automatically
                        based on directory contents and metadata.
  --fastq_subset SUBSET
                        specify size of subset of reads to use for
                        FastQScreen, strandedness, coverage etc. (default:
                        100000; set to 0 to use all reads)
  -t NTHREADS, --threads NTHREADS
                        number of threads to use for QC script (default: taken
                        from job runner)

Reference data:
  --star-index INDEX    specify the path to the STAR genome index to use when
                        mapping reads for metrics such as strandedness etc
                        (overrides the organism-specific indexes defined in
                        the config file)
  --gtf GTF             specify the path to the GTF annotation file to use for
                        metrics such as 'qualimap rnaseq' (overrides the
                        organism-specific GTF files defined in the config
                        file)

Cellranger/10xGenomics options:
  --cellranger CELLRANGER_EXE
                        explicitly specify path to Cellranger executable to
                        use for single library analysis
  --cellranger-reference REFERENCE
                        specify the path to the reference dataset to use when
                        running single library analysis (overrides the
                        organism-specific references defined in the config
                        file)
  --10x_chemistry {ARC-v1,SC3Pv1,SC3Pv2,SC3Pv3,SC5P-PE,SC5P-R2,auto,fiveprime,threeprime}
                        assay configuration for 10xGenomics scRNA-seq; if set
                        to 'auto' (the default) then cellranger will attempt
                        to determine this automatically
  --10x_force_cells N_CELLS
                        force number of cells for 10xGenomics scRNA-seq and
                        scATAC-seq, overriding automatic cell detection
                        algorithms (default is to use built-in cell detection)

Conda dependency resolution:
  --enable-conda {yes,no}
                        use conda to resolve task dependencies; can be 'yes'
                        or 'no' (default: no)
  --conda-env-dir CONDA_ENV_DIR
                        specify directory for conda environments (default:
                        temporary directory)

Job control options:
  --local               run the QC on the local system (overrides any runners
                        defined in the configuration or on the command line)
  -c N, --maxcores N    maximum number of cores available for QC jobs when
                        using --local (default no limit, change in settings
                        file)
  -m M, --maxmem M      maximum total memory jobs can request at once when
                        using --local (in Gbs; default: unlimited)
  -j N, --maxjobs N     explicitly specify maximum number of concurrent QC
                        jobs to run (default 12, change in settings file;
                        ignored when using --local)
  -b NBATCHES, --maxbatches NBATCHES
                        enable dynamic batching of pipeline jobs with maximum
                        number of batches set to NBATCHES (default: no
                        batching)

Advanced options:
  -r RUNNER, --runner RUNNER
                        explicitly specify runner definition for running QC
                        components. RUNNER must be a valid job runner
                        specification e.g. 'GEJobRunner(-j y)' (default: use
                        runners set in configuration)
  -s N, --batch_size N  batch QC commands with N commands per job (default: no
                        batching)
  --ignore-metadata     ignore information from project metadata file even if
                        one is located (default is to use project metadata)
  --split-fastqs-by-lane
                        run QC on copies of input Fastqs where reads have been
                        split according to lane (default is to run QC on
                        original Fastqs)
  --use-legacy-screen-names {yes,no}
                        use 'legacy' naming convention for FastqScreen output
                        files; can be 'yes' or 'no' (default: no)
  --no-multiqc          turn off generation of MultiQC report

Debugging options:
  --verbose             run pipeline in 'verbose' mode
  --work-dir WORKING_DIR
                        specify the working directory for the pipeline
                        operations
  --no-cleanup          don't remove the temporary project directory on
                        completion (by default the temporary directory is
                        deleted)

Deprecated/redundant options:
  --fastq_screen_subset SUBSET
                        redundant: use the --fastq_subset option instead
  --force               redundant: HTML report generation will always be
                        attempted (even when pipeline fails)
  --multiqc             redundant: MultiQC report is generated by default (use
                        --no-multiqc to disable)

split_icell8_fastqs.py

usage: split_icell8_fastqs.py [-h] [-w WELL_LIST_FILE]
                              [-m {barcodes,batch,none}] [-s BATCH_SIZE]
                              [-b BASENAME] [-o OUT_DIR] [-d] [-q] [-c]
                              [FASTQ_R1 FASTQ_R2 [FASTQ_R1 FASTQ_R2 ...]]

positional arguments:
  FASTQ_R1 FASTQ_R2     FASTQ file pairs

optional arguments:
  -h, --help            show this help message and exit
  -w WELL_LIST_FILE, --well-list WELL_LIST_FILE
                        iCell8 'well list' file
  -m {barcodes,batch,none}, --mode {barcodes,batch,none}
                        how to split the input FASTQs: 'barcodes' (one FASTQ
                        pair per barcode), 'batch' (one or more FASTQ pairs
                        with fixed number of reads not exceeding BATCH_SIZE),
                        or 'none' (output all reads to a single FASTQ pair)
                        (default: 'barcodes')
  -s BATCH_SIZE, --size BATCH_SIZE
                        number of reads per batch in 'batch' mode (default:
                        5000000)
  -b BASENAME, --basename BASENAME
                        basename for output FASTQ files (default: 'icell8')
  -o OUT_DIR, --outdir OUT_DIR
                        directory to write output FASTQ files to (default:
                        current directory)
  -d, --discard-unknown-barcodes
                        discard reads with barcodes which don't match any of
                        those in the WELL_LIST_FILE (default: keep all reads)
  -q, --quality-filter  filter reads by barcode and UMI quality (default:
                        don't filter reads on quality)
  -c, --compress        output compressed .gz FASTQ files
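
The 'batch' mode can be thought of as dividing a stream of read pairs into groups of at most BATCH_SIZE, roughly as in the sketch below. The generator name batches is hypothetical and this is not the utility's own code; any iterable of reads can stand in for the parsed Fastq input.

```python
from itertools import islice


def batches(reads, batch_size):
    """Yield successive lists of at most batch_size items from 'reads'."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(reads)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # input exhausted
        yield batch


# Seven reads in batches of three: the final batch is smaller
print([len(b) for b in batches(range(7), 3)])  # [3, 3, 1]
```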

transfer_data.py

usage: transfer_data.py [-h] [--version] [--subdir {random_bin,run_id}]
                        [--zip_fastqs] [--max_zip_size MAX_ZIP_SIZE]
                        [--no_fastqs] [--readme README_TEMPLATE]
                        [--weburl WEBURL] [--include_downloader]
                        [--include_qc_report] [--include_10x_outputs] [--link]
                        [--filter FILTER_PATTERN] [--runner RUNNER]
                        DEST PROJECT

Transfer copies of Fastq data from an analysis project to an arbitrary
destination for sharing with other people

positional arguments:
  DEST                  destination to copy Fastqs to; can be the name of a
                        destination defined in the configuration file, or an
                        arbitrary location of the form '[[USER@]HOST:]DIR' (no
                        destinations currently defined)
  PROJECT               path to project directory (or to a Fastqs subdirectory
                        in a project) to copy Fastqs from

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --subdir {random_bin,run_id}
                        subdirectory naming scheme: 'random_bin' locates a
                        random pre-existing empty subdirectory under the
                        target directory; 'run_id' creates a new subdirectory
                        'PLATFORM_DATESTAMP.RUN_ID-PROJECT'. If this option is
                        not set then no subdirectory will be used
  --zip_fastqs          put Fastqs into a ZIP file
  --max_zip_size MAX_ZIP_SIZE
                        when using '--zip_fastqs' option, defines the maximum
                        size for the output zip file; multiple zip files will
                        be created if the data exceeds this limit (default is
                        create a single zip file with no size limit)
  --no_fastqs           don't copy Fastqs (other artefacts will be copied, if
                        specified)
  --readme README_TEMPLATE
                        template file to generate README file from; can be
                        full path to a template file, or the name of a file in
                        the 'templates' directory
  --weburl WEBURL       base URL for webserver (sets the value of the WEBURL
                        variable in the template README)
  --include_downloader  copy the 'download_fastqs.py' utility to the final
                        location
  --include_qc_report   copy the zipped QC reports to the final location
  --include_10x_outputs
                        copy outputs from 10xGenomics pipelines (e.g.
                        'cellranger count') to the final location
  --link                hard link files instead of copying
  --filter FILTER_PATTERN
                        filter Fastq file names based on FILTER_PATTERN
  --runner RUNNER       specify the job runner to use for executing the
                        checksumming, Fastq copy and tar gzipping operations
                        (defaults to job runner defined for copying in config
                        file [SimpleJobRunner(join_logs=True)])
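
The '[[USER@]HOST:]DIR' form accepted for DEST can be broken down as in the following sketch. It is a simplified illustration with a hypothetical function name: it does not handle destinations defined by name in the configuration file, and it assumes that DIR itself contains no ':' before the host separator.

```python
def parse_dest(dest):
    """Split '[[USER@]HOST:]DIR' into a (user, host, directory) tuple.

    Components that are absent are returned as None.
    """
    user = None
    host = None
    directory = dest
    if ":" in dest:
        location, directory = dest.split(":", 1)
        if "@" in location:
            user, host = location.split("@", 1)
        else:
            host = location
    return user, host, directory


print(parse_dest("user@example.org:/data/share"))
print(parse_dest("/data/share"))
```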

update_project_metadata.py

usage: update_project_metadata.py [-h] [-i] [-u UPDATE] DIR PROJECT

positional arguments:
  DIR                   analysis directory to update metadata for
  PROJECT               project within the analysis directory to update
                        metadata for

optional arguments:
  -h, --help            show this help message and exit
  -i, --init            initialise metadata file for the selected project (nb
                        can only be applied to one project at a time)
  -u UPDATE, --update UPDATE
                        update the metadata in the selected project by
                        specifying key=value pairs e.g. user='Peter Briggs'
                        (nb can only be applied to one project at a time)