auto_process commands

Note

This documentation has been auto-generated from the command help

auto_process.py implements the following commands:

info

usage: auto_process.py info [-h] [--version] [--debug] [ANALYSIS_DIR]

Print information about the analysis associated with ANALYSIS_DIR.

positional arguments:
  ANALYSIS_DIR  auto_process analysis directory (optional: defaults to the
                current directory)

optional arguments:
  -h, --help    show this help message and exit
  --version     show program's version number and exit
  --debug       Turn on debugging output

setup

usage: auto_process.py setup [-h] [--version] -r RUN_NUMBER [-s SAMPLE_SHEET]
                             [-n ANALYSIS_NUMBER] [-f FILE]
                             [--fastq-dir UNALIGNED_DIR]
                             [--analysis-dir ANALYSIS_DIR] [--debug]
                             RUN_DIR

Set up automatic processing of Illumina sequencing data from RUN_DIR.

positional arguments:
  RUN_DIR               directory with the output from an Illumina sequencer

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -r RUN_NUMBER, --run-number RUN_NUMBER
                        Set facility run number (required)
  -s SAMPLE_SHEET, --samplesheet SAMPLE_SHEET, --sample-sheet SAMPLE_SHEET
                        Copy sample sheet file from name and location
                        SAMPLE_SHEET (default is to look for SampleSheet.csv
                        inside RUN_DIR). SAMPLE_SHEET can be a local or remote
                        file, or a URL
  -n ANALYSIS_NUMBER, --analysis-number ANALYSIS_NUMBER
                        Set analysis number (e.g. if reprocessing a run); will
                        be appended to analysis directory name if
                        '--analysis-dir' not supplied
  -f FILE, --file FILE  Additional file(s) to copy into new analysis directory
                        (e.g. ICELL8 well list). FILE can be a local or remote
                        file, or a URL
  --fastq-dir UNALIGNED_DIR
                        Import fastq.gz files from UNALIGNED_DIR (which should
                        be a subdirectory of RUN_DIR with the same structure
                        as the 'Unaligned' or 'bcl2fastq2' output directory
                        produced by CASAVA/bcl2fastq)
  --analysis-dir ANALYSIS_DIR
                        Make new directory called ANALYSIS_DIR (otherwise
                        default is '<RUN_DIR>_analysis[<ANALYSIS_NUMBER>]')
  --debug               Turn on debugging output
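
The default naming described under '--analysis-dir' can be sketched in Python as below. This is an illustration, not code from auto_process.py: the function name default_analysis_dir is hypothetical, and the sketch assumes that only the final component of RUN_DIR is used and that ANALYSIS_NUMBER is appended with no separator, as the '<RUN_DIR>_analysis[<ANALYSIS_NUMBER>]' template suggests.

```python
import os


def default_analysis_dir(run_dir, analysis_number=None):
    """Return a default analysis directory name for a run (hypothetical helper)."""
    # Drop any trailing slash, then keep only the run directory's own name
    run_name = os.path.basename(os.path.normpath(run_dir))
    # Assumption: the analysis number, if given, is appended directly
    suffix = "" if analysis_number is None else str(analysis_number)
    return "%s_analysis%s" % (run_name, suffix)
```

For example, a run directory named 'RUN1' would give 'RUN1_analysis', or 'RUN1_analysis2' when the analysis number is 2.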

make_fastqs

usage: auto_process.py make_fastqs [-h] [--version] [--no-save] [--debug]
                                   [--id NAME] [--force-copy]
                                   [--protocol {standard,mirna,icell8,icell8_atac,10x_chromium_sc,10x_atac,10x_visium,10x_multiome,10x_multiome_atac,10x_multiome_gex,parse_evercode}]
                                   [--sample-sheet SAMPLE_SHEET]
                                   [--lanes LANES[:OPTIONS]]
                                   [--output-dir OUT_DIR]
                                   [--platform PLATFORM]
                                   [--use-bases-mask BASES_MASK]
                                   [--bcl-converter CONVERTER]
                                   [--no-lane-splitting]
                                   [--use-lane-splitting]
                                   [--find-adapters-with-sliding-window]
                                   [--create-empty-fastqs]
                                   [--no-create-empty-fastqs]
                                   [--create-fastq-for-index-reads]
                                   [--nprocessors NPROCESSORS]
                                   [--runner RUNNER]
                                   [--adapter ADAPTER_SEQUENCE]
                                   [--adapter-read2 ADAPTER_SEQUENCE_READ2]
                                   [--minimum-trimmed-read-length MINIMUM_TRIMMED_READ_LENGTH]
                                   [--mask-short-adapter-reads MASK_SHORT_ADAPTER_READS]
                                   [--no-adapter-trimming]
                                   [--well-list ICELL8_WELL_LIST]
                                   [--swap-i1-and-i2]
                                   [--reverse-complement {i1,i2,both}]
                                   [--10x_jobmode CELLRANGER_JOBMODE]
                                   [--10x_localcores CELLRANGER_LOCALCORES]
                                   [--10x_localmem CELLRANGER_LOCALMEM]
                                   [--10x_maxjobs CELLRANGER_MAXJOBS]
                                   [--10x_mempercore CELLRANGER_MEMPERCORE]
                                   [--10x_jobinterval CELLRANGER_JOBINTERVAL]
                                   [--ignore-dual-index]
                                   [--rc-i2-override RC_I2_OVERRIDE]
                                   [--stats-file STATS_FILE]
                                   [--per-lane-stats-file PER_LANE_STATS_FILE]
                                   [--no-stats]
                                   [--barcode-analysis-dir BARCODE_ANALYSIS_DIR]
                                   [--no-barcode-analysis] [-j NJOBS]
                                   [-c NCORES] [-b NBATCHES] [--verbose]
                                   [--work-dir WORKING_DIR]
                                   [--require-bcl2fastq-version BCL2FASTQ_VERSION]
                                   [ANALYSIS_DIR]

Generate fastq files from raw bcl files produced by an Illumina sequencer.

positional arguments:
  ANALYSIS_DIR          auto_process analysis directory (optional: defaults to
                        the current directory)

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --no-save             Don't save parameter changes to the auto_process.info
                        file
  --debug               Turn on debugging output
  --id NAME             identifier for output files

Primary data management:
  --force-copy          force primary data to be copied (by default only data
                        on a remote system will be copied; data on a local
                        system will be symlinked)

General Fastq generation:
  --protocol {standard,mirna,icell8,icell8_atac,10x_chromium_sc,10x_atac,10x_visium,10x_multiome,10x_multiome_atac,10x_multiome_gex,parse_evercode}
                        specify Fastq generation protocol depending on the
                        data being processed (default: 'standard')
  --sample-sheet SAMPLE_SHEET
                        use an alternative sample sheet to the default
                        'custom_SampleSheet.csv' created on setup.
  --lanes LANES[:OPTIONS]
                        define a set of lanes to group for processing. LANES
                        can be a single lane (e.g. '1'), a list ('1,2,3,7'), a
                        range ('1-3'), or a combination ('1-3,7'). Specified
                        lanes are processed together in a group, using OPTIONS
                        (if supplied). OPTIONS takes the form
                        '[PROTOCOL:][KEY=VALUE:[KEY=VALUE]...]' (for example
                        --lanes=1-4:standard:trim_adapters=no)
  --output-dir OUT_DIR  set the directory for the output Fastqs (default:
                        'bcl2fastq')
  --platform PLATFORM   explicitly specify the sequencing platform. Only use
                        this if the platform cannot be identified from the
                        instrument name
  --use-bases-mask BASES_MASK
                        explicitly set the bases-mask string to indicate how
                        each cycle should be used in the BCL to Fastq
                        conversion (overrides default). Set to 'auto' to
                        determine automatically
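
The LANES[:OPTIONS] syntax accepted by '--lanes' can be illustrated with a small Python sketch. It is not the parser used by auto_process.py: the function names expand_lanes and parse_lanes_option are hypothetical, and the sketch assumes that an optional PROTOCOL comes before any KEY=VALUE pairs, as the help text implies.

```python
def expand_lanes(lanes):
    """Expand a lane expression such as '1-3,7' into a list of integers."""
    result = []
    for part in lanes.split(","):
        if "-" in part:
            # A range such as '1-3' includes both end points
            start, end = part.split("-")
            result.extend(range(int(start), int(end) + 1))
        else:
            result.append(int(part))
    return result


def parse_lanes_option(value):
    """Split 'LANES[:PROTOCOL][:KEY=VALUE...]' into (lanes, protocol, options)."""
    fields = value.split(":")
    lanes = expand_lanes(fields[0])
    protocol = None
    options = {}
    for field in fields[1:]:
        if "=" in field:
            key, val = field.split("=", 1)
            options[key] = val
        elif protocol is None and not options:
            # Assumption: a bare word is only valid as the first option
            protocol = field
        else:
            raise ValueError("unexpected field in lanes option: %r" % field)
    return lanes, protocol, options
```

With this sketch, the value from the example above ('1-4:standard:trim_adapters=no') yields lanes 1 to 4, the protocol 'standard' and a single option, trim_adapters set to 'no'.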

Bcl conversion options:
  --bcl-converter CONVERTER
                        explicitly set BCL conversion software to use for
                        non-10xGenomics/non-ICELL8 runs (either 'bcl2fastq' or
                        'bcl-convert'; can also include a version specifier
                        e.g. 'bcl2fastq>=2.0'). Default: bcl2fastq>=2.20 (may
                        be overridden by platform-specific settings)
  --no-lane-splitting   don't split the output FASTQ files by lane. Default:
                        off (may be overridden by platform-specific settings);
                        turn off using --use-lane-splitting
  --use-lane-splitting  split the output FASTQ files by lane. Default: on (but
                        may be overridden by platform-specific settings); turn
                        off using --no-lane-splitting
  --find-adapters-with-sliding-window
                        use sliding window algorithm to identify adapters for
                        trimming
  --create-empty-fastqs
                        create empty files as placeholders for missing FASTQs
                        from demultiplexing step. Default: off (but may be
                        overridden by platform-specific settings); turn off
                        using --no-create-empty-fastqs. NB Fastq generation
                        must have finished without errors for this option to
                        be applied
  --no-create-empty-fastqs
                        don't create empty files as placeholders for missing
                        FASTQs from demultiplexing step. Default: on (but may
                        be overridden by platform-specific settings); turn off
                        using --create-empty-fastqs.
  --create-fastq-for-index-reads
                        also create FASTQs for index reads
  --nprocessors NPROCESSORS
                        explicitly specify number of processors/cores to use
                        (default taken from job runner)
  --runner RUNNER       explicitly specify runner definition (e.g.
                        'GEJobRunner(-j y)')
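
A version specifier such as 'bcl2fastq>=2.20' (as accepted by '--bcl-converter') can be split into its parts and checked as in the Python sketch below. The function names are hypothetical and the comparison rule (numeric comparison of dot-separated fields) is an assumption; auto_process.py may resolve versions differently.

```python
import operator
import re

# Comparison operators assumed to be valid in a specifier
OPS = {
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
    "=": operator.eq, "==": operator.eq,
}


def parse_converter_spec(spec):
    """Split e.g. 'bcl2fastq>=2.20' into (name, operator, version)."""
    m = re.match(
        r"^(?P<name>[^<>=]+)(?:(?P<op>>=|<=|==|=|>|<)(?P<version>[0-9.]+))?$",
        spec)
    if m is None:
        raise ValueError("bad converter specification: %r" % spec)
    return m.group("name"), m.group("op"), m.group("version")


def version_tuple(version):
    """Convert '2.20.0' into (2, 20, 0) for numeric comparison."""
    return tuple(int(x) for x in version.split("."))


def satisfies(installed, op, required):
    """Check an installed version string against an operator and version."""
    return OPS[op](version_tuple(installed), version_tuple(required))
```

Comparing tuples of integers rather than strings means that '2.20' is correctly treated as newer than '2.3'.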

Adapter trimming and masking:
  --adapter ADAPTER_SEQUENCE
                        sequence of adapter to be trimmed. Specify multiple
                        adapters by separating them with plus sign (+). Only
                        used for read 1 if --adapter-read2 is also specified
                        (default: use adapter sequence from sample sheet)
  --adapter-read2 ADAPTER_SEQUENCE_READ2
                        sequence of adapter to be trimmed in read 2. Specify
                        multiple adapters by separating them with plus sign
                        (+) (default: use adapter sequence from sample sheet)
  --minimum-trimmed-read-length MINIMUM_TRIMMED_READ_LENGTH
                        Minimum read length after adapter trimming. bcl2fastq
                        trims the adapter from the read down to this value; if
                        there is more adapter match below this length then
                        those bases are masked not trimmed (i.e. replaced by N
                        rather than removed) (default: 35)
  --mask-short-adapter-reads MASK_SHORT_ADAPTER_READS
                        minimum length of unmasked bases that a read can be
                        after adapter trimming; reads with fewer ACGT bases
                        will be completely masked with Ns (default: 22)
  --no-adapter-trimming
                        turn off adapter trimming even if adapter sequences
                        are supplied
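
The interaction between '--minimum-trimmed-read-length' and '--mask-short-adapter-reads' can be sketched in Python as below. This is one reading of the descriptions above, not bcl2fastq's implementation: the function name trim_and_mask is hypothetical, and the position where the adapter match starts is assumed to be already known.

```python
def trim_and_mask(read, adapter_start, min_trimmed_len=35, mask_short=22):
    """Apply adapter trimming and masking to a single read sequence.

    adapter_start is the index of the first adapter base in the read,
    or None if no adapter was found.
    """
    if adapter_start is None or adapter_start >= len(read):
        return read
    if adapter_start >= min_trimmed_len:
        # Enough sequence remains: simply remove the adapter
        trimmed = read[:adapter_start]
    else:
        # Trim down to the minimum length only; adapter bases that fall
        # below that length are replaced by N rather than removed
        keep = min(min_trimmed_len, len(read))
        trimmed = read[:adapter_start] + "N" * (keep - adapter_start)
    # Reads left with too few unmasked bases are masked completely
    unmasked = sum(1 for base in trimmed if base in "ACGT")
    if unmasked < mask_short:
        return "N" * len(trimmed)
    return trimmed
```

With the defaults, a 50-base read whose adapter starts at base 30 keeps 30 real bases followed by 5 Ns, while one whose adapter starts at base 10 is returned as 35 Ns.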

ICELL8 options (ICELL8 data only):
  --well-list ICELL8_WELL_LIST
                        specify ICELL8 well list file
  --swap-i1-and-i2      swap supplied I1 and I2 Fastqs when matching ATAC
                        barcodes against well list
  --reverse-complement {i1,i2,both}
                        can be 'i1','i2', or 'both'; reverse complement the
                        specified indices from the well list when matching
                        ATAC barcodes against well list

10x Genomics data options (Cellranger*/Spaceranger):
  --10x_jobmode CELLRANGER_JOBMODE
                        job mode to run cellranger in (default: 'local')
  --10x_localcores CELLRANGER_LOCALCORES
                        maximum cores cellranger can request at one time for
                        jobmode 'local' (ignored for other jobmodes) (default:
                        1)
  --10x_localmem CELLRANGER_LOCALMEM
                        maximum total memory cellranger can request at one
                        time for jobmode 'local' (ignored for other jobmodes)
                        (in Gbs; default: 5)
  --10x_maxjobs CELLRANGER_MAXJOBS
                        maximum number of concurrent jobs to run; NB only used
                        if jobmode is not 'local' (default: 24)
  --10x_mempercore CELLRANGER_MEMPERCORE
                        memory assumed per core (in Gbs; default: 5); NB only
                        used if jobmode is not 'local'
  --10x_jobinterval CELLRANGER_JOBINTERVAL
                        how often jobs are submitted (in ms; default: 100);
                        only used if jobmode is not 'local'
  --ignore-dual-index   on a dual-indexed flowcell where the second index was
                        not used for the 10x sample, ignore it

10x Genomics Spaceranger options:
  --rc-i2-override RC_I2_OVERRIDE
                        (Spaceranger only) explicitly indicate whether bases
                        in I2 read were emitted as reverse complement by the
                        sequencing workflow: set to 'true' for the Reverse
                        Complement Workflow (Workflow B)/ NovaSeq Reagent Kit
                        v1.5 or greater, 'false' for the Forward Strand
                        Workflow (Workflow A) / older NovaSeq Reagent Kits. If
                        unset then workflow will be determined automatically
                        (recommended)

Statistics generation:
  --stats-file STATS_FILE
                        specify output file for fastq statistics
  --per-lane-stats-file PER_LANE_STATS_FILE
                        specify output file for per-lane statistics
  --no-stats            don't generate statistics file; use
                        'update_fastq_stats' command to (re)generate
                        statistics

Barcode analysis:
  --barcode-analysis-dir BARCODE_ANALYSIS_DIR
                        specify subdirectory where barcode analysis will be
                        performed and outputs will be written
  --no-barcode-analysis
                        don't perform barcode analysis; use 'analyse_barcodes'
                        command to run barcode analysis separately

Job control options:
  -j NJOBS, --maxjobs NJOBS
                        maximum number of jobs to run concurrently (default:
                        12)
  -c NCORES, --maxcores NCORES
                        maximum number of cores available for running jobs
                        (default: no limit)
  -b NBATCHES, --maxbatches NBATCHES
                        enable dynamic batching of pipeline jobs with maximum
                        number of batches set to NBATCHES (default: no
                        batching)

Advanced/debugging options:
  --verbose             run pipeline in 'verbose' mode
  --work-dir WORKING_DIR
                        specify the working directory for the pipeline
                        operations

Deprecated options:
  --require-bcl2fastq-version BCL2FASTQ_VERSION
                        deprecated: explicitly specify version of bcl2fastq
                        software to use (e.g. '=1.8.4' or '>=2.0') (use
                        --bcl-converter instead)

analyse_barcodes

usage: auto_process.py analyse_barcodes [-h] [--version]
                                        [--unaligned-dir UNALIGNED_DIR]
                                        [--lanes LANES]
                                        [--mismatches MISMATCHES]
                                        [--cutoff CUTOFF]
                                        [--sample-sheet SAMPLE_SHEET]
                                        [--id NAME]
                                        [--barcode-analysis-dir BARCODE_ANALYSIS_DIR]
                                        [--force] [--runner RUNNER] [--debug]
                                        [ANALYSIS_DIR]

Analyse barcode sequences for Fastq files in specified lanes in ANALYSIS_DIR,
and report the most common barcodes found across all reads from each lane.

positional arguments:
  ANALYSIS_DIR          auto_process analysis directory (optional: defaults to
                        the current directory)

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --unaligned-dir UNALIGNED_DIR
                        explicitly set the (sub)directory with bcl-to-fastq
                        outputs
  --lanes LANES         specify which lanes to analyse barcodes for (default
                        is to do analysis for all lanes).
  --mismatches MISMATCHES
                        maximum number of mismatches to use when grouping
                        similar barcodes (default is to determine
                        automatically from the bases mask)
  --cutoff CUTOFF       exclude barcodes with a smaller fraction of associated
                        reads than CUTOFF, e.g. '0.01' excludes barcodes with
                        < 1% of reads (default is 0.01%)
  --sample-sheet SAMPLE_SHEET
                        use an alternative sample sheet to the default
                        'custom_SampleSheet.csv' created on setup.
  --id NAME             specify an identifier to be written into the default
                        output barcode analysis directory name (e.g.
                        'barcode_analysis_NAME') and report title
  --barcode-analysis-dir BARCODE_ANALYSIS_DIR
                        specify subdirectory where barcode analysis will be
                        performed and outputs will be written
  --force               discard and regenerate counts (by default existing
                        counts will be used)
  --runner RUNNER       explicitly specify runner definition (e.g.
                        'GEJobRunner(-j y)')
  --debug               Turn on debugging output
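
The effect of '--cutoff' can be illustrated with the Python sketch below, which drops barcodes whose share of the reads falls below a given fraction. The function name filter_barcodes and the input format (a mapping of barcode sequence to read count) are assumptions made for the example; the grouping of similar barcodes controlled by '--mismatches' is not modelled.

```python
def filter_barcodes(counts, cutoff):
    """Keep barcodes whose fraction of the total reads is at least cutoff.

    counts maps barcode sequences to read counts; cutoff is a fraction,
    so 0.01 corresponds to 1% of reads.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        barcode: n
        for barcode, n in counts.items()
        if float(n) / total >= cutoff
    }
```

For example, with 1000 reads in total and a cutoff of 0.01, any barcode seen in fewer than 10 reads is excluded.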

setup_analysis_dirs

usage: auto_process.py setup_analysis_dirs [-h] [--version]
                                           [--ignore-missing-metadata]
                                           [--unaligned-dir UNALIGNED_DIR]
                                           [--undetermined UNDETERMINED]
                                           [--short-fastq-names]
                                           [--link-to-fastqs] [--id NAME]
                                           [--debug]
                                           [ANALYSIS_DIR]

Create analysis subdirectories for projects defined in projects.info file in
ANALYSIS_DIR.

positional arguments:
  ANALYSIS_DIR          auto_process analysis directory (optional: defaults to
                        the current directory)

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --ignore-missing-metadata
                        force creation of project directories even if metadata
                        is not set (default is to fail if metadata is missing)
  --unaligned-dir UNALIGNED_DIR
                        explicitly specify the subdirectory with output Fastqs
  --undetermined UNDETERMINED
                        explicitly specify name for project directory with
                        'undetermined' fastqs
  --short-fastq-names   shorten fastq file names when copying or linking from
                        project directory (default is to keep long names from
                        bcl2fastq)
  --link-to-fastqs      create symbolic links to original fastqs from project
                        directory (default is to make hard links)
  --id NAME             identifier to append to project names
  --debug               Turn on debugging output
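
The difference between the default behaviour (hard links) and '--link-to-fastqs' (symbolic links) can be sketched with the Python standard library. The function name link_fastq is hypothetical, and the sketch assumes the link keeps the original file name; it does not cover '--short-fastq-names'.

```python
import os


def link_fastq(fastq, project_dir, symlink=False):
    """Make a link to a Fastq file inside a project directory."""
    dest = os.path.join(project_dir, os.path.basename(fastq))
    if symlink:
        # Symbolic link: a pointer to the original path
        os.symlink(os.path.abspath(fastq), dest)
    else:
        # Hard link: a second name for the same data on disk, so the
        # source and destination must be on the same file system
        os.link(fastq, dest)
    return dest
```

A hard link remains usable if the original file is later removed, whereas a symbolic link would be left dangling.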

run_qc

usage: auto_process.py run_qc [-h] [--version] [--projects PROJECT_PATTERN]
                              [--qc_dir QC_DIR] [--fastq_dir FASTQ_DIR]
                              [--fastq_subset SUBSET] [-t NTHREADS]
                              [--cellranger CELLRANGER_EXE]
                              [--10x_chemistry {ARC-v1,SC3Pv1,SC3Pv2,SC3Pv3,SC5P-PE,SC5P-R2,auto,fiveprime,threeprime}]
                              [--10x_force_cells N_CELLS]
                              [--10x_extra_projects PROJECT_DIRS]
                              [--10x_transcriptome ORGANISM=REFERENCE]
                              [--10x_premrna_reference ORGANISM=REFERENCE]
                              [--report HTML_FILE] [--enable-conda {yes,no}]
                              [--conda-env-dir CONDA_ENV_DIR] [-c NCORES]
                              [-j NJOBS] [-b NBATCHES] [--verbose]
                              [--work-dir WORKING_DIR] [--runner RUNNER]
                              [--debug]
                              [ANALYSIS_DIR]

Run QC procedures for sequencing projects in ANALYSIS_DIR.

positional arguments:
  ANALYSIS_DIR          auto_process analysis directory (optional: defaults to
                        the current directory)

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --projects PROJECT_PATTERN
                        simple wildcard-based pattern specifying a subset of
                        projects and samples to run the QC on. PROJECT_PATTERN
                        should be of the form 'pname[/sname]', where 'pname'
                        specifies a project (or set of projects) and 'sname'
                        optionally specifies a sample (or set of samples).
  --qc_dir QC_DIR       explicitly specify QC output directory (nb if supplied
                        then the same QC_DIR will be used for each project.
                        Non-absolute paths are assumed to be relative to the
                        project directory). Default: 'qc'
  --fastq_dir FASTQ_DIR
                        explicitly specify subdirectory of DIR with Fastq
                        files to run the QC on.
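
The 'pname[/sname]' form of PROJECT_PATTERN can be illustrated with the Python sketch below. It assumes shell-style wildcards as provided by the standard fnmatch module; the function name select_samples and the input format (a mapping of project names to sample names) are hypothetical, and the matching rules in auto_process.py may differ in detail.

```python
import fnmatch


def select_samples(projects, pattern):
    """Select projects and samples matching a 'pname[/sname]' pattern.

    projects maps each project name to a list of its sample names.
    """
    pname, _, sname = pattern.partition("/")
    # Assumption: a missing sample part matches every sample
    sname = sname or "*"
    selected = {}
    for project, samples in projects.items():
        if fnmatch.fnmatchcase(project, pname):
            matched = [s for s in samples if fnmatch.fnmatchcase(s, sname)]
            if matched:
                selected[project] = matched
    return selected
```

Under these assumptions the pattern 'A*' selects every sample in projects whose names start with 'A', while '*/*1' selects samples ending in '1' from all projects.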

QC options:
  --fastq_subset SUBSET
                        specify size of subset of total reads to use for
                        fastq_screen, BAM file generation etc (default 100000,
                        set to 0 to use all reads)
  -t NTHREADS, --threads NTHREADS
                        number of threads to use for QC script (default: taken
                        from job runner)

Cellranger/10xGenomics options:
  --cellranger CELLRANGER_EXE
                        explicitly specify path to Cellranger executable to
                        use for single library analysis (NB will be used for
                        all projects)
  --10x_chemistry {ARC-v1,SC3Pv1,SC3Pv2,SC3Pv3,SC5P-PE,SC5P-R2,auto,fiveprime,threeprime}
                        assay configuration for 10xGenomics scRNA-seq; if set
                        to 'auto' (the default) then cellranger will attempt
                        to determine this automatically
  --10x_force_cells N_CELLS
                        force number of cells for 10xGenomics scRNA-seq and
                        scATAC-seq, overriding automatic cell detection
                        algorithms (default is to use built-in cell detection)
  --10x_extra_projects PROJECT_DIRS
                        specify additional projects to include samples from in
                        single library analyses, as comma-separated list
  --10x_transcriptome ORGANISM=REFERENCE
                        specify cellranger transcriptome reference datasets to
                        associate with organisms (overrides references defined
                        in config file)
  --10x_premrna_reference ORGANISM=REFERENCE
                        specify cellranger pre-mRNA reference datasets to
                        associate with organisms (overrides references defined
                        in config file)
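
The way ORGANISM=REFERENCE values override references from the config file can be sketched as below. The function name apply_reference_overrides is hypothetical, as are the organism names and paths used to exercise it; how auto_process.py stores and normalises these values is not covered by the help text.

```python
def apply_reference_overrides(config_refs, overrides):
    """Return reference data with command line overrides applied.

    config_refs maps organism names to reference paths from the config
    file; overrides is a list of 'ORGANISM=REFERENCE' strings.
    """
    refs = dict(config_refs)  # copy, so the config values are untouched
    for item in overrides:
        organism, sep, reference = item.partition("=")
        if not sep or not organism or not reference:
            raise ValueError("expected ORGANISM=REFERENCE, got %r" % item)
        # A command line value replaces any config file value
        refs[organism] = reference
    return refs
```

Organisms that are not mentioned on the command line keep the reference defined in the config file.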

Output and reporting:
  --report HTML_FILE    file name for output HTML QC report (default:
                        <QC_DIR>_report.html)

Conda dependency resolution:
  --enable-conda {yes,no}
                        use conda to resolve task dependencies; can be 'yes'
                        or 'no' (default: no)
  --conda-env-dir CONDA_ENV_DIR
                        specify directory for conda environments (default:
                        temporary directory)

Job control options:
  -c NCORES, --maxcores NCORES
                        maximum number of cores available for running jobs
                        (default: no limit)
  -j NJOBS, --maxjobs NJOBS
                        maximum number of jobs to run concurrently (default:
                        12)
  -b NBATCHES, --maxbatches NBATCHES
                        enable dynamic batching of pipeline jobs with maximum
                        number of batches set to NBATCHES (default: no
                        batching)

Advanced/debugging options:
  --verbose             run pipeline in 'verbose' mode
  --work-dir WORKING_DIR
                        specify the working directory for the pipeline
                        operations
  --runner RUNNER       explicitly specify runner definition (e.g.
                        'GEJobRunner(-j y)')
  --debug               Turn on debugging output

publish_qc

usage: auto_process.py publish_qc [-h] [--version] [--qc_dir QC_DIR]
                                  [--use-hierarchy {yes,no}] [--url BASE_URL]
                                  [--projects PROJECT_PATTERN]
                                  [--ignore-missing-qc]
                                  [--exclude-zip-files {yes,no}]
                                  [--regenerate-reports] [--force]
                                  [--suppress-warnings] [--legacy]
                                  [--runner RUNNER] [--debug]
                                  [ANALYSIS_DIR]

Copy QC reports from ANALYSIS_DIR to a local or remote directory (e.g. web
server). By default existing QC reports will be copied without further
checking; if no report is found then QC results will be verified and a report
generated first.

positional arguments:
  ANALYSIS_DIR          auto_process analysis directory (optional: defaults to
                        the current directory)

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit

Destination options:
  --qc_dir QC_DIR       specify target directory to copy QC reports to. QC_DIR
                        can be a local directory, or a remote location in the
                        form '[[user@]host:]directory'. Overrides the default
                        settings.
  --use-hierarchy {yes,no}
                        use YEAR/PLATFORM hierarchy under QC_DIR; can be 'yes'
                        or 'no' (default: no)
  --url BASE_URL        specify the 'base' URL for accessing the published
                        reports. Overrides the default settings
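
The '[[user@]host:]directory' form of QC_DIR, and the optional YEAR/PLATFORM hierarchy beneath it, can be illustrated with the Python sketch below. The function names parse_location and publication_dir are hypothetical, and the exact layout produced by '--use-hierarchy' is assumed from the option description.

```python
import posixpath
import re


def parse_location(location):
    """Split '[[user@]host:]directory' into (user, host, directory)."""
    m = re.match(
        r"^(?:(?:(?P<user>[^@:/]+)@)?(?P<host>[^@:/]+):)?(?P<path>.+)$",
        location)
    if m is None:
        raise ValueError("bad location: %r" % location)
    return m.group("user"), m.group("host"), m.group("path")


def publication_dir(qc_dir, year, platform, use_hierarchy=False):
    """Return the directory part of the target, with optional hierarchy."""
    _, _, path = parse_location(qc_dir)
    if use_hierarchy:
        # Assumption: reports go under QC_DIR/YEAR/PLATFORM
        return posixpath.join(path, str(year), platform)
    return path
```

A plain path such as '/srv/qc' has no user or host part, so the sketch treats it as a local directory.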

Projects and data options:
  --projects PROJECT_PATTERN
                        simple wildcard-based pattern specifying a subset of
                        projects and samples to publish the QC for.
                        PROJECT_PATTERN can specify a single project, or a set
                        of projects.
  --ignore-missing-qc   skip projects where QC results are missing or can't be
                        verified, or where reports can't be generated.
  --exclude-zip-files {yes,no}
                        exclude ZIP archives from publication; can be 'yes' or
                        'no' (default: no)

QC reporting options:
  --regenerate-reports  attempt to regenerate existing QC reports
  --force               force generation of QC reports for all projects even
                        if verification has failed
  --suppress-warnings   don't include warning messages in (re)generated QC
                        reports or top level index even if there are missing
                        metrics in individual QC reports (NB won't be applied
                        for pre-existing reports; combine with
                        --regenerate-reports and --force to update all reports)
  --legacy              legacy mode: include links to MultiQC, 'cellranger
                        count' and ICELL8 reports in the top-level index page

Advanced/debugging options:
  --runner RUNNER       explicitly specify runner definition (e.g.
                        'GEJobRunner(-j y)')
  --debug               Turn on debugging output

archive

usage: auto_process.py archive [-h] [--version] [--archive_dir ARCHIVE_DIR]
                               [--platform PLATFORM] [--year YEAR]
                               [--group GROUP] [--chmod PERMISSIONS] [--final]
                               [--force] [--runner RUNNER] [--dry-run]
                               [--debug]
                               [ANALYSIS_DIR]

Copy sequencing analysis data directory ANALYSIS_DIR to 'archive' destination.

positional arguments:
  ANALYSIS_DIR          auto_process analysis directory (optional: defaults to
                        the current directory)

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --archive_dir ARCHIVE_DIR
                        specify top-level archive directory to copy data
                        under. ARCHIVE_DIR can be a local directory, or a
                        remote location in the form '[[user@]host:]directory'.
                        Overrides the default settings.
  --platform PLATFORM   specify the platform e.g. 'hiseq', 'miseq' etc
                        (overrides automatically determined platform, if any).
                        Use 'other' for cases where the platform is unknown.
  --year YEAR           specify the year e.g. '2014' (default is the current
                        year)
  --group GROUP         specify the name of the group for the archived files
                        (default: None)
  --chmod PERMISSIONS   specify permissions for the archived files.
                        PERMISSIONS should be a string recognised by the
                        'chmod' command (e.g. 'o-rwX') (default: None)
  --final               copy data to final archive location (default is to
                        copy to staging area)
  --force               attempt to complete archiving operations ignoring any
                        errors (e.g. key metadata items not set, unable to set
                        group etc)
  --runner RUNNER       explicitly specify runner definition (e.g.
                        'GEJobRunner(-j y)')
  --dry-run             Dry run i.e. report what would be done but don't
                        perform any actions
  --debug               Turn on debugging output

report

usage: auto_process.py report [-h] [--version]
                              [--logging | --summary | --projects]
                              [--fields FIELDS] [--template TEMPLATE]
                              [--file OUT_FILE] [--debug]
                              [ANALYSIS_DIR]

Report information on analysis in ANALYSIS_DIR.

positional arguments:
  ANALYSIS_DIR         auto_process analysis directory (optional: defaults to
                       the current directory)

optional arguments:
  -h, --help           show this help message and exit
  --version            show program's version number and exit
  --logging            print short report suitable for logging file
  --summary            print full report suitable for bioinformaticians
  --projects           print tab-delimited line (one per project) suitable for
                       injection into a spreadsheet
  --fields FIELDS      fields to report
  --template TEMPLATE  name of template with fields to report (templates
                       should be defined in the config file)
  --file OUT_FILE      write report to OUT_FILE; destination can be a local
                       file, or a remote file specified as [[USER@]HOST:]PATH
                       (default is to write to stdout)
  --debug              Turn on debugging output
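
The '--file' destination uses the form [[USER@]HOST:]PATH. As a rough
illustration of how that form breaks down, the sketch below splits a
destination string into its user, host and path parts. The function name
split_destination is hypothetical, and the sketch assumes that any string
with text before a ':' is remote, so a local file name containing a colon
would be misread.

```python
import re


def split_destination(dest):
    """Return (user, host, path) for a '[[USER@]HOST:]PATH' string.

    Hypothetical helper; user and host are None for a purely local path.
    """
    match = re.fullmatch(r"(?:(?:([^@:/]+)@)?([^@:/]+):)?(.+)", dest)
    if match is None:
        raise ValueError(f"empty or malformed destination: {dest!r}")
    return match.group(1), match.group(2), match.group(3)


for dest in ("report.txt", "host:/data/report.txt", "user@host:/data/report.txt"):
    print(dest, "->", split_destination(dest))
```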

samplesheet

usage: auto_process.py samplesheet [-h] [--version]
                                   [--use SAMPLE_SHEET | --set-project [LANES:][COL=PATTERN:]NEW_PROJECT | --set-sample-id [LANES:][COL=PATTERN:]NEW_ID | --set-sample-name NEW_NAME | -i SAMPLE_SHEET | -e | -p]
                                   [--debug]
                                   [ANALYSIS_DIR]

Query and manipulate sample sheets

positional arguments:
  ANALYSIS_DIR          auto_process analysis directory (optional: defaults to
                        the current directory)

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --use SAMPLE_SHEET    update the default sample sheet file to SAMPLE_SHEET
                        (must be a file on the local file system)
  --set-project [LANES:][COL=PATTERN:]NEW_PROJECT
                        update the sample project field. Optional LANES
                        specifies one or more lanes (e.g. '1', '1,2,3', '1-3',
                        '1,3-5') to update; optional COL=PATTERN specifies a
                        glob-style pattern to match to an arbitrary column
                        (e.g. 'Sample_Name=ITS*'); NEW_PROJECT is the new
                        project name
  --set-sample-id [LANES:][COL=PATTERN:]NEW_ID
                        update the sample ID field. Optional LANES specifies
                        one or more lanes (e.g. '1', '1,2,3', '1-3', '1,3-5')
                        to update; optional COL=PATTERN specifies a glob-style
                        pattern to match to an arbitrary column (e.g.
                        'Sample_Name=ITS*'); NEW_ID can be either
                        'SAMPLE_NAME' or an arbitrary string
  --set-sample-name NEW_NAME
                        update the sample name field. Optional LANES specifies
                        one or more lanes (e.g. '1', '1,2,3', '1-3', '1,3-5')
                        to update; optional COL=PATTERN specifies a glob-style
                        pattern to match to an arbitrary column (e.g.
                        'Sample_Name=ITS*'); NEW_NAME can be either
                        'SAMPLE_ID' or an arbitrary string
  -i SAMPLE_SHEET, --import SAMPLE_SHEET
                        replace existing sample sheet file with version copied
                        from the specified location; SAMPLE_SHEET can be a
                        local or remote file, or a URL
  -e, --edit            bring up sample sheet file in an editor to make
                        changes manually
  -p, --predict         show predicted outputs from sample sheet

Advanced options:
  --debug               Turn on debugging output
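
The [LANES:][COL=PATTERN:]NEW_VALUE form accepted by '--set-project' and
'--set-sample-id' combines a lane list, a glob-style column match and the
replacement value. The sketch below shows one way such a specification could
be interpreted. All function names are hypothetical, the 'Lane' key in the
example row is an assumption about the sample sheet layout, and the rule that
a selector containing '=' is treated as COL=PATTERN is also an assumption
rather than the documented behaviour of auto_process.py.

```python
from fnmatch import fnmatchcase


def parse_lanes(spec):
    """Expand a lane list such as '1,3-5' into [1, 3, 4, 5]."""
    lanes = []
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            lanes.extend(range(int(start), int(end) + 1))
        else:
            lanes.append(int(part))
    return lanes


def parse_update_spec(spec):
    """Split '[LANES:][COL=PATTERN:]NEW_VALUE' into its components."""
    *selectors, new_value = spec.split(":")
    lanes, column, pattern = None, None, None
    for selector in selectors:
        if "=" in selector:  # assumed to mark a COL=PATTERN selector
            column, pattern = selector.split("=", 1)
        else:
            lanes = parse_lanes(selector)
    return lanes, column, pattern, new_value


def line_matches(line, lanes, column, pattern):
    """Check whether a sample sheet row (a dict) is selected for update."""
    if lanes is not None and int(line["Lane"]) not in lanes:
        return False
    if column is not None and not fnmatchcase(line[column], pattern):
        return False
    return True


lanes, column, pattern, new_project = parse_update_spec("1-3:Sample_Name=ITS*:NewProject")
row = {"Lane": "2", "Sample_Name": "ITS1"}
print(line_matches(row, lanes, column, pattern), new_project)
```

When neither selector is given, every row matches, which corresponds to
updating the field for the whole sample sheet.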

update

usage: auto_process.py update [-h] [--version] [--debug] [ANALYSIS_DIR]

Update paths and metadata across ANALYSIS_DIR and its projects and QC outputs
when directory has been moved or copied, or project metadata has been updated.

positional arguments:
  ANALYSIS_DIR  existing auto_process analysis directory to update (optional:
                defaults to the current directory)

optional arguments:
  -h, --help    show this help message and exit
  --version     show program's version number and exit
  --debug       Turn on debugging output

merge_fastq_dirs

usage: auto_process.py merge_fastq_dirs [-h] [--version]
                                        [--primary-unaligned-dir UNALIGNED_DIR]
                                        [--output-dir OUTPUT_DIR] [--dry-run]
                                        [--debug]
                                        [ANALYSIS_DIR]

Automatically merge fastq directories from multiple bcl-to-fastq runs within
ANALYSIS_DIR. Use this command if 'make_fastqs' step was run multiple times to
process subsets of lanes.

positional arguments:
  ANALYSIS_DIR          auto_process analysis directory (optional: defaults to
                        the current directory)

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --primary-unaligned-dir UNALIGNED_DIR
                        merge fastqs from additional bcl-to-fastq directories
                        into UNALIGNED_DIR. Original data will be moved out of
                        the way first. Defaults to 'bcl2fastq'.
  --output-dir OUTPUT_DIR
                        merge fastqs into OUTPUT_DIR (relative to
                        ANALYSIS_DIR). Defaults to UNALIGNED_DIR.
  --dry-run             Dry run i.e. report what would be done but don't
                        perform any actions
  --debug               Turn on debugging output

update_fastq_stats

usage: auto_process.py update_fastq_stats [-h] [--version]
                                          [--unaligned-dir UNALIGNED_DIR]
                                          [--sample-sheet SAMPLE_SHEET]
                                          [--id NAME]
                                          [--stats-file STATS_FILE]
                                          [--per-lane-stats-file PER_LANE_STATS_FILE]
                                          [-a] [--force]
                                          [--nprocessors NPROCESSORS]
                                          [--runner RUNNER] [--debug]
                                          [ANALYSIS_DIR]

(Re)generate statistics for fastq files produced from 'make_fastqs'.

positional arguments:
  ANALYSIS_DIR          auto_process analysis directory (optional: defaults to
                        the current directory)

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --unaligned-dir UNALIGNED_DIR
                        explicitly set the (sub)directory with bcl-to-fastq
                        outputs
  --sample-sheet SAMPLE_SHEET
                        explicitly specify the sample sheet to use (defaults
                        to the sample sheet stored in the analysis directory
                        parameters)
  --id NAME             specify an identifier to be written into the output
                        statistics file name (e.g. 'statistics.NAME.info')
  --stats-file STATS_FILE
                        specify output file for fastq statistics
  --per-lane-stats-file PER_LANE_STATS_FILE
                        specify output file for per-lane statistics
  -a, --add             add new data from UNALIGNED_DIR to existing statistics
  --force               force statistics to be regenerated even if existing
                        statistics files are newer than fastqs
  --nprocessors NPROCESSORS
                        explicitly specify number of processors/cores to use
                        (default taken from job runner)
  --runner RUNNER       explicitly specify runner definition (e.g.
                        'GEJobRunner(-j y)')
  --debug               Turn on debugging output
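
The '--force' description implies that statistics are normally left alone
when the existing statistics files are newer than the fastqs. The sketch
below illustrates that kind of timestamp check. The function name
stats_are_stale and the file names are hypothetical, and the actual test
used by auto_process.py may differ.

```python
import os
import tempfile


def stats_are_stale(stats_file, fastqs, force=False):
    """Return True if statistics should be (re)generated.

    Hypothetical check: regenerate when forced, when the statistics file
    is missing, or when any fastq is newer than the statistics file.
    """
    if force or not os.path.exists(stats_file):
        return True
    stats_mtime = os.path.getmtime(stats_file)
    return any(os.path.getmtime(fq) > stats_mtime for fq in fastqs)


with tempfile.TemporaryDirectory() as tmp:
    fastq = os.path.join(tmp, "sample_R1.fastq.gz")
    stats = os.path.join(tmp, "statistics.info")
    for path in (fastq, stats):
        open(path, "w").close()
    # Make the statistics file newer than the fastq
    os.utime(fastq, (1000, 1000))
    os.utime(stats, (2000, 2000))
    print(stats_are_stale(stats, [fastq]))
    print(stats_are_stale(stats, [fastq], force=True))
```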

import_project

usage: auto_process.py import_project [-h] [--version] [--debug]
                                      [--comment COMMENT]
                                      [ANALYSIS_DIR] PROJECT_DIR

Copy a project directory PROJECT_DIR from another analysis directory into
ANALYSIS_DIR, update metadata appropriately, and regenerate QC reports.

positional arguments:
  ANALYSIS_DIR       auto_process analysis directory (optional: defaults to
                     the current directory)
  PROJECT_DIR        path to project directory to import

optional arguments:
  -h, --help         show this help message and exit
  --version          show program's version number and exit
  --debug            Turn on debugging output
  --comment COMMENT  specify comment text to be appended to the stored
                     comments associated with the project

config

usage: auto_process.py config [-h] [--version] [--debug]
                              [--init | --set KEY_VALUE | --add NEW_SECTION]
                              [--raw] [--show]

Query and change global configuration. Run without any options to display
the configuration settings.

optional arguments:
  -h, --help         show this help message and exit
  --version          show program's version number and exit
  --debug            Turn on debugging output

Creation and edit options:
  --init             Create a new default configuration file based on the
                     sample template.
  --set KEY_VALUE    Set the value of a parameter. KEY_VALUE should be of the
                     form '<param>=<value>' (<param> should be of the form
                     'SECTION[:SUBSECTION].NAME'). Multiple --set options can
                     be specified.
  --add NEW_SECTION  Add a new section called NEW_SECTION to the config (to
                     add e.g. a new platform, use 'platform:NAME'). Multiple
                     --add options can be specified.

Display options:
  --raw              Show the 'raw' configuration (i.e. only parameters and
                     values explicitly defined in the config before defaults
                     are loaded)

Deprecated/defunct options:
  --show             Show the values of parameters and settings (does nothing;
                     use 'config' with no options to display settings)
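
The KEY_VALUE argument to '--set' has the form '<param>=<value>', where
<param> is 'SECTION[:SUBSECTION].NAME'. A minimal sketch of how such a string
could be decomposed is given below. The function name parse_key_value and the
parameter name 'some_param' are hypothetical, and splitting on the last '.'
in <param> is an assumption rather than the documented behaviour.

```python
def parse_key_value(key_value):
    """Split 'SECTION[:SUBSECTION].NAME=VALUE' into its four parts.

    Hypothetical helper; subsection is None when not supplied.
    """
    key, sep, value = key_value.partition("=")
    if not sep:
        raise ValueError(f"missing '=' in {key_value!r}")
    section_part, dot, name = key.rpartition(".")
    if not dot or not section_part or not name:
        raise ValueError(f"expected SECTION[:SUBSECTION].NAME, got {key!r}")
    section, _, subsection = section_part.partition(":")
    return section, (subsection or None), name, value


# 'some_param' is a made-up parameter name used only for illustration
print(parse_key_value("platform:hiseq.some_param=1"))
```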

params

usage: auto_process.py params [-h] [--version] [--set KEY_VALUE] [--debug]
                              [ANALYSIS_DIR]

Query and change processing parameters and settings for ANALYSIS_DIR.

positional arguments:
  ANALYSIS_DIR     auto_process analysis directory (optional: defaults to the
                   current directory)

optional arguments:
  -h, --help       show this help message and exit
  --version        show program's version number and exit
  --set KEY_VALUE  Set the value of a parameter. KEY_VALUE should be of the
                   form '<param>=<value>'. Multiple --set options can be
                   specified.
  --debug          Turn on debugging output
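
Because '--set' can be given more than once, each occurrence contributes one
'<param>=<value>' pair. The sketch below shows how a repeatable option of
this kind can be collected with argparse and turned into a dictionary. The
parser here is a stand-in written for illustration (the names params_sketch
and parse_settings are hypothetical) and is not the parser used by
auto_process.py.

```python
import argparse

# Stand-in parser mirroring the shape of the 'params' usage line
parser = argparse.ArgumentParser(prog="params_sketch")
parser.add_argument("--set", dest="key_values", action="append",
                    metavar="KEY_VALUE",
                    help="set a parameter; may be given multiple times")
parser.add_argument("analysis_dir", nargs="?", default=".",
                    metavar="ANALYSIS_DIR")


def parse_settings(argv):
    """Return (analysis_dir, {param: value}) from a list of arguments."""
    args = parser.parse_args(argv)
    settings = {}
    for key_value in args.key_values or []:
        key, sep, value = key_value.partition("=")
        if not sep:
            parser.error(f"expected <param>=<value>, got {key_value!r}")
        settings[key] = value
    return args.analysis_dir, settings


print(parse_settings(["--set", "a=1", "--set", "b=2", "run_dir"]))
```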

metadata

usage: auto_process.py metadata [-h] [--version] [--set KEY_VALUE] [--update]
                                [--debug]
                                [ANALYSIS_DIR]

Query and change metadata associated with ANALYSIS_DIR.

positional arguments:
  ANALYSIS_DIR     auto_process analysis directory (optional: defaults to the
                   current directory)

optional arguments:
  -h, --help       show this help message and exit
  --version        show program's version number and exit
  --set KEY_VALUE  Set the value of a metadata item. KEY_VALUE should be of
                   the form '<param>=<value>'. Multiple --set options can be
                   specified.
  --update         Automatically update metadata items where possible (e.g.
                   for older analyses which have old or missing metadata
                   files)
  --debug          Turn on debugging output

readme

usage: auto_process.py readme [-h] [--version] [--init] [-V] [-e] [-m MESSAGE]
                              [--debug]
                              [ANALYSIS_DIR]

Add or amend a README file in the analysis directory ANALYSIS_DIR.

positional arguments:
  ANALYSIS_DIR          auto_process analysis directory (optional: defaults to
                        the current directory)

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --init                create a new README file
  -V, --view            display the contents of the README file
  -e, --edit            bring up README file in an editor to make changes
  -m MESSAGE, --message MESSAGE
                        append MESSAGE text to the README file
  --debug               Turn on debugging output

clone

usage: auto_process.py clone [-h] [--version] [--copy-fastqs]
                             [--exclude-projects] [--debug]
                             [ANALYSIS_DIR] CLONE_DIR

Make a copy of an existing analysis directory ANALYSIS_DIR in a new directory
CLONE_DIR.

positional arguments:
  ANALYSIS_DIR        existing auto_process analysis directory to clone
                      (optional: defaults to the current directory)
  CLONE_DIR           path to cloned directory

optional arguments:
  -h, --help          show this help message and exit
  --version           show program's version number and exit
  --copy-fastqs       Copy fastq.gz files from ANALYSIS_DIR into CLONE_DIR
                      (default is to make a link to the bcl-to-fastq
                      directory)
  --exclude-projects  Exclude (i.e. don't copy) project directories from
                      ANALYSIS_DIR
  --debug             Turn on debugging output