auto_process_ngs.cli.auto_process
Automated data processing & QC pipeline for Illumina sequence data
Implements a program for automating stages of a standard protocol for processing and QC’ing Illumina sequencing data.
The stages are:
setup make_fastqs setup_analysis_dirs run_qc publish_qc archive report
The ‘setup’ stage creates an analysis directory and acquires the basic data about the sequencing run from a source directory. Subsequent stages should be run in sequence to create fastq files, set up analysis directories for each project, and run QC scripts for each sample in each project.
The following commands enable the querying and setting of configuration settings and project metadata:
info config params metadata
Additional commands are available:
update clone samplesheet analyse_barcodes merge_fastq_dirs update_fastq_stats import_project readme
but these are not part of the standard workflow - they are used for special cases and testing.
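The ordering of the standard stages can be captured in a small lookup. This is a minimal sketch and not part of the package: `STAGES` copies the stage names listed above, while `next_stage` is a hypothetical helper introduced only for illustration.

```python
# Standard workflow stages, in the order given in the module description
STAGES = (
    "setup",
    "make_fastqs",
    "setup_analysis_dirs",
    "run_qc",
    "publish_qc",
    "archive",
    "report",
)


def next_stage(completed=None):
    """Return the stage that follows 'completed' (hypothetical helper).

    Returns the first stage if nothing has been completed yet, and
    None once the final stage has been reached.
    """
    if completed is None:
        return STAGES[0]
    index = STAGES.index(completed)  # raises ValueError for unknown names
    if index + 1 < len(STAGES):
        return STAGES[index + 1]
    return None
```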
- class auto_process_ngs.cli.auto_process.AutoProcess(analysis_dir=None, settings=None, allow_save_params=True)
Augmented AutoProcess class with commands attached
- analyse_barcodes(unaligned_dir=None, lanes=None, mismatches=None, cutoff=None, barcode_analysis_dir=None, sample_sheet=None, name=None, runner=None, force=False)
Analyse the barcode sequences for Fastqs for each specified lane
Run ‘analyse_barcodes.py’ for one or more lanes, to analyse the barcode index sequences in each lane.
- Parameters:
ap (AutoProcessor) – autoprocessor pointing to the analysis directory to create Fastqs for
unaligned_dir (str) – if set then use this as the output directory for bcl-to-fastq conversion. Default is ‘bcl2fastq’ (unless an alternative is already specified in the config file)
lanes (list) – (optional) specify a list of lane numbers to use in the processing; lanes not in the list will be excluded (default is to include all lanes)
mismatches (int) – (optional) maximum number of mismatches to consider when grouping similar barcodes; default is to determine it automatically
cutoff (float) – (optional) exclude barcodes with a smaller fraction of associated reads than specified cutoff from reporting (e.g. ‘0.001’ excludes barcodes with < 0.1% of reads); default is to include all barcodes
barcode_analysis_dir (str) – (optional) explicitly specify the subdirectory to use for barcode analysis. Counts will be written to and read from the ‘counts’ subdirectory of this directory (defaults to ‘barcode_analysis’)
sample_sheet (str) – if set then use this as the input samplesheet to check barcode sequences against (by default will use the sample sheet defined in the parameter file for the run)
name (str) – (optional) identifier for output directory (if ‘barcode_analysis_dir’ not explicitly set) and report title
runner (JobRunner) – (optional) specify a non-default job runner to use for barcode analysis
force (bool) – if True then forces regeneration of any existing counts (default is to reuse existing counts)
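The effect of the ‘cutoff’ argument described above can be sketched as follows. This is an illustration only, not the code used by ‘analyse_barcodes.py’: the function name `filter_barcodes` and the dictionary input format are assumptions made for the example.

```python
def filter_barcodes(counts, cutoff=None):
    """Drop barcodes whose share of reads is below 'cutoff'.

    'counts' maps barcode sequence -> number of reads (an input
    format assumed for this sketch). With cutoff=None every barcode
    is kept, matching the documented default.
    """
    if cutoff is None:
        return dict(counts)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        barcode: nreads
        for barcode, nreads in counts.items()
        if nreads / total >= cutoff
    }
```

For example, with 10000 reads in total and a cutoff of 0.001, a barcode with 9 reads (0.09%) would be excluded from reporting.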
- archive(archive_dir=None, platform=None, year=None, perms=None, group=None, include_bcl2fastq=False, read_only_fastqs=True, runner=None, final=False, logging_file=None, force=False, dry_run=False)
Copy an analysis directory and contents to an archive area
Copies the contents of the analysis directory to an archive area, which can be on a local or remote system.
The archive directory is constructed in the form
<TOP_DIR>/<YEAR>/<PLATFORM>/<DIR>/…
The YEAR and PLATFORM can be overridden using the appropriate arguments.
By default the data is copied to a ‘staging’ directory called ‘__ANALYSIS_DIR.pending’ in the archive directory. The archiving can be finalised by setting the ‘final’ argument to ‘True’, which performs a last update of the staging area before moving the data to its final location.
Once the archive has been finalised any further archiving attempts will be refused.
Copying of the data is performed using ‘rsync’; multiple archive operations mirror the contents of the analysis directory (so any data removed from the source will also be removed from the archive).
By default the ‘bcl2fastq’ directory is omitted from the archive, unless the fastq files in any projects are links to the data. Inclusion of this directory can be forced by setting the appropriate argument.
The fastqs will be switched to be read-only in the archive by default.
- Parameters:
ap (AutoProcessor) – autoprocessor pointing to the analysis directory to be archived
archive_dir (str) – top level archive directory, of the form ‘[[user@]host:]dir’ (if not set then use the value from the auto_process.ini file).
platform (str) – set the value of the <PLATFORM> level in the archive (if not set then taken from the supplied autoprocessor instance).
year (str) – set the value of the <YEAR> level in the archive (if not set then defaults to the current year) (4 digits)
perms (str) – change the permissions of the destination files and directories according to the supplied argument (e.g. ‘g+w’) (if not set then use the value from the auto_process.ini file).
group (str) – set the group of the destination files to the supplied argument (if not set then use the value from the auto_process.ini file).
include_bcl2fastq (bool) – if True then force inclusion of the ‘bcl2fastq’ subdirectory; otherwise only include it if fastq files in project subdirectories are symlinks.
read_only_fastqs (bool) – if True then make the fastqs read-only in the destination directory; otherwise keep the original permissions.
runner – (optional) specify a non-default job runner to use for primary data rsync
final (bool) – if True then finalize the archive by moving the ‘.pending’ temporary archive to the final location
logging_file (str) – specify the path to a “logging file” to add details of the run to using the ‘log_seq_data.sh’ when data is moved to the final location
force (bool) – if True then do archiving even if there are errors (e.g. key metadata items not set, permission error when setting group etc); otherwise abort archiving operation.
dry_run (bool) – report what would be done but don’t perform any operations.
- Returns:
0 = successful termination, non-zero indicates an error occurred.
- Return type:
UNIX-style integer returncode
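The archive layout and the ‘[[user@]host:]dir’ form described above can be sketched as below. This is a hedged illustration, not the package's implementation: `split_location` and `archive_paths` are hypothetical names, and the sketch assumes that the ‘.pending’ staging directory sits alongside the final directory under `<TOP_DIR>/<YEAR>/<PLATFORM>`.

```python
import posixpath
import re

# Matches '[[user@]host:]dir'; the user and host parts are optional
_LOCATION = re.compile(
    r"^(?:(?:(?P<user>[^@:/]+)@)?(?P<host>[^@:/]+):)?(?P<path>.*)$"
)


def split_location(spec):
    """Split '[[user@]host:]dir' into a (user, host, path) tuple."""
    match = _LOCATION.match(spec)
    return match.group("user"), match.group("host"), match.group("path")


def archive_paths(top_dir, year, platform, analysis_dir_name):
    """Return (staging, final) paths under <TOP_DIR>/<YEAR>/<PLATFORM>.

    Assumption: the staging directory is named
    '__<ANALYSIS_DIR>.pending' and lives next to the final directory.
    """
    parent = posixpath.join(top_dir, str(year), platform)
    staging = posixpath.join(parent, "__%s.pending" % analysis_dir_name)
    final = posixpath.join(parent, analysis_dir_name)
    return staging, final
```

The run name and host used when trying this out are arbitrary; only the shape of the resulting paths is meant to reflect the description.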
- clone(clone_dir, copy_fastqs=False, exclude_projects=False)
Make a ‘clone’ (i.e. copy) of an analysis directory
Makes a functional copy of an existing analysis directory, including metadata and parameters, stats files, processing reports and project subdirectories.
By default the ‘unaligned’ directory in the new directory is simply a symlink to the one in the original directory; set ‘copy_fastqs’ to True to make copies instead.
- Parameters:
ap (AutoProcessor) – autoprocessor pointing to the parent analysis directory
clone_dir (str) – path to the new directory to create as a clone (must not already exist).
copy_fastqs (bool) – set to True to copy the Fastq files (otherwise default behaviour is to make symlinks)
exclude_projects (bool) – set to True to exclude any projects from the parent analysis directory
- import_project(project_dir, comment=None, runner=None)
Import a project directory into an analysis directory
Importing a project directory consists of the following actions:
Copying the project directory and contents to the analysis directory
Updating ‘projects.info’ to add in the data about the imported project
Updating the project metadata and the QC report
Optionally the comments associated with the project can also be extended.
- Parameters:
ap (AutoProcessor) – autoprocessor pointing to the parent analysis directory
project_dir (str) – path to project directory to be imported
comment (str) – optional comment to append to the comments stored with the project after import
runner (JobRunner) – explicitly specify the job runner to send jobs to (overrides default runner set in the configuration)
- make_fastqs(protocol='standard', platform=None, unaligned_dir=None, sample_sheet=None, name=None, lanes=None, lane_subsets=None, nprocessors=None, bcl_converter=None, bases_mask=None, r1_length=None, r2_length=None, r3_length=None, no_lane_splitting=None, minimum_trimmed_read_length=None, mask_short_adapter_reads=None, trim_adapters=True, adapter_sequence=None, adapter_sequence_read2=None, create_fastq_for_index_read=None, find_adapters_with_sliding_window=None, generate_stats=True, stats_file=None, per_lane_stats_file=None, analyse_barcodes=True, barcode_analysis_dir=None, force_copy_of_primary_data=False, create_empty_fastqs=False, ignore_missing_bcls=False, runner=None, cellranger_jobmode=None, cellranger_mempercore=None, cellranger_maxjobs=None, cellranger_jobinterval=None, cellranger_localcores=None, cellranger_localmem=None, cellranger_ignore_dual_index=False, spaceranger_rc_i2_override=None, max_jobs=None, max_cores=None, batch_limit=None, enable_conda=None, conda_env_dir=None, use_conda_for_bcl2fastq=None, verbose=False, working_dir=None)
Create and summarise FASTQ files
Wrapper for operations related to FASTQ file generation and analysis. The operations are typically:
get primary data (BCL files)
run bcl-to-fastq conversion
generate statistics
analyse barcodes
If the number of processors and the job runner are not explicitly specified then these are taken from the settings for the bcl2fastq and the statistics generation steps, which may differ from each other. However if either of these values are set explicitly then the same values will be used for both steps.
- Parameters:
ap (AutoProcessor) – autoprocessor pointing to the analysis directory to create Fastqs for
protocol (str) – if set then specifies the protocol to use for fastq generation, otherwise use the ‘standard’ bcl2fastq protocol
platform (str) – if set then specifies the sequencing platform (otherwise platform will be determined from the primary data)
unaligned_dir (str) – if set then use this as the output directory for bcl-to-fastq conversion. Default is ‘bcl2fastq’ (unless an alternative is already specified in the config file)
sample_sheet (str) – if set then use this as the input samplesheet
name (str) – (optional) identifier for outputs that are not set explicitly
lanes (list) – (optional) specify a list of lane numbers to use in the processing; lanes not in the list will be excluded (default is to include all lanes)
lane_subsets (list) – (optional) specify a list of lane subsets to process separately before merging at the end; each subset is a dictionary which should be generated using the ‘subset’ function, and can include custom values for processing parameters (e.g. protocol, trimming and masking options etc) to override the defaults for this lane. Lanes not in a subset will still be processed unless excluded via the ‘lanes’ keyword
nprocessors (int) – number of processors to use
generate_stats (bool) – if True then (re)generate statistics file for fastqs
analyse_barcodes (bool) – if True then (re)analyse barcodes for fastqs
bcl_converter (str) – default BCL-to-Fastq conversion software to use; optionally can include a version specification (e.g. “bcl2fastq>2.0” or “bcl-convert=3.7.5”). Defaults to “bcl2fastq”
bases_mask (str) – if set then use this as an alternative bases mask setting
r1_length (int) – explicitly specify length to truncate R1 reads to (ignored if bases mask is set)
r2_length (int) – explicitly specify length to truncate R2 reads to (ignored if bases mask is set, or if there is no R2 read)
r3_length (int) – explicitly specify length to truncate R3 reads to (ignored if bases mask is set, or if there is no R3 read)
no_lane_splitting (bool) – if True then run bcl2fastq with –no-lane-splitting
minimum_trimmed_read_length (int) – if set then specify minimum length for reads after adapter trimming (shorter reads will be padded with Ns to make them long enough)
mask_short_adapter_reads (int) – if set then specify the minimum length of ACGT bases that must be present in a read after adapter trimming for it not to be masked completely with Ns.
trim_adapters (boolean) – if True (the default) then pass adapter sequence(s) to bcl2fastq to perform adapter trimming; otherwise remove adapter sequences
adapter_sequence (str) – if not None then specifies adapter sequence to use instead of any sequences already set in the samplesheet (nb will be ignored if ‘trim_adapters’ is False)
adapter_sequence_read2 (str) – if not None then specifies adapter sequence to use for read2 instead of any sequences already set in the samplesheet (nb will be ignored if ‘trim_adapters’ is False)
create_fastq_for_index_read (bool) – if True then also create Fastq files for index reads (default is not to create index read Fastqs)
ignore_missing_bcls (bool) – if True then tell BCL conversion software to ignore missing or corrupted BCLs (default: False, don’t ignore missing or corrupted BCL files)
find_adapters_with_sliding_window (boolean) – if True then use sliding window algorithm to identify adapter sequences for trimming
stats_file (str) – if set then use this as the name of the output per-fastq stats file.
per_lane_stats_file (str) – if set then use this as the name of the output per-lane stats file.
barcode_analysis_dir (str) – if set then specifies path to the output directory for barcode analysis
force_copy_of_primary_data (bool) – if True then force primary data to be copied (rsync’ed) even if it’s on the local system (default is to link to primary data unless it’s on a remote filesystem).
create_empty_fastqs (bool) – if True then create empty ‘placeholder’ fastq files for any missing fastqs after bcl2fastq (must have completed with zero exit status)
runner (JobRunner) – (optional) specify a non-default job runner to use for fastq generation
cellranger_jobmode (str) – (optional) job mode to run cellranger in (10xGenomics Chromium SC data only)
cellranger_mempercore (int) – (optional) memory assumed per core (in Gbs) (10xGenomics Chromium SC data only)
cellranger_maxjobs (int) – (optional) maximum number of concurrent jobs to run (10xGenomics Chromium SC data only)
cellranger_jobinterval (int) – (optional) how often jobs are submitted (in ms) (10xGenomics Chromium SC data only)
cellranger_localcores (int) – (optional) maximum number of cores cellranger can request in jobmode ‘local’ (10xGenomics Chromium SC data only)
cellranger_localmem (int) – (optional) maximum memory cellranger can request in jobmode ‘local’ (10xGenomics Chromium SC data only)
cellranger_ignore_dual_index (bool) – (optional) on a dual-indexed flowcell where the second index was not used for the 10x sample, ignore it (10xGenomics Chromium SC data only)
spaceranger_rc_i2_override (bool) – (optional) if set then value is passed to Spaceranger’s ‘–rc-i2-override’ option (True for reverse complement workflow B, False for forward complement workflow A). If not set then Spaceranger will be left to determine the workflow automatically
max_jobs (int) – maximum number of concurrent jobs allowed
max_cores (int) – maximum number of cores available
batch_limit (int) – if set then run commands in each task in batches, with the batch size set dynamically so as not to exceed this limit
working_dir (str) – path to a working directory (defaults to temporary directory in the current directory)
enable_conda (bool) – if True then use conda to resolve dependencies declared on tasks in the pipeline
conda_env_dir (str) – path to non-default directory for conda environments
use_conda_for_bcl2fastq (bool) – if True then use conda packages for ‘bcl2fastq’ dependency resolution (NB ignored unless ‘enable_conda’ is also True)
verbose (bool) – if True then report additional information for pipeline diagnostics
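The ‘bcl_converter’ argument above accepts a name with an optional version specification. A minimal sketch of how such a string might be split is shown below. `parse_converter_spec` is a hypothetical helper, not a function of this module, and support for operators other than the ‘>’ and ‘=’ shown in the examples is an assumption.

```python
import re

# name, then optionally an operator and a version string
_SPEC = re.compile(
    r"^(?P<name>[^<>=\s]+)\s*"
    r"(?:(?P<op>>=|<=|==|=|>|<)\s*(?P<version>\S+))?$"
)


def parse_converter_spec(spec="bcl2fastq"):
    """Split e.g. 'bcl-convert=3.7.5' into (name, operator, version).

    The operator and version are None when only a name is given.
    """
    match = _SPEC.match(spec.strip())
    if match is None:
        raise ValueError("unrecognised converter spec: %r" % spec)
    return match.group("name"), match.group("op"), match.group("version")
```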
- merge_fastq_dirs(primary_unaligned_dir, output_dir=None, dry_run=False)
Combine multiple ‘unaligned’ output directories into one
This method combines the output from multiple runs of CASAVA/bcl2fastq into a single ‘unaligned’-equivalent directory.
Currently it operates in an automatic mode and should detect additional ‘unaligned’ dirs on its own.
- Parameters:
ap (AutoProcessor) – autoprocessor pointing to the parent analysis directory
primary_unaligned_dir (str) – the ‘unaligned’ dir that data from all others will be put into (relative path), unless overridden by ‘output_dir’ argument
output_dir (str) – optional, new ‘unaligned’ dir that will be created to hold merged data (relative path, defaults to ‘primary_unaligned_dir’)
dry_run (boolean) – if True then just report operations that would have been performed.
- publish_qc(projects=None, location=None, ignore_missing_qc=False, regenerate_reports=False, force=False, use_hierarchy=False, exclude_zip_files=False, legacy=False, runner=None, base_url=None, suppress_warnings=False)
Copy the QC reports to the webserver
Looks for and copies various QC reports and outputs to a ‘QC server’ directory, and generates an HTML index.
The reports include:
processing QC reports
‘cellranger mkfastq’ QC report
barcode analysis report
Also if the analysis includes project directories then for each Fastq set in each project:
QC report for standard QC
Also if a project comprises 10xGenomics data:
‘cellranger count’ reports for each sample
In ‘legacy’ mode, the top-level report will also contain explicit links for each project for the following (where appropriate):
cellranger count outputs
MultiQC report
(These reports should now be accessible from the per-project QC reports, regardless of whether ‘legacy’ mode is specified.)
Raises an exception if:
‘source’ and ‘run_number’ metadata items are not set
a subset of projects don’t have associated QC outputs (unless ‘ignore_missing_qc’ is True)
- Parameters:
ap (AutoProcessor) – autoprocessor pointing to the analysis directory to publish QC for
projects (str) – specify a glob-style pattern to match one or more projects to publish the reports for (default is to publish all reports)
location (str) – override the target location specified in the settings; can be of the form ‘[[user@]server:]directory’
ignore_missing_qc (bool) – if True then skip directories with missing QC data or reports (default is to raise an exception if projects have missing QC)
regenerate_reports (bool) – if True then try to create reports even when they already exist (default is to use existing reports)
force (bool) – if True then force QC report (re)generation even if QC is unverified (default is to raise an exception if projects cannot be verified)
use_hierarchy (bool) – if True then publish to a YEAR/PLATFORM subdirectory under the target location (default is not to use the hierarchy)
exclude_zip_files (bool) – if True then exclude any ZIP archives from publication (default is to include ZIP files)
legacy (bool) – if True then operate in ‘legacy’ mode (i.e. explicitly include MultiQC reports for each project)
runner (JobRunner) – explicitly specify the job runner to send jobs to (overrides runner set in the configuration)
base_url (str) – base URL for the QC server
suppress_warnings (bool) – if True then don’t report warnings in QC reports or the index page (even if there are missing metrics in individual reports)
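Selecting projects with a glob-style pattern, as the ‘projects’ argument above describes, might look like the following. This is a sketch using the standard library's `fnmatch` module; `select_projects` is a hypothetical helper and the project names are placeholders.

```python
from fnmatch import fnmatchcase


def select_projects(project_names, pattern=None):
    """Return the project names matching a glob-style pattern.

    With pattern=None all projects are returned, mirroring the
    documented default of publishing reports for every project.
    """
    if pattern is None:
        return list(project_names)
    return [name for name in project_names if fnmatchcase(name, pattern)]
```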
- report(mode=None, fields=None, out_file=None)
Print a report on an analysis project
- Parameters:
ap (AutoProcessor) – autoprocessor pointing to the analysis directory to be reported on
mode (int) – reporting mode (concise, summary, projects or info)
fields (list) – optional set of fields to report (only for ‘projects’ reporting mode)
out_file (str) – optional, path to a file to write the report to (default is to write to stdout)
- run_qc(projects=None, protocols=None, fastq_screens=None, fastq_subset=100000, nthreads=None, runner=None, fastq_dir=None, qc_dir=None, organisms=None, cellranger_exe=None, cellranger_chemistry='auto', cellranger_force_cells=None, cellranger_transcriptomes=None, cellranger_premrna_references=None, cellranger_extra_project_dirs=None, report_html=None, run_multiqc=True, working_dir=None, verbose=None, max_jobs=None, max_cores=None, batch_limit=None, enable_conda=None, conda_env_dir=None, poll_interval=None)
Run QC pipeline script for projects
Run the illumina_qc.sh script to perform QC on projects.
Note that if all QC outputs already exist for a project then the QC will not be run for that project.
A subset of projects can be selected for QC by setting the ‘projects’ argument to a name or pattern; only matching projects will be examined.
- Parameters:
projects (str) – specify a pattern to match one or more projects to run the QC for (default is to run QC for all projects)
protocols (dict) – mapping of project names to QC protocols; where a project name appears the specified protocol will be used, otherwise the QC protocol will be determined automatically from the project metadata (default is for protocols to be automatically determined for all projects)
fastq_screens (dict) – mapping of Fastq screen names to corresponding conf files, to use for contaminant screens
fastq_subset (int) – maximum size of subset of reads to use for FastQScreen, BAM file generation etc; set to zero or None to use all reads (default: 100000)
nthreads (int) – specify number of threads to run the QC jobs with (default: 1)
runner (JobRunner) – specify a non-default job runner to use for the QC jobs
fastq_dir (str) – specify the subdirectory to take the Fastq files from; will be used for all projects that are processed (default: ‘fastqs’)
qc_dir (str) – specify a non-standard directory to write the QC outputs to; will be used for all projects that are processed (default: ‘qc’)
organisms (dict) – mapping of project names to organisms; where a project name appears the specified organism will be used, overriding the organism defined in the project metadata (default is for all organisms to be taken from the projects)
cellranger_exe (str) – explicitly specify path to cellranger executable to use for 10xGenomics projects (default: determine appropriate executable automatically)
cellranger_chemistry (str) – assay configuration for 10xGenomics scRNA-seq data (set to ‘auto’ to let cellranger determine this automatically; default: ‘auto’)
cellranger_force_cells (int) – override cell detection algorithm and set number of cells in ‘cellranger’ and ‘cellranger-atac’ (set to ‘None’ to use built-in cell detection; default: ‘None’)
cellranger_transcriptomes (dict) – mapping of organism names to cellranger transcriptome reference data
cellranger_premrna_references (dict) – mapping of organism names to cellranger pre-mRNA reference data
cellranger_extra_project_dirs (str) – optional list of additional project dirs to use in single library analyses
report_html (str) – specify the name for the output HTML QC report (default: ‘<QC_DIR>_report.html’)
run_multiqc (bool) – if True then run MultiQC at the end of the QC run (default)
working_dir (str) – path to a working directory (defaults to temporary directory in the current directory)
verbose (bool) – if True then report additional information for pipeline diagnostics
max_jobs (int) – maximum number of jobs that will be scheduled to run at one time (passed to the scheduler; default: no limit)
max_cores (int) – maximum number of cores available to the scheduler (default: no limit)
batch_limit (int) – if set then run commands in each task in batches, with the batch size set dynamically so as not to exceed this limit
enable_conda (bool) – if True then use conda to resolve dependencies declared on tasks in the pipeline
conda_env_dir (str) – path to non-default directory for conda environments
poll_interval (float) – specifies non-default polling interval for scheduler used for running QC
- Returns:
UNIX-style integer returncode where 0 = successful termination, non-zero indicates an error.
- Return type:
Integer
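The override behaviour of the ‘protocols’ mapping described above (use the mapped protocol where a project is named, otherwise fall back to automatic detection) can be sketched like this. `resolve_protocols` and the `detect` callback are hypothetical, and the protocol names used when calling it are placeholders rather than names defined by the package.

```python
def resolve_protocols(project_names, protocols=None, detect=None):
    """Return a project -> QC protocol mapping.

    Explicit entries in 'protocols' win; any other project gets the
    value returned by 'detect', a stand-in for the automatic
    determination from project metadata.
    """
    protocols = protocols or {}
    if detect is None:
        # Placeholder for metadata-based detection (assumption)
        detect = lambda name: "auto"
    resolved = {}
    for name in project_names:
        if name in protocols:
            resolved[name] = protocols[name]
        else:
            resolved[name] = detect(name)
    return resolved
```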
- samplesheet(cmd, *args, **kws)
Various sample sheet manipulations
- Parameters:
ap (AutoProcessor) – autoprocessor pointing to the analysis directory to operate on
cmd (int) – sample sheet operation to perform
args (list) – positional arguments specific to the command
kws (mapping) – keyword arguments specific to the command
- setup(data_dir, analysis_dir=None, sample_sheet=None, run_number=None, analysis_number=None, extra_files=None, unaligned_dir=None)
Set up the initial analysis directory
This does all the initialisation of the analysis directory and processing parameters
- Parameters:
ap (AutoProcess) – autoprocessor pointing to the analysis directory to create Fastqs for
data_dir (str) – source data directory
analysis_dir (str) – corresponding analysis directory
sample_sheet (str) – name and location of non-default sample sheet file; can be a local or remote file, or a URL (optional, will use sample sheet from the source data directory if present)
run_number (str) – facility run number
analysis_number (str) – optional number assigned to the analysis to distinguish it from other processing or analysis attempts. If supplied then will be appended to the analysis directory name (unless a name is explicitly supplied via ‘analysis_dir’)
extra_files (list) – arbitrary additional files to copy into the new analysis directory; each file can be a local or remote file or a URL
unaligned_dir (str) – directory with existing Fastqs output from CASAVA or bcl2fastq2; if specified then Fastqs will be taken from this directory (optional)
- setup_analysis_dirs(name=None, unaligned_dir=None, project_metadata_file=None, ignore_missing_metadata=False, short_fastq_names=False, link_to_fastqs=False, projects=None, undetermined_project=None, custom_metadata_items=None)
Construct and populate project analysis directories
- Parameters:
ap (AutoProcess) – AutoProcess instance
unaligned_dir (str) – optional, name of ‘unaligned’ subdirectory (defaults to value stored in parameters)
name (str) – (optional) identifier to append to output project directories
project_metadata_file (str) – optional, name of the ‘projects.info’ metadata file to take project information from
ignore_missing_metadata (bool) – if True then make project directories for all projects even if metadata hasn’t been set (default is to stop if metadata isn’t set)
short_fastq_names (bool) – if True then use ‘short’ Fastq names (default is to use full Fastq names as output from bcl2fastq)
link_to_fastqs (bool) – if True then make symbolic links to the Fastq files in the source ‘unaligned’ subdir (default is to make hard links)
projects (list) – optional, subset of projects to create analysis dirs for (default is to attempt to create directories for all projects in the metadata file).
undetermined_project (str) – optional, specify name for project directory to create with ‘undetermined’ Fastqs (defaults to ‘undetermined’)
custom_metadata_items (list) – optional, list of strings defining additional custom metadata items to add to the core metadata items for each project (overrides custom items specified in configuration file)
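The difference between the default hard links and the symbolic links selected by ‘link_to_fastqs’ can be illustrated with the standard library. This is a sketch only: `link_fastq` is a hypothetical helper and does not reproduce how the package populates project directories.

```python
import os


def link_fastq(src, dest_dir, link_to_fastqs=False):
    """Link one Fastq file into 'dest_dir' and return the new path.

    A hard link is made by default; a symbolic link to the absolute
    source path is made when link_to_fastqs is True.
    """
    dest = os.path.join(dest_dir, os.path.basename(src))
    if link_to_fastqs:
        os.symlink(os.path.abspath(src), dest)
    else:
        # Hard links require source and destination to share a filesystem
        os.link(src, dest)
    return dest
```

A hard link keeps working if the source ‘unaligned’ directory is later removed, whereas a symbolic link does not, which is one reason the choice matters when archiving.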
- update(update_paths=True, update_project_metadata=True, update_sync_projects=True, update_qc_reports=True)
Update metadata and artefacts in analysis directory
- Parameters:
ap (AutoProcessor) – autoprocessor pointing to the analysis directory to update
update_paths (bool) – whether to update analysis directory paths in metadata and parameter files (default: True)
update_project_metadata (bool) – whether to update metadata stored in ‘projects.info’ and in the project directories (default: True)
update_sync_projects (bool) – whether to update projects listed in ‘projects.info’ against project directories on the filesystem (default: True)
update_qc_reports (bool) – whether to update QC reports in projects where existing report is older than the project metadata file (default: True)
- update_fastq_stats(sample_sheet=None, name=None, stats_file=None, per_lane_stats_file=None, unaligned_dir=None, add_data=False, force=False, nprocessors=None, runner=None)
Update statistics for Fastq files
Updates the statistics for all Fastq files found in the ‘unaligned’ directory, by running the ‘fastq_statistics.py’ program.
- Parameters:
ap (AutoProcessor) – autoprocessor pointing to the analysis directory to create Fastqs for
sample_sheet (str) – path to sample sheet file used in bcl-to-fastq conversion (defaults to the sample sheet stored in the analysis directory parameters)
name (str) – identifier to use for output stats files
stats_file (str) – path of a non-default file to write the statistics to (defaults to ‘statistics.info’ unless over-ridden by local settings)
per_lane_stats_file (str) – path for per-lane statistics output file (defaults to ‘per_lane_statistics.info’ unless over-ridden by local settings)
unaligned_dir (str) – output directory for bcl-to-fastq conversion
add_data (bool) – if True then add stats to the existing stats files (default is to overwrite existing stats files)
force (bool) – if True then force update of the stats files even if they are newer than the Fastq files (by default stats are only updated if they are older than the Fastqs)
nprocessors (int) – number of cores to use when running ‘fastq_statistics.py’
runner (JobRunner) – (optional) specify a non-default job runner to use for running ‘fastq_statistics.py’
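The rule that statistics are only regenerated when they are older than the Fastqs, unless ‘force’ is set, can be sketched as a modification-time comparison. `stats_need_update` is a hypothetical helper; treating a missing stats file as needing an update is an assumption of this sketch rather than documented behaviour.

```python
import os


def stats_need_update(stats_file, fastqs, force=False):
    """Return True if the stats file should be (re)generated.

    The stats are considered stale when any Fastq file has a newer
    modification time than the stats file.
    """
    if force or not os.path.exists(stats_file):
        return True
    stats_mtime = os.path.getmtime(stats_file)
    return any(os.path.getmtime(fq) > stats_mtime for fq in fastqs)
```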
- auto_process_ngs.cli.auto_process.add_analyse_barcodes_command(cmdparser)
Create a parser for the ‘analyse_barcodes’ command
- auto_process_ngs.cli.auto_process.add_archive_command(cmdparser)
Create a parser for the ‘archive’ command
- auto_process_ngs.cli.auto_process.add_clone_command(cmdparser)
Create a parser for the ‘clone’ command
- auto_process_ngs.cli.auto_process.add_config_command(cmdparser)
Create a parser for the ‘config’ command
- auto_process_ngs.cli.auto_process.add_import_project_command(cmdparser)
Create a parser for the ‘import_project’ command
- auto_process_ngs.cli.auto_process.add_info_command(cmdparser)
Create a parser for the ‘info’ command
- auto_process_ngs.cli.auto_process.add_make_fastqs_command(cmdparser)
Create a parser for the ‘make_fastqs’ command
- auto_process_ngs.cli.auto_process.add_merge_fastq_dirs_command(cmdparser)
Create a parser for the ‘merge_fastq_dirs’ command
- auto_process_ngs.cli.auto_process.add_metadata_command(cmdparser)
Create a parser for the ‘metadata’ command
- auto_process_ngs.cli.auto_process.add_params_command(cmdparser)
Create a parser for the ‘params’ command
- auto_process_ngs.cli.auto_process.add_publish_qc_command(cmdparser)
Create a parser for the ‘publish_qc’ command
- auto_process_ngs.cli.auto_process.add_readme_command(cmdparser)
Create a parser for the ‘readme’ command
- auto_process_ngs.cli.auto_process.add_report_command(cmdparser)
Create a parser for the ‘report’ command
- auto_process_ngs.cli.auto_process.add_run_qc_command(cmdparser)
Create a parser for the ‘run_qc’ command
- auto_process_ngs.cli.auto_process.add_samplesheet_command(cmdparser)
Create a parser for the ‘samplesheet’ command
- auto_process_ngs.cli.auto_process.add_setup_analysis_dirs_command(cmdparser)
Create a parser for the ‘setup_analysis_dirs’ command
- auto_process_ngs.cli.auto_process.add_setup_command(cmdparser)
Create a parser for the ‘setup’ command
- auto_process_ngs.cli.auto_process.add_update_command(cmdparser)
Create a parser for the ‘update’ command
- auto_process_ngs.cli.auto_process.add_update_fastq_stats_command(cmdparser)
Create a parser for the ‘update_fastq_stats’ command
- auto_process_ngs.cli.auto_process.analyse_barcodes(args)
Implement functionality for ‘analyse_barcodes’ command
- auto_process_ngs.cli.auto_process.archive(args)
Implement functionality for ‘archive’ command
- auto_process_ngs.cli.auto_process.clone(args)
Implement functionality for ‘clone’ command
- auto_process_ngs.cli.auto_process.config(args)
Implement functionality for ‘config’ command
- auto_process_ngs.cli.auto_process.import_project(args)
Implement functionality for ‘import_project’ command
- auto_process_ngs.cli.auto_process.info(args)
Implement functionality for the ‘info’ command
- auto_process_ngs.cli.auto_process.main(argv=None)
Run ‘auto_process.py COMMAND…’
- Parameters:
argv (list) – optional, command line arguments to process (otherwise take arguments from ‘sys.argv’)
- Returns:
0 on success, 1 on failure.
- Return type:
Integer
- auto_process_ngs.cli.auto_process.make_fastqs(args)
Implement functionality for ‘make_fastqs’ command
- auto_process_ngs.cli.auto_process.merge_fastq_dirs(args)
Implement functionality for ‘merge_fastq_dirs’ command
- auto_process_ngs.cli.auto_process.metadata(args)
Implement functionality for ‘metadata’ command
- auto_process_ngs.cli.auto_process.params(args)
Implement functionality for ‘params’ command
- auto_process_ngs.cli.auto_process.publish_qc(args)
Implement functionality for ‘publish_qc’ command
- auto_process_ngs.cli.auto_process.readme(args)
Implement functionality for ‘readme’ command
- auto_process_ngs.cli.auto_process.report(args)
Implement functionality for ‘report’ command
- auto_process_ngs.cli.auto_process.run_qc(args)
Implement functionality for ‘run_qc’ command
- auto_process_ngs.cli.auto_process.samplesheet(args)
Implement functionality for ‘samplesheet’ command
- auto_process_ngs.cli.auto_process.set_debug(debug_flag)
Turn on debug output
- auto_process_ngs.cli.auto_process.setup(args)
Implement functionality for ‘setup’ command
- auto_process_ngs.cli.auto_process.setup_analysis_dirs(args)
Implement functionality for ‘setup_analysis_dirs’ command
- auto_process_ngs.cli.auto_process.update(args)
Implement functionality for ‘update’ command
- auto_process_ngs.cli.auto_process.update_fastq_stats(args)
Implement functionality for ‘update_fastq_stats’ command