auto_process_ngs.commands.run_qc_cmd
- auto_process_ngs.commands.run_qc_cmd.run_qc(ap, projects=None, fastq_screens=None, fastq_subset=100000, nthreads=None, runner=None, fastq_dir=None, qc_dir=None, cellranger_exe=None, cellranger_chemistry='auto', cellranger_force_cells=None, cellranger_transcriptomes=None, cellranger_premrna_references=None, cellranger_extra_project_dirs=None, report_html=None, run_multiqc=True, working_dir=None, verbose=None, max_jobs=None, max_cores=None, batch_limit=None, enable_conda=None, conda_env_dir=None, poll_interval=None)
Run QC pipeline script for projects
Run the illumina_qc.sh script to perform QC on projects.
Note that if all QC outputs already exist for a project then the QC will not be run for that project.
A subset of projects can be selected for QC by setting the ‘projects’ argument to a name or pattern; only matching projects will be examined.
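The selection behaviour of the ‘projects’ argument can be illustrated with a small sketch. It assumes shell-style (glob) wildcards, which the description above does not confirm, and `select_projects` is a hypothetical helper, not part of auto_process_ngs.

```python
from fnmatch import fnmatchcase


def select_projects(project_names, pattern=None):
    """Return the names matching 'pattern'; all names if pattern is None.

    Hypothetical helper: glob-style matching is an assumption here,
    not the documented behaviour of run_qc.
    """
    if pattern is None:
        # No pattern supplied: every project is a candidate for QC
        return list(project_names)
    return [name for name in project_names if fnmatchcase(name, pattern)]


# Illustrative project names only
names = ["AB_RNAseq", "AB_ChIP", "CD_scRNA"]
print(select_projects(names, "AB_*"))      # pattern selecting several projects
print(select_projects(names, "CD_scRNA"))  # exact name selecting one project
print(select_projects(names))              # no pattern: all projects
```

Under this assumption an exact name is simply a pattern without wildcards, so a single code path covers both the "name" and "pattern" cases.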
- Parameters:
projects (str) – specify a pattern to match one or more projects to run the QC for (default is to run QC for all projects)
fastq_screens (dict) – mapping of Fastq screen names to corresponding conf files, to use for contaminant screens
fastq_subset (int) – maximum size of subset of reads to use for FastQScreen, BAM file generation etc; set to zero or None to use all reads (default: 100000)
nthreads (int) – specify number of threads to run the QC jobs with (default: 1)
runner (JobRunner) – specify a non-default job runner to use for the QC jobs
fastq_dir (str) – specify the subdirectory to take the Fastq files from; will be used for all projects that are processed (default: ‘fastqs’)
qc_dir (str) – specify a non-standard directory to write the QC outputs to; will be used for all projects that are processed (default: ‘qc’)
cellranger_exe (str) – explicitly specify path to cellranger executable to use for 10xGenomics projects (default: determine appropriate executable automatically)
cellranger_chemistry (str) – assay configuration for 10xGenomics scRNA-seq data (set to ‘auto’ to let cellranger determine this automatically; default: ‘auto’)
cellranger_force_cells (int) – override cell detection algorithm and set number of cells in ‘cellranger’ and ‘cellranger-atac’ (set to ‘None’ to use built-in cell detection; default: ‘None’)
cellranger_transcriptomes (dict) – mapping of organism names to cellranger transcriptome reference data
cellranger_premrna_references (dict) – mapping of organism names to cellranger pre-mRNA reference data
cellranger_extra_project_dirs (str) – optional list of additional project dirs to use in single library analyses
report_html (str) – specify the name for the output HTML QC report (default: ‘<QC_DIR>_report.html’)
run_multiqc (bool) – if True then run MultiQC at the end of the QC run (default: True)
working_dir (str) – path to a working directory (defaults to temporary directory in the current directory)
verbose (bool) – if True then report additional information for pipeline diagnostics
max_jobs (int) – maximum number of jobs that will be scheduled to run at one time (passed to the scheduler; default: no limit)
max_cores (int) – maximum number of cores available to the scheduler (default: no limit)
batch_limit (int) – if set then run commands in each task in batches, with the batch size set dynamically so as not to exceed this limit
enable_conda (bool) – if True then use conda to resolve dependencies declared on tasks in the pipeline
conda_env_dir (str) – path to non-default directory for conda environments
poll_interval (float) – specifies non-default polling interval for scheduler used for running QC
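One way to read the ‘batch_limit’ description is that the limit caps the number of batches per task, with the batch size chosen to fit. The sketch below shows that interpretation only. `batch_commands` is a hypothetical function, and the actual scheme used by the pipeline may differ.

```python
import math


def batch_commands(commands, batch_limit):
    """Split 'commands' into at most 'batch_limit' batches.

    Hypothetical illustration: the batch size is derived from the
    number of commands so the batch count never exceeds the limit.
    """
    if not commands:
        return []
    if batch_limit < 1:
        raise ValueError("batch_limit must be a positive integer")
    # Smallest batch size that keeps the number of batches within the limit
    batch_size = math.ceil(len(commands) / batch_limit)
    return [commands[i:i + batch_size]
            for i in range(0, len(commands), batch_size)]


# Ten placeholder commands split under a limit of four batches
commands = ["cmd%d" % i for i in range(10)]
batches = batch_commands(commands, 4)
print([len(batch) for batch in batches])
```

Deriving the size from the command count keeps the batches roughly even, rather than filling fixed-size batches and leaving a small remainder.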
- Returns:
- UNIX-style integer returncode where 0 = successful termination, non-zero indicates an error.
- Return type:
Integer
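A minimal sketch of calling `run_qc` and passing its return code on to the shell is given below. The keyword names come from the signature above, but the values are illustrative only. `qc_options` and `run_and_exit` are hypothetical helpers, and `ap` is assumed to be an already-constructed object of the kind `run_qc` expects.

```python
import sys


def qc_options():
    """Collect keyword arguments for run_qc (illustrative values only)."""
    return dict(
        projects="AB_*",      # hypothetical pattern selecting projects
        fastq_subset=100000,  # documented default subset size
        nthreads=4,
        fastq_dir="fastqs",
        qc_dir="qc",
        run_multiqc=True,
        max_jobs=8,
    )


def run_and_exit(ap):
    """Run the QC and exit with its UNIX-style return code."""
    # Imported here so the sketch loads without the package installed
    from auto_process_ngs.commands.run_qc_cmd import run_qc
    status = run_qc(ap, **qc_options())
    if status != 0:
        sys.stderr.write("QC failed with return code %d\n" % status)
    sys.exit(status)
```

Exiting with the returned value lets calling scripts or schedulers detect a failed QC run from the process exit status.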