auto_process_ngs.commands.run_qc_cmd

auto_process_ngs.commands.run_qc_cmd.run_qc(ap, projects=None, protocols=None, fastq_screens=None, fastq_subset=100000, nthreads=None, runner=None, fastq_dir=None, qc_dir=None, organisms=None, cellranger_exe=None, cellranger_chemistry='auto', cellranger_force_cells=None, cellranger_transcriptomes=None, cellranger_premrna_references=None, cellranger_extra_project_dirs=None, report_html=None, run_multiqc=True, working_dir=None, verbose=None, max_jobs=None, max_cores=None, batch_limit=None, enable_conda=None, conda_env_dir=None, poll_interval=None)

Run QC pipeline script for projects

Run the illumina_qc.sh script to perform QC on projects.

Note that if all QC outputs already exist for a project then the QC will not be run for that project.

A subset of projects can be selected for QC by setting the ‘projects’ argument to a name or pattern; only matching projects will be examined.
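The selection behaviour described above can be illustrated with a minimal sketch. It assumes that patterns are shell-style globs, which this page does not state; the helper name select_projects and the project names are hypothetical and are not part of auto_process_ngs.

```python
from fnmatch import fnmatchcase


def select_projects(project_names, pattern=None):
    """Return the project names matching 'pattern'.

    Assumption: patterns are shell-style globs (e.g. 'AB_*').
    If 'pattern' is None then all projects are returned, mirroring
    the documented default of running QC for all projects.
    """
    if pattern is None:
        return list(project_names)
    # Case-sensitive glob match against each project name
    return [name for name in project_names if fnmatchcase(name, pattern)]


# Hypothetical project names, for illustration only
names = ["AB_rnaseq", "AB_chipseq", "CD_rnaseq"]
print(select_projects(names, "AB_*"))
print(select_projects(names))
```

An exact name also works as a pattern under this assumption, since a glob without wildcards matches only itself.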

Parameters:
  • projects (str) – specify a pattern to match one or more projects to run the QC for (default is to run QC for all projects)

  • protocols (dict) – mapping of project names to QC protocols; where a project name appears the specified protocol will be used, otherwise the QC protocol will be determined automatically from the project metadata (default is for protocols to be automatically determined for all projects)

  • fastq_screens (dict) – mapping of Fastq screen names to corresponding conf files, to use for contaminant screens

  • fastq_subset (int) – maximum size of subset of reads to use for FastQScreen, BAM file generation etc; set to zero or None to use all reads (default: 100000)

  • nthreads (int) – specify number of threads to run the QC jobs with (default: 1)

  • runner (JobRunner) – specify a non-default job runner to use for the QC jobs

  • fastq_dir (str) – specify the subdirectory to take the Fastq files from; will be used for all projects that are processed (default: ‘fastqs’)

  • qc_dir (str) – specify a non-standard directory to write the QC outputs to; will be used for all projects that are processed (default: ‘qc’)

  • organisms (dict) – mapping of project names to organism names; where a project name appears the specified organism will be used, overriding the organism defined in the project metadata (default is for all organisms to be taken from the projects)

  • cellranger_exe (str) – explicitly specify path to cellranger executable to use for 10xGenomics projects (default: determine appropriate executable automatically)

  • cellranger_chemistry (str) – assay configuration for 10xGenomics scRNA-seq data (set to ‘auto’ to let cellranger determine this automatically; default: ‘auto’)

  • cellranger_force_cells (int) – override cell detection algorithm and set number of cells in ‘cellranger’ and ‘cellranger-atac’ (set to ‘None’ to use built-in cell detection; default: ‘None’)

  • cellranger_transcriptomes (dict) – mapping of organism names to cellranger transcriptome reference data

  • cellranger_premrna_references (dict) – mapping of organism names to cellranger pre-mRNA reference data

  • cellranger_extra_project_dirs (str) – optional list of additional project dirs to use in single library analyses

  • report_html (str) – specify the name for the output HTML QC report (default: ‘<QC_DIR>_report.html’)

  • run_multiqc (bool) – if True then run MultiQC at the end of the QC run (default: True)

  • working_dir (str) – path to a working directory (defaults to temporary directory in the current directory)

  • verbose (bool) – if True then report additional information for pipeline diagnostics

  • max_jobs (int) – maximum number of jobs that will be scheduled to run at one time (passed to the scheduler; default: no limit)

  • max_cores (int) – maximum number of cores available to the scheduler (default: no limit)

  • batch_limit (int) – if set then run commands in each task in batches, with the batch size set dynamically so as not to exceed this limit

  • enable_conda (bool) – if True then use conda to resolve dependencies declared on tasks in the pipeline

  • conda_env_dir (str) – path to non-default directory for conda environments

  • poll_interval (float) – specifies non-default polling interval for scheduler used for running QC
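As a rough sketch of how the parameters above might be combined, the following collects a handful of keyword arguments from the documented signature and passes them to run_qc. The wrapper run_qc_for is a hypothetical helper, all values are placeholders, and the protocol name 'standardPE' is an assumption that should be checked against the protocols supported by the installed version. The first argument 'ap' is whatever object run_qc expects; its construction is not covered on this page.

```python
# Keyword arguments taken from the documented run_qc signature.
# All values are placeholders chosen for illustration.
qc_options = dict(
    projects="AB_*",                        # only examine matching projects
    protocols={"AB_rnaseq": "standardPE"},  # assumed protocol name
    fastq_subset=100000,                    # documented default
    nthreads=8,
    fastq_dir="fastqs",
    qc_dir="qc",
    run_multiqc=True,
    max_jobs=4,
)


def run_qc_for(ap, run_qc, options):
    """Call the supplied 'run_qc' function on 'ap' with 'options'.

    Hypothetical convenience wrapper: taking 'run_qc' as an argument
    keeps this sketch importable without auto_process_ngs installed.
    """
    return run_qc(ap, **options)


# With the package installed, usage might look like:
#
#   from auto_process_ngs.commands.run_qc_cmd import run_qc
#   status = run_qc_for(ap, run_qc, qc_options)
```

Parameters left out of the dictionary keep the defaults listed above.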

Returns:

UNIX-style integer returncode, where 0 = successful termination and non-zero indicates an error.

Return type:

Integer
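A minimal sketch of acting on the return code follows. The helper report_qc_status is hypothetical; it relies only on the documented convention that 0 means success and any other value indicates an error.

```python
def report_qc_status(status):
    """Translate a run_qc return code into a (success, message) pair.

    Follows the documented convention: 0 = successful termination,
    non-zero = error.
    """
    if status == 0:
        return True, "QC completed successfully"
    return False, "QC failed with return code %d" % status


# Example with a literal value standing in for the result of run_qc
ok, message = report_qc_status(0)
print(message)
```

In a command-line script the return code could instead be passed straight to sys.exit(), so that the calling shell sees the same status.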