auto_process_ngs.commands.run_qc_cmd

auto_process_ngs.commands.run_qc_cmd.run_qc(ap, projects=None, fastq_screens=None, fastq_subset=100000, nthreads=None, runner=None, fastq_dir=None, qc_dir=None, cellranger_exe=None, cellranger_chemistry='auto', cellranger_force_cells=None, cellranger_transcriptomes=None, cellranger_premrna_references=None, cellranger_extra_project_dirs=None, report_html=None, run_multiqc=True, working_dir=None, verbose=None, max_jobs=None, max_cores=None, batch_limit=None, enable_conda=None, conda_env_dir=None, poll_interval=None)

Run QC pipeline script for projects

Run the illumina_qc.sh script to perform QC on projects.

Note that if all QC outputs already exist for a project then the QC will not be run for that project.

A subset of projects can be selected for QC by setting the ‘projects’ argument to a name or pattern; only projects matching it will be examined.
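The selection behaviour described above can be sketched as follows. This is a minimal illustration only: it assumes glob-style (shell wildcard) matching, which this section does not confirm, and the helper select_projects and the project names are hypothetical.

```python
from fnmatch import fnmatchcase

def select_projects(project_names, pattern=None):
    """Return the project names matching 'pattern'.

    Hypothetical helper: assumes glob-style matching, which may differ
    from the matching rules actually used by run_qc.
    """
    if pattern is None:
        # No pattern supplied: every project is selected
        return list(project_names)
    return [name for name in project_names if fnmatchcase(name, pattern)]

# Made-up project names for illustration
names = ["AB_RNAseq", "CD_ChIPseq", "AB_scRNAseq"]
selected = select_projects(names, "AB_*")
```

Passing None mirrors the documented default of running QC for all projects.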

Parameters:
  • projects (str) – specify a pattern to match one or more projects to run the QC for (default is to run QC for all projects)

  • fastq_screens (dict) – mapping of Fastq screen names to corresponding conf files, to use for contaminant screens

  • fastq_subset (int) – maximum size of subset of reads to use for FastQScreen, BAM file generation etc; set to zero or None to use all reads (default: 100000)

  • nthreads (int) – specify number of threads to run the QC jobs with (default: 1)

  • runner (JobRunner) – specify a non-default job runner to use for the QC jobs

  • fastq_dir (str) – specify the subdirectory to take the Fastq files from; will be used for all projects that are processed (default: ‘fastqs’)

  • qc_dir (str) – specify a non-standard directory to write the QC outputs to; will be used for all projects that are processed (default: ‘qc’)

  • cellranger_exe (str) – explicitly specify path to cellranger executable to use for 10xGenomics projects (default: determine appropriate executable automatically)

  • cellranger_chemistry (str) – assay configuration for 10xGenomics scRNA-seq data (set to ‘auto’ to let cellranger determine this automatically; default: ‘auto’)

  • cellranger_force_cells (int) – override cell detection algorithm and set number of cells in ‘cellranger’ and ‘cellranger-atac’ (set to ‘None’ to use built-in cell detection; default: ‘None’)

  • cellranger_transcriptomes (dict) – mapping of organism names to cellranger transcriptome reference data

  • cellranger_premrna_references (dict) – mapping of organism names to cellranger pre-mRNA reference data

  • cellranger_extra_project_dirs (str) – optional list of additional project dirs to use in single library analyses

  • report_html (str) – specify the name for the output HTML QC report (default: ‘<QC_DIR>_report.html’)

  • run_multiqc (bool) – if True then run MultiQC at the end of the QC run (default)

  • working_dir (str) – path to a working directory (defaults to temporary directory in the current directory)

  • verbose (bool) – if True then report additional information for pipeline diagnostics

  • max_jobs (int) – maximum number of jobs that will be scheduled to run at one time (passed to the scheduler; default: no limit)

  • max_cores (int) – maximum number of cores available to the scheduler (default: no limit)

  • batch_limit (int) – if set then run commands in each task in batches, with the batch size set dynamically so as not to exceed this limit

  • enable_conda (bool) – if True then use conda to resolve dependencies declared on tasks in the pipeline

  • conda_env_dir (str) – path to non-default directory for conda environments

  • poll_interval (float) – specifies non-default polling interval for scheduler used for running QC
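As a rough sketch of how the parameters above might be passed, the example below collects a few of them as keyword arguments. The values are illustrative, and the wrapper run_qc_for and the stand-in fake_run_qc are hypothetical names used so that the example runs without the package installed. The ‘ap’ argument is not described in this section, so it is treated as an opaque object here.

```python
# Keyword arguments taken from the documented signature; the values
# are illustrative only
qc_options = dict(
    projects="AB_*",
    fastq_subset=100000,
    nthreads=4,
    fastq_dir="fastqs",
    qc_dir="qc",
    run_multiqc=True,
    max_jobs=8,
)

def run_qc_for(ap, run_qc_func, **options):
    """Call a run_qc-style function and return its integer return code.

    Hypothetical wrapper: 'run_qc_func' is expected to accept 'ap'
    followed by keyword arguments, as in the documented signature.
    """
    return run_qc_func(ap, **options)

def fake_run_qc(ap, **kwargs):
    """Stand-in for the real run_qc; always reports success."""
    return 0

# 'object()' is a placeholder for whatever 'ap' is in a real run
status = run_qc_for(object(), fake_run_qc, **qc_options)
```

With the package installed, the real function would be imported from auto_process_ngs.commands.run_qc_cmd and passed in place of fake_run_qc.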

Returns:

UNIX-style integer returncode where 0 = successful termination, non-zero indicates an error.

Return type:

Integer
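A caller would typically inspect the return code and propagate it as the process exit status. The sketch below shows one way to do that; describe_returncode is a hypothetical helper and the message wording is an assumption, not output produced by the package.

```python
import sys

def describe_returncode(returncode):
    """Turn a UNIX-style return code into a short message.

    Hypothetical helper: 0 is treated as success and any non-zero
    value as an error, following the convention documented above.
    """
    if returncode == 0:
        return "QC completed successfully"
    return "QC failed with return code %d" % returncode

def main(returncode):
    """Report the outcome and return the code for use with sys.exit."""
    # Send the message to stderr so it does not mix with normal output
    print(describe_returncode(returncode), file=sys.stderr)
    return returncode

# In a script this would typically be: sys.exit(main(status))
```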