auto_process_ngs.qc.utils

Provides utility classes and functions for analysis project QC.

Provides the following functions:

  • verify_qc: verify the QC run for a project

  • report_qc: generate report for the QC run for a project

  • get_bam_basename: return the BAM file basename from a Fastq filename

  • get_seq_data_samples: identify samples with biological (sequencing) data

  • set_cell_count_for_project: set the total number of cells for a project

auto_process_ngs.qc.utils.get_bam_basename(fastq, fastq_attrs=None)

Return basename for BAM file from Fastq filename

Typically this will be the Fastq basename with the read ID removed; for example, the Fastq filename ‘SM1_S1_L001_R1_001.fastq.gz’ results in the BAM basename ‘SM1_S1_L001_001’.

Parameters:
  • fastq (str) – Fastq filename; can include leading path and extensions (both will be ignored)

  • fastq_attrs (BaseFastqAttrs) – class for extracting data from Fastq names (defaults to ‘AnalysisFastq’)

Returns:

basename for BAM file.

Return type:

String
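To illustrate the naming convention in the example above, here is a minimal standalone sketch. It is not the library's implementation: the real function delegates name parsing to a BaseFastqAttrs class (AnalysisFastq by default), whereas this sketch assumes Illumina-style names and uses a hypothetical regular expression to drop the read ID (R1, R2, I1, ...) that sits either before the trailing set number or at the end of the name.

```python
import os
import re

# Hypothetical patterns assuming Illumina-style names such as
# SM1_S1_L001_R1_001.fastq.gz; not taken from auto_process_ngs
_EXTENSIONS = re.compile(r"\.(fastq|fq)(\.gz|\.bz2)?$")
_READ_ID = re.compile(r"_[RI]\d+(?=_\d{3}$|$)")

def bam_basename_sketch(fastq):
    """Return a BAM basename by removing path, extensions and read ID."""
    # Drop any leading path, then the Fastq extensions
    name = _EXTENSIONS.sub("", os.path.basename(fastq))
    # Remove the read ID that precedes the set number (or ends the name)
    return _READ_ID.sub("", name)
```

For instance, bam_basename_sketch("/data/SM1_S1_L001_R1_001.fastq.gz") gives ‘SM1_S1_L001_001’, matching the documented behaviour for that filename.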

auto_process_ngs.qc.utils.get_seq_data_samples(project_dir, fastq_attrs=None)

Identify samples with biological (sequencing) data

Parameters:
  • project_dir (str) – path to the project directory

  • fastq_attrs (BaseFastqAttrs) – class for extracting data from Fastq names (defaults to ‘AnalysisFastq’)

Returns:

list with the subset of samples that have biological data.

Return type:

List

auto_process_ngs.qc.utils.report_qc(project, qc_dir=None, fastq_dir=None, qc_protocol=None, report_html=None, zip_outputs=True, multiqc=False, force=False, runner=None, log_dir=None, suppress_warning=False)

Generate report for the QC run for a project

Parameters:
  • project (AnalysisProject) – analysis project to report the QC for

  • qc_dir (str) – optional, specify the subdir with the QC outputs being reported

  • fastq_dir (str) – optional, specify a non-default directory with Fastq files being verified

  • qc_protocol (str) – optional, QC protocol to verify against

  • report_html (str) – optional, path and filename for the output QC report

  • zip_outputs (bool) – if True then also generate ZIP archive with the report and QC outputs

  • multiqc (bool) – if True then also generate MultiQC report

  • force (bool) – if True then force generation of QC report even if verification fails

  • runner (JobRunner) – optional, job runner to use for running the reporting

  • log_dir (str) – optional, specify a directory to write logs to

  • suppress_warning (bool) – if True then don’t show the warning message even when there are missing metrics (default: show the warning if there are missing metrics)

Returns:

exit code from the reporting job (zero indicates success, non-zero indicates a problem).

Return type:

Integer
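The force parameter implies that the QC outputs are verified before the report is generated, and that a failed verification normally prevents reporting. A minimal sketch of that control flow follows. The verify and generate callables are hypothetical stand-ins (not part of auto_process_ngs), and returning 1 when the report is refused is an assumption consistent with the "non-zero indicates a problem" convention.

```python
import logging

def report_with_verification(verify, generate, force=False):
    """Hypothetical sketch of verify-then-report logic.

    verify: callable returning True if the QC outputs pass verification
    generate: callable that produces the report and returns an exit code
    force: if True, generate the report even when verification fails
    """
    if not verify():
        if not force:
            logging.warning("QC verification failed; report not generated")
            return 1  # assumed non-zero code for a refused report
        logging.warning("QC verification failed; forcing report generation")
    # Exit code comes from the reporting step itself (zero = success)
    return generate()
```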

auto_process_ngs.qc.utils.set_cell_count_for_project(project_dir, qc_dir=None, source='count')

Set the total number of cells for a project

Depending on the specified ‘source’, sums the number of cells for each sample in a project as determined from either ‘cellranger* count’ or ‘cellranger multi’.

Depending on the 10x Genomics package and analysis type, the cell count for each individual sample is extracted from the ‘metrics_summary.csv’ file for scRNA-seq (i.e. ‘cellranger count’ or ‘cellranger multi’), or from the ‘summary.csv’ file for scATAC-seq (i.e. ‘cellranger-atac count’).

The final count is written to the ‘number_of_cells’ metadata item for the project.

Parameters:
  • project_dir (str) – path to the project directory

  • qc_dir (str) – path to QC directory (if not the default QC directory for the project)

  • source (str) – either ‘count’ or ‘multi’ (default is ‘count’)

Returns:

exit code; non-zero values indicate that problems were encountered.

Return type:

Integer
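The summing step described above can be sketched as follows for single-row, per-sample metrics CSVs of the kind written by ‘cellranger count’. This is an illustration, not the library's code: the default column name ‘Estimated Number of Cells’ is an assumption about that file's header, the column is a parameter because ‘summary.csv’ from ‘cellranger-atac count’ uses a different name, and the long-format metrics from ‘cellranger multi’ are not handled here. Writing the result to the ‘number_of_cells’ metadata item is also omitted.

```python
import csv
import io

# Assumed header of the cell count column in 'cellranger count'
# metrics_summary.csv; pass another name for other pipelines
CELL_COUNT_COLUMN = "Estimated Number of Cells"

def cells_from_metrics(fp, column=CELL_COUNT_COLUMN):
    """Return the cell count from an open single-row metrics CSV."""
    row = next(csv.DictReader(fp))
    # Values may be quoted with thousands separators, e.g. "2,272"
    return int(row[column].replace(",", ""))

def cells_from_metrics_text(text, column=CELL_COUNT_COLUMN):
    """Convenience wrapper for CSV content held in a string."""
    return cells_from_metrics(io.StringIO(text), column)

def total_cells(metrics_files, column=CELL_COUNT_COLUMN):
    """Sum the cell counts across a list of per-sample metrics files."""
    total = 0
    for path in metrics_files:
        with open(path, newline="") as fp:
            total += cells_from_metrics(fp, column)
    return total
```

Keeping the parsing separate from file handling lets the same function be applied to each sample's file in turn, with total_cells providing the project-level sum.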

auto_process_ngs.qc.utils.verify_qc(project, qc_dir=None, fastq_dir=None, qc_protocol=None, runner=None, log_dir=None)

Verify the QC run for a project

Parameters:
  • project (AnalysisProject) – analysis project to verify the QC for

  • qc_dir (str) – optional, specify the subdir with the QC outputs being verified

  • fastq_dir (str) – optional, specify a non-default directory with Fastq files being verified

  • qc_protocol (str) – optional, QC protocol to verify against

  • runner (JobRunner) – optional, job runner to use for running the verification

  • log_dir (str) – optional, specify a directory to write logs to

Returns:

True if QC passes verification, otherwise False.

Return type:

Boolean