auto_process_ngs.qc.utils
Provides utility classes and functions for analysis project QC.
Provides the following functions:
verify_qc: verify the QC run for a project
report_qc: generate report for the QC run for a project
get_bam_basename: return the BAM file basename from a Fastq filename
get_seq_data_samples: identify samples with biological (sequencing) data
set_cell_count_for_project: sets total number of cells for a project
- auto_process_ngs.qc.utils.get_bam_basename(fastq, fastq_attrs=None)
Return basename for BAM file from Fastq filename
Typically this will be the Fastq basename with the read ID removed, for example the Fastq filename ‘SM1_S1_L001_R1_001.fastq.gz’ will result in the BAM basename of ‘SM1_S1_L001_001’.
- Parameters:
fastq (str) – Fastq filename; can include leading path and extensions (both will be ignored)
fastq_attrs (BaseFastqAttrs) – class for extracting data from Fastq names (defaults to ‘AnalysisFastq’)
- Returns:
basename for BAM file.
- Return type:
String
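The read-ID removal described above can be illustrated with a small standalone sketch. This is not the library's implementation: the helper name `sketch_bam_basename` is hypothetical, and it assumes the read ID is a field of the form `R1`, `R2`, etc. separated by underscores, whereas the real function delegates name parsing to the `fastq_attrs` class.

```python
import os
import re


def sketch_bam_basename(fastq):
    """Illustrative only: mimic the documented behaviour for simple names.

    Assumes the Fastq name is a set of '_'-separated fields, one of which
    is a read ID such as 'R1' or 'R2'.
    """
    # Drop any leading path, then everything from the first '.' onwards
    name = os.path.basename(fastq).split(".")[0]
    fields = name.split("_")
    # Remove the last field that looks like a read ID
    for i in range(len(fields) - 1, -1, -1):
        if re.fullmatch(r"R[1-9]", fields[i]):
            del fields[i]
            break
    return "_".join(fields)


print(sketch_bam_basename("/data/fastqs/SM1_S1_L001_R1_001.fastq.gz"))
```

With this sketch, the R1 and R2 Fastqs of a pair map to the same BAM basename, which is the point of removing the read ID.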
- auto_process_ngs.qc.utils.get_seq_data_samples(project_dir, fastq_attrs=None)
Identify samples with biological (sequencing) data
- Parameters:
project_dir (str) – path to the project directory
fastq_attrs (BaseFastqAttrs) – class for extracting data from Fastq names (defaults to ‘AnalysisFastq’)
- Returns:
list with subset of samples with biological data
- Return type:
List
- auto_process_ngs.qc.utils.report_qc(project, qc_dir=None, fastq_dir=None, qc_protocol=None, report_html=None, zip_outputs=True, multiqc=False, force=False, runner=None, log_dir=None, suppress_warning=False)
Generate report for the QC run for a project
- Parameters:
project (AnalysisProject) – analysis project to report the QC for
qc_dir (str) – optional, specify the subdir with the QC outputs being reported
fastq_dir (str) – optional, specify a non-default directory with Fastq files being verified
qc_protocol (str) – optional, QC protocol to verify against
report_html (str) – optional, path to the output QC report
zip_outputs (bool) – if True then also generate ZIP archive with the report and QC outputs
multiqc (bool) – if True then also generate MultiQC report
force (bool) – if True then force generation of QC report even if verification fails
runner (JobRunner) – optional, job runner to use for running the reporting
log_dir (str) – optional, specify a directory to write logs to
suppress_warning (bool) – if True then don’t show the warning message even when there are missing metrics (default: show the warning if there are missing metrics)
- Returns:
exit code from reporting job (zero indicates success, non-zero indicates a problem).
- Return type:
Integer
- auto_process_ngs.qc.utils.set_cell_count_for_project(project_dir, qc_dir=None, source='count')
Set the total number of cells for a project
Depending on the specified ‘source’, sums the number of cells for each sample in a project as determined from either ‘cellranger* count’ or ‘cellranger multi’.
Depending on the 10x Genomics package and analysis type, the cell count for individual samples is extracted from the ‘metrics_summary.csv’ file for scRNA-seq (i.e. ‘cellranger count’ or ‘cellranger multi’), or from the ‘summary.csv’ file for scATAC (i.e. ‘cellranger-atac count’).
The final count is written to the ‘number_of_cells’ metadata item for the project.
- Parameters:
project_dir (str) – path to the project directory
qc_dir (str) – path to QC directory (if not the default QC directory for the project)
source (str) – either ‘count’ or ‘multi’ (default is ‘count’)
- Returns:
exit code; non-zero values indicate problems were encountered.
- Return type:
Integer
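The summing step described for this function might look roughly like the sketch below. It is a simplified illustration, not the library's code: the helper names are hypothetical, the CSV contents are made-up sample data, and it assumes a one-row ‘metrics_summary.csv’ layout with a column called "Estimated Number of Cells" (the column name and file layout differ between 10x Genomics packages and versions).

```python
import csv
import io


def cells_from_metrics_summary(csv_text, column="Estimated Number of Cells"):
    """Return the cell count from the text of a one-row metrics summary.

    The column name is an assumption; adjust it for other file layouts.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    row = next(reader)
    # Values are typically quoted with thousands separators, e.g. "2,272"
    return int(row[column].replace(",", ""))


def total_cells(summaries):
    """Sum the per-sample cell counts to get the project total."""
    return sum(cells_from_metrics_summary(s) for s in summaries)


# Made-up example data for two samples
sample1 = 'Estimated Number of Cells,Mean Reads per Cell\n"2,272","107,875"\n'
sample2 = 'Estimated Number of Cells,Mean Reads per Cell\n"1,150","98,004"\n'
print(total_cells([sample1, sample2]))
```

In the real function the resulting total is what gets written to the ‘number_of_cells’ metadata item for the project.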
- auto_process_ngs.qc.utils.verify_qc(project, qc_dir=None, fastq_dir=None, qc_protocol=None, runner=None, log_dir=None)
Verify the QC run for a project
- Parameters:
project (AnalysisProject) – analysis project to verify the QC for
qc_dir (str) – optional, specify the subdir with the QC outputs being verified
fastq_dir (str) – optional, specify a non-default directory with Fastq files being verified
qc_protocol (str) – optional, QC protocol to verify against
runner (JobRunner) – optional, job runner to use for running the verification
log_dir (str) – optional, specify a directory to write logs to
- Returns:
True if QC passes verification, otherwise False.
- Return type:
Boolean
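One way a caller might combine verification and reporting is sketched below. The wrapper `verify_then_report` is hypothetical and not part of the module; it takes the two functions as arguments so that it can be exercised with stand-ins, and it assumes only the documented return values (a Boolean from the verification, an integer exit code from the reporting) and the documented ‘qc_dir’ and ‘force’ keywords.

```python
def verify_then_report(project, verify, report, qc_dir=None, force=False):
    """Hypothetical wrapper: verify QC first, then report.

    'verify' and 'report' are callables with the same keywords as the
    documented verify_qc and report_qc functions.
    Returns an integer exit code (zero indicates success).
    """
    passed = verify(project, qc_dir=qc_dir)
    if not passed and not force:
        # Skip reporting when verification fails and 'force' is not set
        return 1
    return report(project, qc_dir=qc_dir, force=force)


# Stand-ins used here in place of the real functions
def fake_verify(project, qc_dir=None):
    return False


def fake_report(project, qc_dir=None, force=False):
    return 0


print(verify_then_report("PJB", fake_verify, fake_report))
print(verify_then_report("PJB", fake_verify, fake_report, force=True))
```

In real use the module's own verify_qc and report_qc would be passed in place of the stand-ins, with an AnalysisProject instance as the project.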