auto_process_ngs.qc.fastq_stats

class auto_process_ngs.qc.fastq_stats.FastqQualityStats

Class for storing per-base quality stats from a FASTQ file

This class acts as a container for per-base quality statistics from a FASTQ file; the statistics can be accessed via various properties:

mean: mean of all quality scores at each position
median: median of all quality scores at each position
q25: lower quartile (25th percentile)
q75: upper quartile (75th percentile)
p10: 10th percentile
p90: 90th percentile
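To make the meaning of each property concrete, here is a minimal sketch of how the six values could be computed for a single base position from the quality scores observed there. `position_stats` is a hypothetical helper, not part of auto_process_ngs. The choice of `statistics.quantiles(..., method="inclusive")` (linear interpolation between data points) is an assumption; FastQC and `FastqQualityStats` may interpolate differently.

```python
import statistics

def position_stats(scores):
    """Summarise the quality scores seen at one base position.

    Hypothetical helper. Needs at least two scores; percentiles use
    linear interpolation ("inclusive" method), which is an assumption.
    """
    quartiles = statistics.quantiles(scores, n=4, method="inclusive")  # 3 cut points
    deciles = statistics.quantiles(scores, n=10, method="inclusive")   # 9 cut points
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "q25": quartiles[0],
        "q75": quartiles[2],
        "p10": deciles[0],
        "p90": deciles[-1],
    }

print(position_stats([10, 20, 30, 40]))
```

Calling this once per position and appending each value to the matching list would give the per-position lists described above.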

Each of these properties is a Python list, so to access e.g. the mean for the 5th base:

>>> mean = fqstats.mean[4]

(as the lists are indexed starting from zero).

The nbases property returns the number of bases for which data is stored.
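A container with this shape can be sketched as a dataclass holding one list per statistic, with `nbases` derived from the list length. `QualityStatsSketch` is a hypothetical stand-in, not the real class, and it assumes every list holds exactly one entry per base position.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class QualityStatsSketch:
    """Hypothetical stand-in for FastqQualityStats.

    Each list is indexed by zero-based base position.
    """
    mean: List[float] = field(default_factory=list)
    median: List[float] = field(default_factory=list)
    q25: List[float] = field(default_factory=list)
    q75: List[float] = field(default_factory=list)
    p10: List[float] = field(default_factory=list)
    p90: List[float] = field(default_factory=list)

    @property
    def nbases(self):
        # Assumption: all lists have the same length, one entry per base
        return len(self.mean)

stats = QualityStatsSketch(mean=[30.0, 31.5, 29.0])
print(stats.nbases, stats.mean[1])  # mean for the 2nd base is at index 1
```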

The statistics can be populated directly from a FASTQ file, for example:

>>> stats = FastqQualityStats()
>>> stats.from_fastq('example.fq')

Alternatively they can be loaded from a fastqc_data.txt file output from the FastQC program:

>>> stats.from_fastqc_data('example_fastqc/fastqc_data.txt')

from_fastq(fastq)

Get statistics from a FASTQ file

Generates and stores statistics from a FASTQ file.

Parameters:

fastq (str) – path to a FASTQ file (can be gzipped)
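The first step of generating statistics from a FASTQ file is collecting the quality scores at each base position. The sketch below shows one way to do that; `per_position_scores` is a hypothetical helper, not the library's implementation. It assumes Phred+33 encoding and unwrapped four-line records, and chooses `gzip.open` purely from a `.gz` file extension.

```python
import gzip
from collections import defaultdict

def per_position_scores(fastq, offset=33):
    """Return a list of quality-score lists, one per base position.

    Hypothetical helper. Assumes Phred+33 encoding and four-line
    FASTQ records; files ending in '.gz' are read with gzip.
    """
    opener = gzip.open if fastq.endswith(".gz") else open
    scores = defaultdict(list)
    with opener(fastq, "rt") as fp:
        for i, line in enumerate(fp):
            if i % 4 == 3:  # fourth line of each record holds the qualities
                for pos, char in enumerate(line.rstrip("\r\n")):
                    scores[pos].append(ord(char) - offset)
    # Reads of different lengths still cover positions 0..max_len-1
    return [scores[pos] for pos in range(len(scores))]
```

Each inner list holds every score seen at that position, so shorter reads simply contribute fewer values to the later positions.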

from_fastqc_data(fastqc_data)

Get statistics from a FastQC data file

Reads in the per-base quality statistics from the Per base sequence quality module of the FastQC program, which are stored in the fastqc_data.txt output file.

Parameters:

fastqc_data (str) – path to a FastQC fastqc_data.txt file
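In `fastqc_data.txt` each module starts with a `>>Module name` line and ends with `>>END_MODULE`, with tab-separated rows in between and a `#`-prefixed column header. The sketch below extracts the rows of the "Per base sequence quality" module; `read_per_base_quality` is a hypothetical helper, and how `from_fastqc_data` itself treats grouped positions such as `10-14` is not covered here, so the base label is simply kept as a string.

```python
def read_per_base_quality(fastqc_data):
    """Parse the 'Per base sequence quality' module of fastqc_data.txt.

    Hypothetical helper. Returns (base, mean, median, q25, q75, p10, p90)
    tuples; 'base' stays a string because FastQC may group positions.
    """
    rows = []
    in_module = False
    with open(fastqc_data) as fp:
        for line in fp:
            line = line.rstrip("\r\n")
            if line.startswith(">>Per base sequence quality"):
                in_module = True
            elif in_module and line.startswith(">>END_MODULE"):
                break  # end of the module we want
            elif in_module and line and not line.startswith("#"):
                fields = line.split("\t")
                rows.append((fields[0], *map(float, fields[1:7])))
    return rows
```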

property nbases

Return the number of bases for which data is stored