auto_process_ngs.qc.fastq_stats
- class auto_process_ngs.qc.fastq_stats.FastqQualityStats
Class for storing per-base quality stats from a FASTQ
This class acts as a container for per-base quality statistics from a FASTQ file; the statistics can be accessed via various properties:
mean: mean of all quality scores at each position
median: median of all quality scores at each position
q25: lower quartile
q75: upper quartile
p10: 10th percentile
p90: 90th percentile
Each of these properties is a Python list, so to access e.g. the mean for the 5th base:
>>> mean = fqstats.mean[4]
(as the lists are indexed starting from zero).
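To make the meaning of these lists concrete, the following is a minimal sketch of how per-position statistics of this kind could be derived from the quality strings of a set of reads. It is not the class's own implementation: per_base_stats and percentile are hypothetical helpers, Phred+33 encoding is assumed, and the linear-interpolation percentile rule is only one common convention, which may differ from the one used by FastqQualityStats or FastQC.

```python
def percentile(sorted_scores, p):
    """Hypothetical helper: p-th percentile by linear interpolation."""
    k = (len(sorted_scores) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_scores) - 1)
    return sorted_scores[lo] + (sorted_scores[hi] - sorted_scores[lo]) * (k - lo)


def per_base_stats(quality_strings, offset=33):
    """Hypothetical helper: summarise quality scores at each position.

    Assumes Phred+33 encoded quality strings; reads may differ in length.
    """
    # Gather the scores observed at each position across all reads
    columns = []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(columns):
                columns.append([])
            columns[i].append(ord(ch) - offset)
    # One list per statistic, indexed by position (starting from zero)
    stats = {"mean": [], "median": [], "q25": [], "q75": [], "p10": [], "p90": []}
    for scores in columns:
        scores.sort()
        stats["mean"].append(sum(scores) / len(scores))
        stats["median"].append(percentile(scores, 50))
        stats["q25"].append(percentile(scores, 25))
        stats["q75"].append(percentile(scores, 75))
        stats["p10"].append(percentile(scores, 10))
        stats["p90"].append(percentile(scores, 90))
    return stats


stats = per_base_stats(["II5", "I5", "55"])
# Three positions are covered, since the longest read has three bases
nbases = len(stats["mean"])
```

Here stats["mean"][0] plays the same role as fqstats.mean[0] above: the value for the first base position.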
The nbases property returns the number of bases for which data is stored.
The statistics can be populated directly from a FASTQ file, for example:
>>> stats = FastqQualityStats()
>>> stats.from_fastq('example.fq')
Alternatively they can be loaded from a fastqc_data.txt file output from the FastQC program:
>>> stats.from_fastqc_data('example_fastqc/fastqc_data.txt')
- from_fastq(fastq)
Get statistics from a FASTQ file
Generates and stores statistics from a FASTQ file.
- Parameters:
fastq (str) – path to a FASTQ file (can be gzipped)
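As an illustration of what accepting a plain or gzipped FASTQ involves, the sketch below yields the quality line of each record from either kind of file. iter_quality_strings is a hypothetical helper and not part of auto_process_ngs; it assumes four-line FASTQ records and that gzipped files carry a .gz suffix, neither of which is confirmed to match how from_fastq works internally.

```python
import gzip


def iter_quality_strings(path):
    """Hypothetical helper: yield the quality string of each FASTQ record.

    Assumes four-line records (header, sequence, '+', quality) and
    that gzipped files are identified by a '.gz' suffix.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for lineno, line in enumerate(fh):
            # The quality string is the fourth line of every record
            if lineno % 4 == 3:
                yield line.rstrip("\n")
```

The yielded strings could then be summarised position by position to obtain statistics such as the mean and median.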
- from_fastqc_data(fastqc_data)
Get statistics from a FastQC data file
Reads in the per-base quality statistics from the 'Per base sequence quality' module of the FastQC program, which are stored in the fastqc_data.txt output file.
- Parameters:
fastqc_data (str) – path to a FastQC fastqc_data.txt file
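The layout of the relevant module in fastqc_data.txt can be illustrated with a small parser. This is a sketch only: read_per_base_quality is a hypothetical function, not the method's actual code. It relies on the standard FastQC layout, in which the module starts with a '>>Per base sequence quality' line, ends with '>>END_MODULE', and holds tab-separated rows of base position, mean, median, lower quartile, upper quartile, 10th percentile and 90th percentile. FastQC may report a range of positions (such as 10-14) in a single row; the sketch keeps one entry per row and does not expand ranges, and how from_fastqc_data treats them is not stated here.

```python
def read_per_base_quality(lines):
    """Hypothetical parser for the 'Per base sequence quality' module.

    Takes an iterable of lines from a fastqc_data.txt file and returns
    a dict of lists, one value per data row of the module.
    """
    columns = ("mean", "median", "q25", "q75", "p10", "p90")
    stats = {name: [] for name in columns}
    in_module = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">>Per base sequence quality"):
            in_module = True
            continue
        if not in_module:
            continue
        if line.startswith(">>END_MODULE"):
            break
        if not line or line.startswith("#"):
            # Skip blank lines and the column header row
            continue
        fields = line.split("\t")
        # fields[0] is the base position (possibly a range); the six
        # statistics follow in the order given by 'columns'
        for name, value in zip(columns, fields[1:7]):
            stats[name].append(float(value))
    return stats
```

With a real file, the lines could be supplied by opening the path passed as fastqc_data and handing the file object to the function.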
- property nbases
Return the number of bases for which data is stored