auto_process_ngs.stats
stats.py
Classes and functions for collecting and reporting statistics for a run:
FastqStatistics: collects and reports stats on FASTQs from an Illumina sequencing run
FastqStats: container for storing data about a FASTQ file
collect_fastq_data: collect data from FASTQ file in a FastqStats instance
- class auto_process_ngs.stats.FastqStatistics(illumina_data, n_processors=1, add_to=None)
Class for collecting and reporting stats on Illumina FASTQs
Given a directory with fastq(.gz) files arranged in the same structure as the output from bcl2fastq or bcl2fastq2, collects statistics for each file and provides methods for reporting different aspects.
Example usage:
>>> from IlluminaData import IlluminaData
>>> data = IlluminaData('120117_BLAH_JSHJHXXX','bcl2fastq')
>>> stats = FastqStatistics(data)
>>> stats.report_basic_stats('basic_stats.out')
- property lane_names
Return list of lane names (e.g. ['L1', 'L2', ...])
- property raw
Return the ‘raw’ statistics TabFile instance
- report_basic_stats(out_file=None, fp=None)
Report the ‘basic’ statistics
For each FASTQ file, report the following information:
Project name
Sample name
FASTQ file name (without leading directory)
Size (human-readable)
Nreads (number of reads)
Paired_end (‘Y’ for paired-end, ‘N’ for single-end)
- Parameters:
out_file (str) – name of file to write report to (used if ‘fp’ is not supplied)
fp (File) – File-like object open for writing (defaults to stdout if ‘out_file’ also not supplied)
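The precedence described for 'out_file' and 'fp' (use 'fp' if supplied, otherwise open 'out_file', otherwise fall back to stdout) can be sketched as below. This is a minimal illustration of the documented behaviour, not the library's own code: the helper name open_report and the file name 'ignored.txt' are hypothetical.

```python
import io
import sys
from contextlib import contextmanager


@contextmanager
def open_report(out_file=None, fp=None):
    """Yield a writable stream chosen from 'fp', 'out_file' or stdout.

    Hypothetical helper mirroring the documented precedence only.
    """
    if fp is not None:
        # A supplied stream takes precedence; the caller closes it
        yield fp
    elif out_file is not None:
        # No stream supplied: open the named file, close it afterwards
        with open(out_file, "wt") as stream:
            yield stream
    else:
        # Neither supplied: write to stdout
        yield sys.stdout


# 'fp' wins even though 'out_file' is also given, so no file is opened
buf = io.StringIO()
with open_report(out_file="ignored.txt", fp=buf) as stream:
    stream.write("report line\n")
```

The same pattern would apply to the other report_* methods, which document identical 'out_file' and 'fp' parameters.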
- report_full_stats(out_file=None, fp=None)
Report all statistics gathered for all FASTQs
Essentially a dump of all the data.
- Parameters:
out_file (str) – name of file to write report to (used if ‘fp’ is not supplied)
fp (File) – File-like object open for writing (defaults to stdout if ‘out_file’ also not supplied)
- report_per_lane_sample_stats(out_file=None, fp=None, samplesheet=None)
Report of reads per sample in each lane
Reports the number of reads for each sample in each lane plus the total reads for each lane.
Example output:
Lane 1
Total reads = 182851745
- KatyDobbs/KD-K1 79888058 43.7%
- KatyDobbs/KD-K3 97854292 53.5%
- Undetermined_indices/lane1 5109395 2.8%
…
- Parameters:
out_file (str) – name of file to write report to (used if ‘fp’ is not supplied)
fp (File) – File-like object open for writing (defaults to stdout if ‘out_file’ also not supplied)
samplesheet (str) – optional sample sheet file to get additional data from
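The percentages in the example output are each sample's share of the lane total. The short sketch below reproduces the arithmetic using the read counts from that example; the function and variable names are illustrative assumptions, and the exact column layout written by the real method may differ.

```python
# Read counts copied from the example output for lane 1
lane_counts = {
    "KatyDobbs/KD-K1": 79888058,
    "KatyDobbs/KD-K3": 97854292,
    "Undetermined_indices/lane1": 5109395,
}

# The lane total is the sum over all samples, including undetermined reads
total_reads = sum(lane_counts.values())


def percent_of_lane(nreads, total):
    """Return nreads as a percentage of total (0.0 for an empty lane)."""
    return 100.0 * nreads / total if total else 0.0


lines = ["Lane 1", "Total reads = %d" % total_reads]
for name, nreads in lane_counts.items():
    lines.append("- %s\t%d\t%.1f%%"
                 % (name, nreads, percent_of_lane(nreads, total_reads)))
print("\n".join(lines))
```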
- report_per_lane_summary_stats(out_file=None, fp=None)
Report summary of total and unassigned reads per-lane
- Parameters:
out_file (str) – name of file to write report to (used if ‘fp’ is not supplied)
fp (File) – File-like object open for writing (defaults to stdout if ‘out_file’ also not supplied)
- class auto_process_ngs.stats.FastqStats(fastq, project, sample)
Container for storing data about a FASTQ file
This is a convenience wrapper for holding together data for a FASTQ file (full path, associated project and sample names, number of reads and filesize).
- property lanes
Lane numbers associated with the FASTQ file
- property name
FASTQ file name without leading directory
- property read_number
Read number extracted from the FASTQ name
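As an illustration of what the 'name' and 'read_number' properties represent, the sketch below derives both from a path that follows the usual bcl2fastq2 naming pattern (SAMPLE_S1_L001_R1_001.fastq.gz). The regular expression, function names and example path are assumptions made for this example; FastqStats itself may use a different parser.

```python
import os
import re

# Matches e.g. "_R2_001.fastq.gz" or "_R1.fastq" at the end of a file name
FASTQ_READ_RE = re.compile(r"_R(\d+)(?:_\d+)?\.(?:fastq|fq)(?:\.gz)?$")


def fastq_basename(path):
    """Return the FASTQ file name without its leading directory."""
    return os.path.basename(path)


def read_number_from_name(path):
    """Return the read number (1, 2, ...) from the name, or None."""
    match = FASTQ_READ_RE.search(os.path.basename(path))
    return int(match.group(1)) if match else None


# Hypothetical path, used only for illustration
example = "/data/run/KD-K1_S1_L001_R2_001.fastq.gz"
name = fastq_basename(example)
read_number = read_number_from_name(example)
```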
- auto_process_ngs.stats.collect_fastq_data(fqstats)
Collect data from FASTQ file in a FastqStats instance
Given a FastqStats instance, collects and sets the following properties derived from the corresponding FASTQ file stored in that instance:
nreads: total number of reads
fsize: file size
reads_by_lane: (R1 FASTQs only) dictionary where keys are lane numbers and values are read counts
Note that if the FASTQ file is an R2 (or higher) file then the reads per lane will not be set.
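To make 'nreads' and 'reads_by_lane' concrete, here is a minimal sketch of how such counts could be derived from FASTQ records. It assumes four-line records and Illumina-style headers, where the lane is the fourth colon-separated field; the function name count_reads and the example reads are hypothetical, and this is not the implementation used by collect_fastq_data.

```python
import io
from collections import Counter


def count_reads(fastq_lines):
    """Return (nreads, reads_by_lane) for an iterable of FASTQ lines.

    Assumes 4 lines per record and Illumina-style header lines of the
    form '@instrument:run:flowcell:lane:tile:x:y ...'.
    """
    nreads = 0
    reads_by_lane = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:
            # Header line: one per read; the lane is the fourth field
            nreads += 1
            lane = int(line.split(":")[3])
            reads_by_lane[lane] += 1
    return nreads, dict(reads_by_lane)


# Three made-up reads: two from lane 1 and one from lane 2
example_fastq = io.StringIO(
    "@M00001:12:000000000-ABCDE:1:1101:15589:1332 1:N:0:1\n"
    "ACGT\n+\nIIII\n"
    "@M00001:12:000000000-ABCDE:1:1101:15590:1340 1:N:0:1\n"
    "TTGA\n+\nIIII\n"
    "@M00001:12:000000000-ABCDE:2:1101:15600:1351 1:N:0:1\n"
    "GGCA\n+\nIIII\n"
)
nreads, reads_by_lane = count_reads(example_fastq)
```

For a gzipped FASTQ the same function could be fed lines from gzip.open(path, "rt"), and the file size is available from os.path.getsize(path).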
- Parameters:
fqstats (FastqStats) – FastqStats instance
- Returns:
input FastqStats instance with the appropriate properties updated.
- Return type: