auto_process_ngs.qc.fastqc
- class auto_process_ngs.qc.fastqc.Fastqc(fastqc_dir)
Wrapper class for handling outputs from FastQC
The Fastqc object gives access to various aspects of the outputs of the FastQC program.
- adapter_content_plot(inline=False)
- property data
Return a FastqcData instance
- property dir
Path to the directory with the FastQC outputs
- property html_report
Path to the associated HTML report file
- plot(module, inline=False)
- quality_boxplot(inline=False)
- property summary
Return a FastqcSummary instance
- property version
Version of FastQC that was used
- property zip
Path to the associated ZIP archive
- class auto_process_ngs.qc.fastqc.FastqcData(data_file)
Class representing data from a Fastqc data file
Reads in the data from a fastqc_data.txt file and makes it available programmatically.
To create a new FastqcData instance:
>>> fqc = FastqcData('fastqc_data.txt')
To access a field in the ‘Basic Statistics’ module:
>>> nreads = fqc.basic_statistics('Total Sequences')
- adapter_content_summary()
Return summary data for adapter content
Summarises the amount of adapter present in a Fastq file based on data in the Adapter Content section, assigning a decimal fraction for each adapter class.
The fraction is calculated by summing the fraction of adapter across all bases, and then normalising by the number of bases.
- Returns:
mapping of adapter names to the fraction representing the amount of adapter present in the Fastq.
- Return type:
OrderedDict
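The calculation described above can be sketched in plain Python. This is an illustrative sketch, not the library's implementation: the function name `summarise_adapter_content` is hypothetical, the values in the Adapter Content section are assumed to be percentages, and rows covering a range of positions (e.g. `3-4`) are assumed to count once per base they cover. The example data is made up.

```python
from collections import OrderedDict

def summarise_adapter_content(lines):
    """Average per-position adapter content into one fraction per adapter.

    `lines` holds the tab-delimited 'Adapter Content' module data:
    a header line followed by one line per position (or position range).
    """
    # Header looks like '#Position<TAB>Adapter A<TAB>Adapter B'
    adapters = lines[0].rstrip("\n").split("\t")[1:]
    totals = OrderedDict((name, 0.0) for name in adapters)
    nbases = 0
    for line in lines[1:]:
        fields = line.rstrip("\n").split("\t")
        # Assumption: a position may be a range such as '3-4', so the
        # row is weighted by the number of bases it covers
        bounds = [int(p) for p in fields[0].split("-")]
        width = bounds[-1] - bounds[0] + 1
        nbases += width
        for name, value in zip(adapters, fields[1:]):
            # Assumption: values are percentages, hence the division by 100
            totals[name] += width * float(value) / 100.0
    # Normalise the summed fractions by the number of bases
    return OrderedDict((name, total / nbases) for name, total in totals.items())

# Made-up example data (not taken from a real FastQC run)
example = [
    "#Position\tIllumina Universal Adapter\tNextera Transposase Sequence",
    "1\t0.0\t0.0",
    "2\t10.0\t0.0",
    "3-4\t20.0\t0.0",
]
summary = summarise_adapter_content(example)
```

Here `summary` maps each adapter name to a single fraction, in the same order as the columns of the header line.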
- basic_statistics(measure)
Access a data item in the Basic Statistics section.
Possible values include:
Filename
File type
Encoding
Total Sequences
Sequences flagged as poor quality
Sequence length
%GC
- Parameters:
measure (str) – key corresponding to a ‘measure’ in the Basic Statistics section.
- Returns:
value of the requested ‘measure’
- Return type:
String
- Raises:
KeyError – if ‘measure’ is not found.
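The lookup behaviour described above (string values, KeyError for an unknown measure) can be mimicked without the library. This is a minimal sketch under the assumption that each line of the Basic Statistics section is a tab-delimited measure/value pair; the standalone function `basic_statistics` and the `example_lines` list are illustrative only, with values taken from the sample data file shown at the end of this page.

```python
def basic_statistics(lines, measure):
    """Look up `measure` in the lines of a 'Basic Statistics' section."""
    stats = {}
    for line in lines:
        if line.startswith("#"):
            # Skip the '#Measure<TAB>Value' header line
            continue
        key, value = line.rstrip("\n").split("\t", 1)
        stats[key] = value
    # Values stay as strings; an unknown measure raises KeyError
    return stats[measure]

example_lines = [
    "#Measure\tValue",
    "Filename\tES1_GTCCGC_L008_R1_001.fastq.gz",
    "Total Sequences\t12317096",
    "Sequence length\t101",
    "%GC\t50",
]
nreads = int(basic_statistics(example_lines, "Total Sequences"))
```

Since the value comes back as a string, callers that need a number have to convert it themselves, as in the last line.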
- data(module)
Return the raw data for a module
Returns the data for the specified module as a list of lines.
The first list item/line is the header line; data items within each line are tab-delimited.
For example:
>>> Fastqc('myfastq_fastq').data.data('Sequence Length Distribution')
['#Length Count', '35 8826.0', '36 2848.0', '37 4666.0', '38 4524.0']
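Because the first item is the header and every line is tab-delimited, the returned list can be turned into records with ordinary string splitting. A minimal sketch, assuming the fields are separated by tab characters as described; `raw` stands in for the list returned by data() and reuses the values from the example.

```python
# Stand-in for the list returned by data('Sequence Length Distribution')
raw = ["#Length\tCount", "35\t8826.0", "36\t2848.0", "37\t4666.0", "38\t4524.0"]

# The first line is the header; drop the leading '#' before splitting
header = raw[0].lstrip("#").split("\t")

# Pair each remaining line's fields with the header names
rows = [dict(zip(header, line.split("\t"))) for line in raw[1:]]

# Fields are strings, so convert before doing arithmetic
total = sum(float(row["Count"]) for row in rows)
```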
- property modules
List of the modules in the raw data
- property path
Path to the fastqc_data.txt file
- sequence_deduplication_percentage()
Return sequence deduplication percentage
Returns the percentage of sequences remaining after deduplication according to FastQC.
- Returns:
percentage sequence deduplication from FastQC.
- Return type:
Float
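One way such a value could be read from the raw data is sketched below. It assumes that the Sequence Duplication Levels section carries a `#Total Deduplicated Percentage` line followed by a tab and the value; both the function name `deduplication_percentage` and the numbers in the example are hypothetical, and the library may obtain the value differently.

```python
def deduplication_percentage(lines):
    """Return the deduplicated percentage from 'Sequence Duplication Levels' lines."""
    for line in lines:
        # Assumption: the value sits on a '#Total Deduplicated Percentage' line
        if line.startswith("#Total Deduplicated Percentage"):
            return float(line.rstrip("\n").split("\t")[1])
    raise ValueError("no '#Total Deduplicated Percentage' line found")

# Made-up example data (not taken from a real FastQC run)
example_lines = [
    "#Total Deduplicated Percentage\t45.5",
    "#Duplication Level\tPercentage of deduplicated\tPercentage of total",
    "1\t80.0\t36.4",
]
pct = deduplication_percentage(example_lines)
```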
- property version
FastQC version number
- class auto_process_ngs.qc.fastqc.FastqcSummary(summary_file=None)
Class representing data from a Fastqc summary file
- property failures
Return modules with failures
Returns a list with the names of the modules that have status ‘FAIL’.
- html_report()
Return the path of the HTML report from FastQC
- html_table(relpath=None)
Generate HTML table for FastQC summary
- Parameters:
relpath (str) – optional, if supplied then links in the table will be relative to this path
- link_to_module(name, full_path=True, relpath=None)
Return link to the result of a specified FastQC module
- Parameters:
name (str) – name of the module (e.g. ‘Basic Statistics’)
full_path (boolean) – optional, if True then return the full path; otherwise return just the anchor (e.g. ‘#M1’)
relpath (str) – optional, if supplied then specifies the path that full paths will be made relative to (implies full_path is True)
- property passes
Return modules with passes
Returns a list with the names of the modules that have status ‘PASS’.
- property warnings
Return modules with warnings
Returns a list with the names of the modules that have status ‘WARN’.
- auto_process_ngs.qc.fastqc.logger = <Logger auto_process_ngs.qc.fastqc (WARNING)>
Example Fastqc summary text file (FASTQ_fastqc/summary.txt):
PASS  Basic Statistics              ES1_GTCCGC_L008_R1_001.fastq.gz
PASS  Per base sequence quality     ES1_GTCCGC_L008_R1_001.fastq.gz
PASS  Per tile sequence quality     ES1_GTCCGC_L008_R1_001.fastq.gz
PASS  Per sequence quality scores   ES1_GTCCGC_L008_R1_001.fastq.gz
FAIL  Per base sequence content     ES1_GTCCGC_L008_R1_001.fastq.gz
WARN  Per sequence GC content       ES1_GTCCGC_L008_R1_001.fastq.gz
PASS  Per base N content            ES1_GTCCGC_L008_R1_001.fastq.gz
PASS  Sequence Length Distribution  ES1_GTCCGC_L008_R1_001.fastq.gz
FAIL  Sequence Duplication Levels   ES1_GTCCGC_L008_R1_001.fastq.gz
PASS  Overrepresented sequences     ES1_GTCCGC_L008_R1_001.fastq.gz
PASS  Adapter Content               ES1_GTCCGC_L008_R1_001.fastq.gz
FAIL  Kmer Content                  ES1_GTCCGC_L008_R1_001.fastq.gz
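The passes, warnings and failures properties of FastqcSummary amount to grouping module names by status. The sketch below shows that grouping on a few records from the example above; it assumes the three fields of each record (status, module name, Fastq name) are tab-separated, and `parse_summary` is a hypothetical helper rather than part of the module.

```python
def parse_summary(text):
    """Group FastQC module names by their status (PASS, WARN or FAIL)."""
    by_status = {"PASS": [], "WARN": [], "FAIL": []}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Assumption: fields are tab-separated (status, module, Fastq name)
        status, module, _fastq = line.split("\t", 2)
        by_status.setdefault(status, []).append(module)
    return by_status

fastq = "ES1_GTCCGC_L008_R1_001.fastq.gz"
summary_text = "\n".join([
    "PASS\tBasic Statistics\t" + fastq,
    "FAIL\tPer base sequence content\t" + fastq,
    "WARN\tPer sequence GC content\t" + fastq,
    "FAIL\tKmer Content\t" + fastq,
])
result = parse_summary(summary_text)
```

Module names keep the order in which they appear in the file, so `result["FAIL"]` lists failures in report order.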
Head of the FastQC data file (FASTQ_fastqc/fastqc_data.txt), which contains the raw numbers for the plots etc.:
##FastQC 0.11.3
>>Basic Statistics pass
#Measure Value
Filename ES1_GTCCGC_L008_R1_001.fastq.gz
File type Conventional base calls
Encoding Sanger / Illumina 1.9
Total Sequences 12317096
Sequences flagged as poor quality 0
Sequence length 101
%GC 50
>>END_MODULE
>>Per base sequence quality pass
#Base Mean Median Lower Quartile Upper Quartile 10th Percentile 90th Percentile
1 32.80553403172306 33.0 33.0 33.0 33.0 33.0
…
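The layout above (a `##FastQC` version line, then sections opened by `>>Name status` and closed by `>>END_MODULE`) lends itself to a simple line-by-line parser. The following is a sketch only, assuming tab-separated fields; `parse_fastqc_data` is a hypothetical function, not the FastqcData implementation, and the sample text closes the second section with an `>>END_MODULE` line that the truncated listing does not show.

```python
def parse_fastqc_data(text):
    """Split fastqc_data.txt content into a version and per-module lines."""
    version = None
    modules = {}
    current = None
    for line in text.splitlines():
        if line.startswith("##FastQC"):
            # Version line, e.g. '##FastQC<TAB>0.11.3'
            version = line.split()[1]
        elif line.startswith(">>END_MODULE"):
            current = None
        elif line.startswith(">>"):
            # Module start, e.g. '>>Basic Statistics<TAB>pass'
            name, status = line[2:].rsplit("\t", 1)
            current = modules[name] = {"status": status, "lines": []}
        elif current is not None:
            # Header and data lines belong to the open module
            current["lines"].append(line)
    return version, modules

sample = "\n".join([
    "##FastQC\t0.11.3",
    ">>Basic Statistics\tpass",
    "#Measure\tValue",
    "Filename\tES1_GTCCGC_L008_R1_001.fastq.gz",
    "Total Sequences\t12317096",
    ">>END_MODULE",
    ">>Per base sequence quality\tpass",
    "#Base\tMean\tMedian\tLower Quartile\tUpper Quartile\t10th Percentile\t90th Percentile",
    "1\t32.80553403172306\t33.0\t33.0\t33.0\t33.0\t33.0",
    ">>END_MODULE",  # added to close the section in this sample
])
version, modules = parse_fastqc_data(sample)
```

Keeping the header line as the first entry of each module's `lines` matches the list layout described for data(module).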