
class auto_process_ngs.qc.fastqc.Fastqc(fastqc_dir)

Wrapper class for handling outputs from FastQC

The Fastqc object gives access to various aspects of the outputs of the FastQC program.

property data

Return a FastqcData instance

property dir

Path to the directory with the FastQC outputs

property html_report

Path to the associated HTML report file

plot(module, inline=False)

property summary

Return a FastqcSummary instance

property version

Version of FastQC that was used

property zip

Path to the associated ZIP archive

class auto_process_ngs.qc.fastqc.FastqcData(data_file)

Class representing data from a Fastqc data file

Reads in the data from a fastqc_data.txt file and makes it available programmatically.

To create a new FastqcData instance:

>>> fqc = FastqcData('fastqc_data.txt')

To access a field in the ‘Basic Statistics’ module:

>>> nreads = fqc.basic_statistics('Total Sequences')

Return summary data for adapter content

Summarises the amount of adapter present in a Fastq file based on data in the Adapter Content section, assigning a decimal fraction for each adapter class.

The fraction is calculated by summing the fraction of adapter across all bases, and then normalising by the number of bases.


Returns:

dictionary mapping adapter names to the fraction representing the amount of adapter present in the Fastq.
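The calculation described above can be sketched as follows. This is a minimal illustration and not the library's implementation: the function name adapter_fractions is hypothetical, the input is assumed to be the Adapter Content module as a list of tab-delimited lines (header first), values are assumed to be percentages, and a position such as '10-14' is assumed to cover five bases.

```python
def adapter_fractions(lines):
    """Summarise adapter content as one decimal fraction per adapter.

    `lines` is a list of tab-delimited lines; the first is the header.
    """
    header = lines[0].lstrip("#").split("\t")
    adapters = header[1:]
    totals = {name: 0.0 for name in adapters}
    nbases = 0
    for line in lines[1:]:
        fields = line.split("\t")
        # Assumption: a position is a single base ('7') or an
        # inclusive range of bases ('10-14')
        start, _, end = fields[0].partition("-")
        width = int(end) - int(start) + 1 if end else 1
        nbases += width
        for name, value in zip(adapters, fields[1:]):
            # Assumption: values are percentages, hence the division by 100
            totals[name] += width * float(value) / 100.0
    if nbases == 0:
        return {name: 0.0 for name in adapters}
    # Normalise the summed fractions by the number of bases
    return {name: totals[name] / nbases for name in adapters}
```

Weighting each line by the number of bases it covers keeps grouped positions from being under-counted relative to single positions.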



basic_statistics(measure)

Access a data item in the Basic Statistics section

Possible values include:

  • Filename

  • File type

  • Encoding

  • Total Sequences

  • Sequences flagged as poor quality

  • Sequence length

  • %GC


Parameters:

measure (str) – key corresponding to a ‘measure’ in the Basic Statistics section.

Returns:

value of the requested ‘measure’

Raises:

KeyError – if ‘measure’ is not found.
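The lookup behaviour described above (return the value for a measure, raise KeyError when it is absent) can be illustrated with a standalone sketch. The function name lookup_measure is hypothetical, and the input is assumed to be the Basic Statistics module as a list of tab-delimited 'measure, value' lines; values are returned as strings.

```python
def lookup_measure(lines, measure):
    """Return the value for `measure` from Basic Statistics lines."""
    for line in lines:
        if line.startswith("#"):
            # Skip the '#Measure<TAB>Value' header line
            continue
        key, _, value = line.partition("\t")
        if key == measure:
            return value
    raise KeyError("No measure '%s' found" % measure)


basic_stats = [
    "#Measure\tValue",
    "Total Sequences\t12317096",
    "Sequence length\t101",
    "%GC\t50",
]
nreads = int(lookup_measure(basic_stats, "Total Sequences"))
```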


Return the raw data for a module

Returns the data for the specified module as a list of lines.

The first list item/line is the header line; data items within each line are tab-delimited.

For example:

>>> Fastqc('myfastq_fastq').data.data('Sequence Length Distribution')
['#Length       Count',
 '35    8826.0',
 '36    2848.0',
 '37    4666.0',
 '38    4524.0']

property modules

List of the modules in the raw data
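Because each module's raw data is a list of tab-delimited lines with the header first, it is often convenient to convert it to one dictionary per row. A minimal sketch follows; the helper name module_rows is hypothetical and not part of the library, and values are left as strings.

```python
def module_rows(lines):
    """Convert raw module lines into a list of dicts keyed by column name."""
    # The first line is the header; drop its leading '#'
    header = lines[0].lstrip("#").split("\t")
    return [dict(zip(header, line.split("\t"))) for line in lines[1:]]


raw = ["#Length\tCount", "35\t8826.0", "36\t2848.0"]
rows = module_rows(raw)
```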

property path

Path to the fastqc_data.txt file


Return sequence deduplication percentage

Returns the percentage of sequences remaining after deduplication according to FastQC.


Returns:

percentage sequence deduplication from FastQC.
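As an illustration of where this number can come from, the sketch below pulls it out of the raw lines of the Sequence Duplication Levels module. It assumes the module contains a '#Total Deduplicated Percentage' line, as recent FastQC versions write; the function name deduplicated_percentage is hypothetical and this is not necessarily how the library obtains the value.

```python
def deduplicated_percentage(lines):
    """Return the percentage of sequences remaining after deduplication."""
    for line in lines:
        # Assumption: the value is the last tab-delimited field on this line
        if line.startswith("#Total Deduplicated Percentage"):
            return float(line.split("\t")[-1])
    raise ValueError("No '#Total Deduplicated Percentage' line found")


dup_levels = [
    "#Total Deduplicated Percentage\t93.04",
    "#Duplication Level\tPercentage of deduplicated\tPercentage of total",
    "1\t95.0\t88.0",
]
pct = deduplicated_percentage(dup_levels)
```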


property version

FastQC version number

class auto_process_ngs.qc.fastqc.FastqcSummary(summary_file=None)

Class representing data from a Fastqc summary file

property failures

Return modules with failures

Returns a list with the names of the modules that have status ‘FAIL’.


Return the path of the HTML report from FastQC


Generate HTML table for FastQC summary


Parameters:

relpath (str) – optional; if supplied then links in the table will be relative to this path

Return link to the result of a specified FastQC module

Parameters:

  • name (str) – name of the module (e.g. ‘Basic Statistics’)

  • full_path (boolean) – optional, if True then return the full path; otherwise return just the anchor (e.g. ‘#M1’)

  • relpath (str) – optional, if supplied then specifies the path that full paths will be made relative to (implies full_path is True)
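The interplay of the three parameters can be sketched as below. This is a standalone illustration, not the library's code: the function name module_link, the module_names list and the html_report argument are hypothetical, and the anchor is assumed to be '#M' followed by the zero-based position of the module in the summary.

```python
import os


def module_link(module_names, name, html_report, full_path=True, relpath=None):
    """Build a link to the section of the HTML report for module `name`."""
    # Assumption: anchors are numbered by the module's position in the summary
    anchor = "#M%d" % module_names.index(name)
    if relpath is not None:
        # A relative path implies a full link (report path plus anchor)
        return os.path.relpath(html_report, relpath) + anchor
    if full_path:
        return html_report + anchor
    return anchor


names = ["Basic Statistics", "Per base sequence quality"]
anchor_only = module_link(names, "Per base sequence quality",
                          "/data/qc/x_fastqc.html", full_path=False)
```

A name that is not in module_names raises ValueError from list.index in this sketch.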

property passes

Return modules with passes

Returns a list with the names of the modules that have status ‘PASS’.

property warnings

Return modules with warnings

Returns a list with the names of the modules that have status ‘WARN’.

auto_process_ngs.qc.fastqc.logger = <Logger auto_process_ngs.qc.fastqc (WARNING)>

Example Fastqc summary text file (FASTQ_fastqc/summary.txt):

PASS    Basic Statistics                ES1_GTCCGC_L008_R1_001.fastq.gz
PASS    Per base sequence quality       ES1_GTCCGC_L008_R1_001.fastq.gz
PASS    Per tile sequence quality       ES1_GTCCGC_L008_R1_001.fastq.gz
PASS    Per sequence quality scores     ES1_GTCCGC_L008_R1_001.fastq.gz
FAIL    Per base sequence content       ES1_GTCCGC_L008_R1_001.fastq.gz
WARN    Per sequence GC content         ES1_GTCCGC_L008_R1_001.fastq.gz
PASS    Per base N content              ES1_GTCCGC_L008_R1_001.fastq.gz
PASS    Sequence Length Distribution    ES1_GTCCGC_L008_R1_001.fastq.gz
FAIL    Sequence Duplication Levels     ES1_GTCCGC_L008_R1_001.fastq.gz
PASS    Overrepresented sequences       ES1_GTCCGC_L008_R1_001.fastq.gz
PASS    Adapter Content                 ES1_GTCCGC_L008_R1_001.fastq.gz
FAIL    Kmer Content                    ES1_GTCCGC_L008_R1_001.fastq.gz
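A summary file of this form can be grouped by status in the same spirit as the passes, warnings and failures properties. The sketch below is standalone and hypothetical (the function name modules_by_status is not part of the library) and assumes the three fields on each line are tab-delimited.

```python
def modules_by_status(text):
    """Group module names from summary.txt content by their status."""
    result = {"PASS": [], "WARN": [], "FAIL": []}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Fields: status, module name, Fastq file name
        fields = line.split("\t")
        result.setdefault(fields[0], []).append(fields[1])
    return result


summary_text = (
    "PASS\tBasic Statistics\tES1_GTCCGC_L008_R1_001.fastq.gz\n"
    "FAIL\tPer base sequence content\tES1_GTCCGC_L008_R1_001.fastq.gz\n"
    "WARN\tPer sequence GC content\tES1_GTCCGC_L008_R1_001.fastq.gz\n"
    "FAIL\tKmer Content\tES1_GTCCGC_L008_R1_001.fastq.gz\n"
)
by_status = modules_by_status(summary_text)
```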

Head of the FastQC data file (FASTQ_fastqc/fastqc_data.txt), which contains the raw numbers for the plots etc.:

##FastQC    0.11.3
>>Basic Statistics    pass
#Measure    Value
Filename    ES1_GTCCGC_L008_R1_001.fastq.gz
File type    Conventional base calls
Encoding    Sanger / Illumina 1.9
Total Sequences    12317096
Sequences flagged as poor quality    0
Sequence length    101
%GC    50
>>END_MODULE
>>Per base sequence quality    pass
#Base    Mean    Median    Lower Quartile    Upper Quartile    10th Percentile    90th Percentile
1    32.80553403172306    33.0    33.0    33.0    33.0    33.0
…
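The layout above (a '##FastQC' version line, then modules delimited by '>>name' and '>>END_MODULE') can be split into per-module lists of lines with a short parser. This is a sketch under the assumption that fields are tab-delimited in the real file; the function name parse_fastqc_data is hypothetical and is not the library's implementation.

```python
def parse_fastqc_data(text):
    """Split fastqc_data.txt content into a version string and modules."""
    version = None
    modules = {}
    current = None
    for line in text.splitlines():
        if line.startswith("##FastQC"):
            version = line.split()[-1]
        elif line.startswith(">>END_MODULE"):
            current = None
        elif line.startswith(">>"):
            # Module start line: '>>name<TAB>status'
            current = line[2:].split("\t")[0]
            modules[current] = []
        elif current is not None:
            # Header and data lines belong to the module being read
            modules[current].append(line)
    return version, modules


sample = (
    "##FastQC\t0.11.3\n"
    ">>Basic Statistics\tpass\n"
    "#Measure\tValue\n"
    "Total Sequences\t12317096\n"
    ">>END_MODULE\n"
    ">>Per base sequence quality\tpass\n"
    "#Base\tMean\n"
    "1\t32.80553403172306\n"
    ">>END_MODULE\n"
)
version, modules = parse_fastqc_data(sample)
```

Each entry in modules has the same shape as the raw module data described earlier: a list of lines with the header line first.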