auto_process_ngs.qc.plots

class auto_process_ngs.qc.plots.Plot(width, height, bgcolor='white')

Utility class for creating pixel-based microplots

Provides methods for building up plots out of smaller elements (lines, blocks, single points and sets of points). It also provides methods for adding bounding boxes and background striping.

Usage:

>>> p = Plot(25,25)
>>> p.bbox(RGB.grey)
>>> p.plot({ x:x for x in range(25)},RGB.red)
>>> p.save("example.png")
>>> p.encoded_png()
'data:shjdhbcchb...'

Parameters:
  • width (int) – width of the plot canvas (pixels)

  • height (int) – height of the plot canvas (pixels)

  • bgcolor (str) – background color (default: ‘white’)

bar(data, xy1, xy2, colors)

Draw a horizontal stacked barplot

The bar is defined by corners xy1 and xy2 (cf the ‘block’ method).

Parameters:
  • data (sequence) – list of values corresponding to the size of each section in the ‘stack’

  • xy1 (tuple) – pair of (x,y) positions defining first corner of the bar

  • xy2 (tuple) – pair of (x,y) positions defining second corner of the bar

  • colors (sequence) – colours to assign to each section in the ‘stack’; if the number of sections exceeds the number of colours then colours are reused in order from the beginning
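
The colour reuse rule can be illustrated with a short sketch. This does not call the library: `assign_colors` is a hypothetical helper and the colour tuples are arbitrary examples.

```python
from itertools import cycle

def assign_colors(data, colors):
    """Pair each section of the stack with a colour (hypothetical helper)."""
    # cycle() starts again from the first colour once the sequence is
    # exhausted; zip() stops when the data values run out
    return list(zip(data, cycle(colors)))

red, blue = (255, 0, 0), (0, 0, 255)
# Three sections but only two colours: the third section reuses red
sections = assign_colors([10, 20, 30], [red, blue])
print(sections)
```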

bbox(color)

Draw a single-pixel bounding box

Parameters:

color (tuple) – tuple specifying the border RGB color

block(xy1, xy2, color, color2=None)

Draw a rectangular block

The block is defined by corners xy1 and xy2.

Parameters:
  • xy1 (tuple) – pair of (x,y) positions defining first corner of the block

  • xy2 (tuple) – pair of (x,y) positions defining second corner of the block

  • color (tuple) – tuple specifying an RGB color to fill the block with

encoded_png()

Return plot PNG as base64 encoded string

hline(y, color)

Draw a horizontal line

Parameters:
  • y (int) – y-axis position of line

  • color (tuple) – tuple specifying an RGB color

normalise_data(data, maxval=None)

Scales an arbitrary dataset along y-axis

Parameters:

data (mapping) – set of x-axis positions mapping to corresponding y-axis positions

Returns:

copy of the dataset with normalised y-values

Return type:

Mapping
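
One way such y-axis scaling might work is sketched below. The details are assumptions rather than the library's behaviour: `normalise` is a hypothetical standalone function, the canvas height is passed in explicitly, and the largest value (or `maxval`, if given) is mapped to the top pixel row.

```python
def normalise(data, height, maxval=None):
    """Return a copy of data with y-values scaled to 0..height-1 (sketch)."""
    if maxval is None:
        maxval = max(data.values())
    if maxval == 0:
        # Nothing to scale: every point sits on the baseline
        return {x: 0 for x in data}
    # Scale each y-value relative to maxval and truncate to a pixel row
    return {x: int(float(y) / maxval * (height - 1)) for x, y in data.items()}

scaled = normalise({0: 5, 1: 10, 2: 20}, height=11)
print(scaled)
```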

plot(data, color, fill=False, interpolation='mean')

Plot arbitrary data along the x-axis

If data limits exceed the canvas size then the data will be scaled in both x and y directions to fit into the plot canvas, with the ‘interpolation’ argument specifying the method to use for handling data from multiple points which are combined into a single bin:

  • ‘mean’ (the default) assigns the mean value of all the points that are put into the same bin (resulting in a single point plotted for each bin)

  • ‘minmax’ plots a vertical line for each bin based on the minimum and maximum values assigned to the bin

(NB the ‘minmax’ interpolation can smooth out discontinuities in the plotted data that can appear when using the ‘mean’ method with datasets which are much wider than the plot width.)

Setting the ‘fill’ argument to True when using the ‘mean’ interpolation method fills the area under each plotted point; it is ignored when using ‘minmax’.

Parameters:
  • data (mapping) – set of x-axis positions mapping to corresponding y-axis positions

  • color (tuple) – tuple specifying an RGB color

  • fill (bool) – if True then also fill the area under each point (default: no fill)

  • interpolation (str) – interpolation mode to use when plotting rebinned data (can be one of ‘mean’ or ‘minmax’)
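
The two interpolation modes can be sketched for a single bin as follows. This is an illustration rather than the library's code: `pixels_for_bin` is a hypothetical helper, y-positions are counted upwards from 0, and values are truncated to integers.

```python
def pixels_for_bin(values, interpolation="mean", fill=False):
    """Return the y-positions that would be drawn for one x-bin (sketch)."""
    if interpolation == "mean":
        y = int(sum(values) / len(values))
        # With fill, everything from the baseline up to the mean is drawn
        return list(range(0, y + 1)) if fill else [y]
    if interpolation == "minmax":
        # A vertical line spanning the bin's minimum to maximum;
        # 'fill' has no effect in this mode
        return list(range(int(min(values)), int(max(values)) + 1))
    raise ValueError("unknown interpolation: %s" % interpolation)

bin_values = [2, 4, 9]
mean_pixels = pixels_for_bin(bin_values)
filled_pixels = pixels_for_bin(bin_values, fill=True)
minmax_pixels = pixels_for_bin(bin_values, interpolation="minmax")
```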

plot_range(data1, data2, color)

Plot two sets of data and fill the region in between

If data limits exceed the canvas size then the data will be scaled in both x and y directions to fit into the plot canvas.

Parameters:
  • data1 (mapping) – set of x-axis positions mapping to corresponding y-axis positions for first dataset

  • data2 (mapping) – set of x-axis positions mapping to corresponding y-axis positions for second dataset

  • color (tuple) – tuple specifying an RGB color

rebin_data(data)

Scales an arbitrary dataset along x-axis

Scales the supplied dataset so that the x-axis data fits into the plot size, by treating the plot x-values as ‘bins’ and combining data points from the dataset that fall into the same bin.

Parameters:

data (mapping) – set of x-axis positions mapping to corresponding y-axis positions

Returns:

named tuple with three datasets referenced by the keys ‘mean’ (mean values), ‘min’ (minimum values) and ‘max’ (maximum values) for each position.

Return type:

NamedTuple
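
The binning idea can be sketched in plain Python. The bin assignment rule, the `Rebinned` tuple and the `rebin` function are assumptions made for illustration (the sketch also assumes non-negative x-positions); the library's own implementation may differ.

```python
from collections import namedtuple

# Hypothetical container mirroring the 'mean', 'min' and 'max' keys
Rebinned = namedtuple("Rebinned", ("mean", "min", "max"))

def rebin(data, width):
    """Combine data points into at most `width` bins along the x-axis."""
    xmax = max(data)
    bins = {}
    for x, y in data.items():
        # Map the x-position onto a bin index in the range 0..width-1
        i = min(int(float(x) / (xmax + 1) * width), width - 1)
        bins.setdefault(i, []).append(y)
    return Rebinned(
        mean={i: sum(v) / len(v) for i, v in bins.items()},
        min={i: min(v) for i, v in bins.items()},
        max={i: max(v) for i, v in bins.items()},
    )

# Four data points squeezed into a plot that is two pixels wide
result = rebin({0: 1, 1: 3, 2: 10, 3: 20}, width=2)
print(result.mean, result.min, result.max)
```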

save(f, ext='.png')

Save a copy of the plot to file

Parameters:
  • f (str) – path to output file

  • ext (str) – optional, specify the plot extension (defaults to ‘.png’)

set_pixel(x, y, color)

Set the color of a single pixel

Parameters:
  • x (int) – x-position of pixel

  • y (int) – y-position of pixel

  • color (tuple) – tuple specifying an RGB color

stripe(color1, color2)

Fill the plot with vertical stripes

Parameters:
  • color1 (tuple) – tuple specifying first RGB color

  • color2 (tuple) – tuple specifying second RGB color

vline(x, color, llen=None)

Draw a vertical line

Parameters:
  • x (int) – x-axis position of line

  • color (tuple) – tuple specifying an RGB color

  • llen (int) – length of the line (defaults to plot canvas height)

auto_process_ngs.qc.plots.encode_png(png_file)

Return Base64 encoded string for a PNG
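
A minimal sketch of Base64-encoding PNG data using only the standard library is shown below. The `data:image/png;base64,` prefix and the `encode_png_bytes` helper are assumptions for illustration; the exact string format returned by `encode_png` is not specified here.

```python
import base64

def encode_png_bytes(png_bytes):
    """Return PNG bytes as a Base64 data URI string (hypothetical helper)."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    # The prefix lets the string be used directly as an HTML <img> source
    return "data:image/png;base64," + encoded

# The 8-byte signature that begins every PNG file
signature = b"\x89PNG\r\n\x1a\n"
uri = encode_png_bytes(signature)
print(uri)
```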

auto_process_ngs.qc.plots.make_plot(img, outfile=None, inline=False, ext='.plot.png')

Internal: output PNG plots from Image objects

Parameters:
  • img (Image) – image to output plot for

  • outfile (str) – path to output file to write PNG to (if None then no file will be created)

  • inline (bool) – if True then return base64 encoded string for the plot

  • ext (str) – optional, extension to use for temporary file

auto_process_ngs.qc.plots.uadapterplot(adapter_content, adapter_names=None, outfile=None, inline=False, height=40, bar_width=10, spacing=2, multi_bar=False)

Make a ‘micro’ plot summarising adapter content

The plot consists of vertical bar(s) which indicate the relative presence of each adapter class in the sequence data.

In ‘multi-bar’ mode the plot has one bar for each adapter class; in ‘single-bar’ mode the plot combines all data into a single bar.

The adapter content should be supplied as a dictionary where the keys are adapter names and the corresponding adapter content is expressed as a decimal fraction.

Parameters:
  • adapter_content (mapping) – dictionary mapping adapter names to adapter content

  • adapter_names (list) – optional, list of adapter classes; if provided then defines the order in which the adapters appear in the plot (otherwise taken from the keys in the mapping)

  • outfile (str) – path for the output PNG

  • inline (bool) – if True then returns the PNG as base64 encoded string rather than as a file

  • height (int) – height of the plot in pixels

  • bar_width (int) – width of each bar representing content for an adapter class, in pixels

  • spacing (int) – spacing between each bar, in pixels

  • multi_bar (bool) – if True then make a multi-bar plot (one bar per adapter class); otherwise make a single bar plot (all adapter data in a single bar)

auto_process_ngs.qc.plots.uboxplot(fastqc_data=None, fastq=None, max_width=None, outfile=None, inline=None)

Generate FASTQ per-base quality ‘micro-boxplot’

A ‘micro-boxplot’ is a thumbnail version of the per-base quality boxplots for a FASTQ file.

Parameters:
  • fastqc_data (str) – path to a fastqc_data.txt file

  • fastq (str) – path to a FASTQ file (quality stats will be extracted directly if fastqc_data is not supplied)

  • max_width (int) – maximum width of plot in pixels; if None then by default width will be the number of bases, otherwise plots that would exceed this width will be scaled to fit

  • outfile (str) – path to output file

  • inline (boolean) – if True then returns the PNG as base64 encoded string rather than as a file

Returns:

path to output PNG file

Return type:

String

auto_process_ngs.qc.plots.ucoverageprofileplot(data, outfile=None, inline=False)

Return a mini-plot of the Qualimap gene body coverage profile

Parameters:
  • data (dict) – dictionary mapping transcript positions (percentile) to associated mean coverage depth (from Qualimap’s RNA-seq analysis)

  • outfile (str) – path for the output PNG

  • inline (bool) – if True then returns the PNG as base64 encoded string rather than as a file

auto_process_ngs.qc.plots.uduplicationplot(total_deduplicated_percentage, height=None, width=None, mode='dup', style='fancy', warn_cutoff=None, fail_cutoff=None, outfile=None, inline=False)

Make a ‘micro’ plot summarising sequence duplication

Given the percentage of reads after deduplication (as calculated by FastQC’s “Sequence Duplication Levels” module), plots a horizontal bar indicating the level of sequence (de)duplication.

Two modes are available:

  • ‘dup’ shows the fraction of sequences removed after deduplication

  • ‘dedup’ shows the fraction of sequences remaining after deduplication

Two styles are supported:

  • ‘fancy’ produces a multi-colour plot where the fraction of unique sequences is coloured according to pass, warn or fail cut-off levels, with the background to the bar striped with colours according to the cut-offs

  • ‘simple’ produces a two-colour plot where the fraction of unique sequences is shown in blue and the remainder in red

Parameters:
  • total_deduplicated_percentage (float) – percentage of sequences remaining after deduplication

  • height (int) – height of the plot in pixels

  • width (int) – width of the plot in pixels

  • style (str) – either ‘fancy’ (default) or ‘simple’

  • mode (str) – either ‘dup’ (default) or ‘dedup’

  • warn_cutoff (float) – fraction of unique sequences below which the plot should indicate a warning

  • fail_cutoff (float) – fraction of unique sequences below which the plot should indicate a failure

  • outfile (str) – path for the output PNG

  • inline (boolean) – if True then returns the PNG as base64 encoded string rather than as a file
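
The relationship between the two modes and the cut-offs can be sketched as follows. `duplication_summary` is a hypothetical helper, and the cut-off defaults used here (0.3 and 0.2) are placeholder values chosen for the example, not the library's defaults.

```python
def duplication_summary(total_deduplicated_percentage, mode="dup",
                        warn_cutoff=0.3, fail_cutoff=0.2):
    """Return (fraction to plot, status) for a deduplication percentage."""
    # Fraction of sequences remaining after deduplication
    unique = total_deduplicated_percentage / 100.0
    if mode == "dup":
        fraction = 1.0 - unique   # sequences removed by deduplication
    elif mode == "dedup":
        fraction = unique         # sequences remaining
    else:
        raise ValueError("mode must be 'dup' or 'dedup'")
    # Status depends on the unique fraction, whichever mode is plotted
    if unique < fail_cutoff:
        status = "FAIL"
    elif unique < warn_cutoff:
        status = "WARN"
    else:
        status = "PASS"
    return fraction, status

print(duplication_summary(75.0))
print(duplication_summary(25.0, mode="dedup"))
```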

auto_process_ngs.qc.plots.ufastqcplot(summary_file, outfile=None, inline=False)

Make a ‘micro’ summary plot of FastQC output

The micro plot is a small PNG which represents the summary results from each FastQC module in a matrix, with rows representing the modules and three columns representing the status (‘PASS’, ‘WARN’ and ‘FAIL’, from left to right).

For example (in text form):

      ==
==
==
   ==
==

indicates that the status of the first module is ‘FAIL’, the 2nd, 3rd and 5th are ‘PASS’, and the 4th is ‘WARN’.

Parameters:
  • summary_file (str) – path to a FastQC ‘summary.txt’ output file

  • outfile (str) – path for the output PNG

  • inline (boolean) – if True then returns the PNG as base64 encoded string rather than as a file
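
The matrix layout described above can be reproduced in text form with a short sketch. It assumes the usual tab-delimited layout of FastQC's ‘summary.txt’ (status, module name, file name); `status_matrix` is a hypothetical helper and does not produce a PNG.

```python
def status_matrix(summary_lines):
    """Return one text row per FastQC module, marking its status column."""
    columns = ("PASS", "WARN", "FAIL")  # left to right
    rows = []
    for line in summary_lines:
        # The status is the first tab-delimited field on each line
        status = line.split("\t")[0].strip()
        cells = ["==" if status == c else "  " for c in columns]
        rows.append(" ".join(cells).rstrip())
    return rows

summary = [
    "FAIL\tBasic Statistics\texample.fastq",
    "PASS\tPer base sequence quality\texample.fastq",
    "PASS\tPer sequence quality scores\texample.fastq",
    "WARN\tPer base sequence content\texample.fastq",
    "PASS\tPer sequence GC content\texample.fastq",
]
rows = status_matrix(summary)
print("\n".join(rows))
```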

auto_process_ngs.qc.plots.ugenomicoriginplot(data, width=100, height=40, outfile=None, inline=False)

Return a mini barplot of the Qualimap genomic origin of reads

Parameters:
  • data (dict) – dictionary mapping genomic origin names to the associated percentage of reads (from Qualimap rnaseq ‘Genomic Origin of Reads’)

  • height (int) – plot height in pixels

  • width (int) – plot width in pixels

  • outfile (str) – path for the output PNG

  • inline (bool) – if True then returns the PNG as base64 encoded string rather than as a file

auto_process_ngs.qc.plots.uinsertsizeplot(data, outfile=None, inline=False)

Return a mini-plot with the Picard insert size histogram

Parameters:
  • data (dict) – dictionary mapping insert sizes to associated number of alignments (from Picard CollectInsertSizeMetrics)

  • outfile (str) – path for the output PNG

  • inline (bool) – if True then returns the PNG as base64 encoded string rather than as a file

auto_process_ngs.qc.plots.ureadcountplot(nreads, nmasked=None, npadded=None, max_reads=None, outfile=None, inline=False, width=50, height=12, bg_color='white', fg_color='green', masked_color='red', padded_color='orange', fill_color='lightgrey')

Make a ‘micro’ plot summarising read counts and masking

Given a total number of reads/sequences in a Fastq file plus the number of those sequences which are masked (i.e. completely composed of Ns) and padded (i.e. have one or more trailing Ns), plots a horizontal bar indicating the sequence composition.

By default the numbers are normalised so that the total number of reads fills the bar; however if a maximum read count is also supplied then the normalisation is relative to that maximum (so the plot also indicates the relative size of the Fastq compared to the maximum read count).

Parameters:
  • nreads (int) – number of reads in the Fastq

  • nmasked (int) – number of masked reads

  • npadded (int) – number of padded reads

  • max_reads (int) – maximum number of reads (e.g. in all Fastqs) for normalisation

  • outfile (str) – path for the output PNG

  • inline (boolean) – if True then returns the PNG as base64 encoded string rather than as a file

  • width (int) – width of the plot in pixels

  • height (int) – height of the plot in pixels

  • bg_color (str) – name of colour to use for the background

  • fg_color (str) – name of the colour for plotting unmasked, unpadded reads

  • masked_color (str) – name of the colour for plotting masked read fraction

  • padded_color (str) – name of the colour for plotting padded read fraction

  • fill_color (str) – name of the colour for filling the unoccupied remainder of the bar
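
The normalisation can be sketched as a calculation of pixel widths for each part of the bar. This is an illustration under stated assumptions: `read_count_segments` is a hypothetical helper, masked and padded reads are treated as non-overlapping groups, and widths are rounded down to whole pixels.

```python
def read_count_segments(nreads, nmasked=0, npadded=0, max_reads=None,
                        width=50):
    """Return pixel widths for each part of the read count bar (sketch)."""
    # Normalise against max_reads if supplied, otherwise against nreads
    total = max_reads if max_reads else nreads
    if total <= 0:
        return {"normal": 0, "masked": 0, "padded": 0, "fill": width}
    # Integer arithmetic avoids floating point rounding surprises
    reads = nreads * width // total
    masked = nmasked * width // total
    padded = npadded * width // total
    return {
        "normal": reads - masked - padded,  # unmasked, unpadded reads
        "masked": masked,
        "padded": padded,
        "fill": width - reads,              # unoccupied remainder of bar
    }

full = read_count_segments(1000, nmasked=100, npadded=200)
half = read_count_segments(1000, nmasked=100, npadded=200, max_reads=2000)
print(full)
print(half)
```

In the second call the Fastq has half the maximum read count, so half of the bar is left as fill.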

auto_process_ngs.qc.plots.uscreenplot(screen_files, outfile=None, screen_width=None, inline=None)

Generate ‘micro-plot’ of FastqScreen outputs

Parameters:
  • screen_files (list) – list of paths to one or more …screen.txt files from FastqScreen

  • outfile (str) – path to output file

  • screen_width (int) – optional, set the width for each screen plot

  • inline (boolean) – if True then returns the PNG as base64 encoded string rather than as a file

auto_process_ngs.qc.plots.useqlenplot(dist, masked_dist=None, min_len=None, max_len=None, outfile=None, inline=False, height=None, bg_color='gainsboro', bbox_color='white', seq_color='black', masked_color='red')

Make a ‘micro’ plot of sequence length

Given a sequence length distribution, create a histogram-style plot where the numbers of sequences with different lengths are shown (similar to the ‘Sequence Length Distribution’ plot from FastQC).

Optionally, if a distribution of masked reads is also supplied then these data will be overlaid on top.

The distributions should be supplied as dictionaries or mappings where the keys are sequence lengths and the corresponding values are the number of sequences.

Parameters:
  • dist (mapping) – mapping of sequence lengths to numbers of sequences, giving the distribution of sequence lengths

  • masked_dist (mapping) – optional, mapping of sequence lengths to numbers of masked reads

  • min_len (int) – optional, set the lower limit of the plot (otherwise defaults to the lowest length present in the distribution)

  • max_len (int) – optional, set the upper limit of the plot (otherwise defaults to the highest length present in the distribution)

  • outfile (str) – path for the output PNG

  • inline (boolean) – if True then returns the PNG as base64 encoded string rather than as a file

  • height (int) – height of the plot in pixels

  • bg_color (str) – name of colour to use for the background

  • bbox_color (str) – name of colour to use for the bounding box

  • seq_color (str) – name of colour to use for the sequence distribution

  • masked_color (str) – name of color to use for masked sequence distribution
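
How the plot limits interact with the distribution can be sketched as below. `length_histogram` is a hypothetical helper that only builds the list of counts, one per sequence length, which a histogram-style plot would then draw.

```python
def length_histogram(dist, min_len=None, max_len=None):
    """Return counts for every length from min_len to max_len inclusive."""
    # Limits default to the shortest and longest lengths in the data
    if min_len is None:
        min_len = min(dist)
    if max_len is None:
        max_len = max(dist)
    # Lengths that are absent from the distribution count as zero
    return [dist.get(n, 0) for n in range(min_len, max_len + 1)]

dist = {48: 2, 50: 10}
default_limits = length_histogram(dist)
wider_limits = length_histogram(dist, min_len=47, max_len=51)
print(default_limits, wider_limits)
```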

auto_process_ngs.qc.plots.ustackedbar(data, outfile=None, inline=False, bbox=True, height=20, length=100, colors=None)

Make a ‘micro’ stacked bar chart

A ‘stacked’ bar consists of a bar divided into sections, with each section of proportional length to the corresponding value.

Parameters:
  • data (List) – list or tuple of data values

  • outfile (str) – path for the output PNG

  • inline (boolean) – if True then returns the PNG as base64 encoded string rather than as a file

  • bbox (boolean) – if True then draw a bounding box around the plot

  • height (int) – height of the bar in pixels

  • length (int) – length of the bar in pixels

  • colors (List) – list or tuple of color values
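
The proportional section lengths can be sketched as follows. `section_lengths` is a hypothetical helper assuming non-negative integer data values; it rounds cumulative boundaries rather than individual sections so that the section lengths always add up to the full bar length.

```python
def section_lengths(data, length=100):
    """Return the pixel length of each section of a stacked bar (sketch)."""
    total = sum(data)
    if total == 0:
        return [0] * len(data)
    # Compute the right-hand pixel boundary of each section
    bounds = []
    running = 0
    for value in data:
        running += value
        bounds.append(running * length // total)
    # Each section spans from the previous boundary to its own
    return [b - a for a, b in zip([0] + bounds[:-1], bounds)]

quarters = section_lengths([1, 1, 2])
thirds = section_lengths([1, 1, 1])
print(quarters, thirds)
```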

auto_process_ngs.qc.plots.ustrandplot(fastq_strand_out, outfile=None, inline=False, height=25, width=50, fg_color=None, dynamic=False)

Make a ‘micro’ chart for strandedness

This micro plot is a small PNG which summarises the results from fastq_strand.py as two horizontal bars (one for forward, one for reverse) with the lengths representing the pseudo-percentages of each.

If the fastq_strand results included multiple genomes then there will be one pair of bars for each genome.

Parameters:
  • fastq_strand_out (str) – path to a fastq_strand output file

  • outfile (str) – path for the output PNG

  • inline (boolean) – if True then returns the PNG as base64 encoded string rather than as a file

  • height (int) – height of the plot in pixels

  • width (int) – width of the plot in pixels

  • fg_color (tuple) – tuple of RGB values to use for the foreground colour of the bars

  • dynamic (boolean) – if True then the height of the plot will be increased for each additional genome in the output fastq_strand file