auto_process_ngs.qc.plots
- class auto_process_ngs.qc.plots.Plot(width, height, bgcolor='white')
Utility class for creating pixel-based microplots
Provides methods for building up plots out of smaller elements (lines, blocks, single points and sets of points). It also provides methods for adding bounding boxes and background striping.
Usage:
>>> p = Plot(25, 25)
>>> p.bbox(RGB.grey)
>>> p.plot({x: x for x in range(25)}, RGB.red)
>>> p.save("example.png")
>>> p.encoded_png()
'data:shjdhbcchb...'
- Parameters:
width (int) – width of the plot canvas (pixels)
height (int) – height of the plot canvas (pixels)
bgcolor (str) – background color (default: ‘white’)
- bar(data, xy1, xy2, colors)
Draw a horizontal stacked barplot
The bar is defined by corners xy1 and xy2 (cf the ‘block’ method).
- Parameters:
data (sequence) – list of values corresponding to the size of each section in the ‘stack’
xy1 (tuple) – pair of (x,y) positions defining first corner of the bar
xy2 (tuple) – pair of (x,y) positions defining second corner of the bar
colors (sequence) – list of colours to assign to each section in the ‘stack’; if the number of sections exceeds the number of colours then colours are reused in order from the beginning
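As a rough illustration of the geometry described above (not the class's own implementation), the hypothetical helper `bar_sections` below splits the span between two x-positions into stacked sections and reuses colours in order via `itertools.cycle`. It assumes sections are returned as half-open `(start, end)` pixel ranges and only deals with the horizontal extent of the bar.

```python
from itertools import cycle


def bar_sections(data, x1, x2, colors):
    """Split the horizontal span x1..x2 into stacked sections.

    Hypothetical helper: returns a list of (start, end, color) tuples,
    one per value in 'data', with 'end' exclusive. Colours are reused
    from the beginning when there are more sections than colours.
    """
    total = sum(data)
    width = x2 - x1
    sections = []
    running = 0
    start = x1
    for value, color in zip(data, cycle(colors)):
        running += value
        # Scale the running total (not each value on its own) so that
        # rounding errors do not accumulate and the last section ends at x2
        end = x1 + round(width * running / total) if total else x1
        sections.append((start, end, color))
        start = end
    return sections


print(bar_sections([1, 1, 2], 0, 100, ["red", "blue"]))
# [(0, 25, 'red'), (25, 50, 'blue'), (50, 100, 'red')]
```

Working from cumulative totals keeps adjacent sections flush with each other, which matters at micro-plot sizes where a single pixel of drift is visible.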
- bbox(color)
Draw a single-pixel bounding box
- Parameters:
color (tuple) – tuple specifying the border RGB color
- block(xy1, xy2, color, color2=None)
Draw a rectangular block
The block is defined by corners xy1 and xy2.
- Parameters:
xy1 (tuple) – pair of (x,y) positions defining first corner of the block
xy2 (tuple) – pair of (x,y) positions defining second corner of the block
color (tuple) – tuple specifying an RGB color to fill the block with
- encoded_png()
Return plot PNG as base64 encoded string
- hline(y, color)
Draw a horizontal line
- Parameters:
y (int) – y-axis position of line
color (tuple) – tuple specifying an RGB color
- normalise_data(data, maxval=None)
Scale an arbitrary dataset along the y-axis
- Parameters:
data (mapping) – set of x-axis positions mapping to corresponding y-axis positions
- Returns:
copy of the dataset with normalised y-values
- Return type:
Mapping
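A minimal sketch of y-axis normalisation, assuming the undocumented `maxval` argument is the data value that should map to the top pixel row (falling back to the largest y-value in the data). The function name `normalise_y` is hypothetical, and for simplicity rows are counted upwards from a baseline of 0 rather than in image coordinates.

```python
def normalise_y(data, height, maxval=None):
    """Return a copy of 'data' with y-values scaled onto 0..height-1.

    Hypothetical helper. 'maxval' is assumed to be the data value that
    maps to the top pixel row; by default the largest y-value is used.
    Values above 'maxval' are clamped to the top row.
    """
    if maxval is None:
        maxval = max(data.values())
    if maxval <= 0:
        # Nothing to scale against: put every point on the baseline
        return {x: 0 for x in data}
    top = height - 1
    return {x: min(top, round(top * y / maxval)) for x, y in data.items()}
```

For example, `normalise_y({0: 2, 1: 4}, 11, maxval=20)` gives `{0: 1, 1: 2}`, so passing a shared `maxval` lets several plots use a common vertical scale.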
- plot(data, color, fill=False, interpolation='mean')
Plot arbitrary data along the x-axis
If data limits exceed the canvas size then the data will be scaled in both x and y directions to fit into the plot canvas, with the ‘interpolation’ argument specifying the method to use for handling data from multiple points which are combined into a single bin:
‘mean’ (the default) assigns the mean value of all the points that are put into the same bin (resulting in a single point plotted for each bin)
‘minmax’ plots a vertical line for each bin based on the minimum and maximum values assigned to the bin
(NB the ‘minmax’ interpolation can smooth out discontinuities in the plotted data that can appear when using the ‘mean’ method with datasets which are much wider than the plot width.)
Setting the ‘fill’ argument to True when using the ‘mean’ interpolation method fills the area under each plotted point; it is ignored when using ‘minmax’.
- Parameters:
data (mapping) – set of x-axis positions mapping to corresponding y-axis positions
color (tuple) – tuple specifying an RGB color
fill (bool) – if True then also fill the area under each point (default: no fill)
interpolation (str) – interpolation mode to use when plotting rebinned data (can be one of ‘mean’ or ‘minmax’)
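The difference between the two interpolation modes can be sketched for a single bin. The hypothetical `bin_pixels` function below assumes the values in a bin have already been normalised to integer pixel rows counted up from a baseline of 0; it shows that ‘mean’ yields one point (or a filled column) while ‘minmax’ yields a vertical line covering the bin's full range, with `fill` ignored.

```python
def bin_pixels(values, interpolation="mean", fill=False):
    """Return the y-positions to colour for one x-bin.

    Hypothetical helper: 'values' are the integer y-positions (already
    normalised, counted up from a baseline of 0) that fell into the bin.
    """
    if interpolation == "mean":
        y = round(sum(values) / len(values))
        # 'fill' colours every row from the baseline up to the point
        return list(range(y + 1)) if fill else [y]
    if interpolation == "minmax":
        # One vertical line spanning the bin's range; 'fill' is ignored
        return list(range(min(values), max(values) + 1))
    raise ValueError(f"unknown interpolation mode: {interpolation!r}")
```

For a bin holding `[2, 4, 9]`, ‘mean’ colours only row 5, whereas ‘minmax’ colours rows 2 to 9. Because each ‘minmax’ line spans the whole range of its bin, neighbouring bins tend to overlap vertically, which is why this mode can hide the gaps that ‘mean’ leaves in very wide datasets.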
- plot_range(data1, data2, color)
Plot two sets of data and fill the region in between
If data limits exceed the canvas size then the data will be scaled in both x and y directions to fit into the plot canvas.
- Parameters:
data1 (mapping) – set of x-axis positions mapping to corresponding y-axis positions for first dataset
data2 (mapping) – set of x-axis positions mapping to corresponding y-axis positions for second dataset
color (tuple) – tuple specifying an RGB color
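A minimal sketch of filling the region between two curves, assuming both datasets are already scaled to integer pixel positions and that only x-positions present in both datasets are filled. The name `range_pixels` is hypothetical and not part of the module.

```python
def range_pixels(data1, data2):
    """Return (x, y) pixels filling the region between two datasets."""
    pixels = []
    # Only x-positions present in both datasets can be filled
    for x in sorted(set(data1) & set(data2)):
        lo, hi = sorted((data1[x], data2[x]))
        pixels.extend((x, y) for y in range(lo, hi + 1))
    return pixels
```

Sorting each pair means the call works whichever dataset happens to lie above the other at a given x-position.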
- rebin_data(data)
Scale an arbitrary dataset along the x-axis
Scales the supplied dataset so that the x-axis data fits into the plot size, by treating the plot x-values as ‘bins’ and combining data points from the dataset that fall into the same bin.
- Parameters:
data (mapping) – set of x-axis positions mapping to corresponding y-axis positions
- Returns:
named tuple with three datasets, referenced by the keys ‘mean’ (mean values), ‘min’ (minimum values) and ‘max’ (maximum values) for each position.
- Return type:
NamedTuple
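The binning described above can be sketched as follows, assuming integer x-positions mapped linearly onto the plot width. `RebinnedData` and `rebin` are hypothetical names chosen to mirror the ‘mean’, ‘min’ and ‘max’ datasets; they are not taken from the module itself.

```python
from collections import namedtuple

# Hypothetical container mirroring the 'mean', 'min' and 'max' datasets
RebinnedData = namedtuple("RebinnedData", ["mean", "min", "max"])


def rebin(data, width):
    """Combine the points in 'data' into at most 'width' x-bins."""
    xmin, xmax = min(data), max(data)
    span = xmax - xmin + 1
    bins = {}
    for x, y in data.items():
        # Map each integer x-position linearly onto a bin index 0..width-1
        b = (x - xmin) * width // span
        bins.setdefault(b, []).append(y)
    return RebinnedData(
        mean={b: sum(ys) / len(ys) for b, ys in bins.items()},
        min={b: min(ys) for b, ys in bins.items()},
        max={b: max(ys) for b, ys in bins.items()},
    )
```

With `r = rebin({x: x for x in range(10)}, 5)`, each bin holds two consecutive points, so `r.min` is `{0: 0, 1: 2, 2: 4, 3: 6, 4: 8}`. Per the description of `plot`, rebinning is only needed when the data are wider than the canvas.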
- save(f, ext='.png')
Save a copy of the plot to file
- Parameters:
f (str) – path to output file
ext (str) – optional, specify the plot extension (defaults to ‘.png’)
- set_pixel(x, y, color)
Set the color of a single pixel
- Parameters:
x (int) – x-position of pixel
y (int) – y-position of pixel
color (tuple) – tuple specifying an RGB color
- stripe(color1, color2)
Fill the plot with vertical stripes
- Parameters:
color1 (tuple) – tuple specifying first RGB color
color2 (tuple) – tuple specifying second RGB color
- vline(x, color, llen=None)
Draw a vertical line
- Parameters:
x (int) – x-axis position of line
color (tuple) – tuple specifying an RGB color
llen (int) – length of the line (defaults to plot canvas height)
- auto_process_ngs.qc.plots.encode_png(png_file)
Return Base64 encoded string for a PNG
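A minimal sketch of base64-encoding a PNG for inline use, using the standard library `base64` module. The `Plot` usage example shows the encoded string beginning with ‘data:’; the full `data:image/png;base64,` prefix used here is the standard data URI form for embedding in an HTML `<img>` tag and is an assumption about this module's output. Both function names are hypothetical.

```python
import base64


def png_data_uri(png_bytes):
    """Return raw PNG bytes as a base64-encoded 'data:' URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return "data:image/png;base64," + encoded


def encode_png_file(png_file):
    """Read a PNG file from disk and return it as a 'data:' URI."""
    with open(png_file, "rb") as fp:
        return png_data_uri(fp.read())
```

The file is opened in binary mode because PNG data is not text; the PNG signature bytes `b"\x89PNG"` encode to `iVBORw==`.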
- auto_process_ngs.qc.plots.make_plot(img, outfile=None, inline=False, ext='.plot.png')
Internal: output PNG plots from Image objects
- Parameters:
img (Image) – image to output plot for
outfile (str) – path to output file to write PNG to (if None then no file will be created)
inline (bool) – if True then return base64 encoded string for the plot
ext (str) – optional, extension to use for temporary file
- auto_process_ngs.qc.plots.uadapterplot(adapter_content, adapter_names=None, outfile=None, inline=False, height=40, bar_width=10, spacing=2, multi_bar=False)
Make a ‘micro’ plot summarising adapter content
The plot consists of vertical bar(s) which indicate the relative presence of each adapter class in the sequence data.
In ‘multi-bar’ mode the plot has one bar for each adapter class; in ‘single-bar’ mode the plot combines all data into a single bar.
The adapter content should be supplied as a dictionary where the keys are adapter names and the corresponding adapter content is expressed as a decimal fraction.
- Parameters:
adapter_content (mapping) – dictionary mapping adapter names to adapter content
adapter_names (list) – optional, list of adapter classes; if provided then defines the order in which the adapters appear in the plot (otherwise taken from the keys in the mapping)
outfile (str) – path for the output PNG
inline (bool) – if True then returns the PNG as base64 encoded string rather than as a file
height (int) – height of the plot in pixels
bar_width (int) – width of each bar representing content for an adapter class, in pixels
spacing (int) – spacing between each bar, in pixels
multi_bar (bool) – if True then make a multi-bar plot (one bar per adapter class); otherwise make a single bar plot (all adapter data in a single bar)
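The two layouts can be sketched as rectangle geometry. The hypothetical `adapter_bars` function below takes the same layout arguments as the signature above but only computes `(name, x1, y1, x2, y2)` rectangles, with y counted upwards from the baseline. It assumes each adapter's bar height is simply its fraction multiplied by the plot height, and that multi-bar mode places bars from x=0 separated by `spacing`.

```python
def adapter_bars(adapter_content, adapter_names=None, height=40,
                 bar_width=10, spacing=2, multi_bar=False):
    """Return (name, x1, y1, x2, y2) rectangles for each adapter."""
    # Order taken from adapter_names if given, else from the mapping
    names = adapter_names if adapter_names is not None else list(adapter_content)
    rects = []
    if multi_bar:
        # One bar per adapter class, each rising from the baseline
        for i, name in enumerate(names):
            x1 = i * (bar_width + spacing)
            bar_height = round(adapter_content.get(name, 0.0) * height)
            rects.append((name, x1, 0, x1 + bar_width, bar_height))
    else:
        # Single bar: stack each adapter's contribution on the previous one
        y = 0
        for name in names:
            bar_height = round(adapter_content.get(name, 0.0) * height)
            rects.append((name, 0, y, bar_width, y + bar_height))
            y += bar_height
    return rects
```

With `{"a": 0.5, "b": 0.25}` and the default height of 40, multi-bar mode gives bars of 20 and 10 pixels side by side, while single-bar mode stacks them to a combined height of 30.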
- auto_process_ngs.qc.plots.uboxplot(fastqc_data=None, fastq=None, max_width=None, outfile=None, inline=None)
Generate FASTQ per-base quality ‘micro-boxplot’
‘Micro-boxplot’ is a thumbnail version of the per-base quality boxplots for a FASTQ file.
- Parameters:
fastqc_data (str) – path to a fastqc_data.txt file
fastq (str) – path to a FASTQ file (quality stats will be extracted directly if fastqc_data is not supplied)
max_width (int) – maximum width of plot in pixels; if None then by default the width will be the number of bases, otherwise plots that would exceed this width will be scaled to fit
outfile (str) – path to output file
inline (boolean) – if True then returns the PNG as base64 encoded string rather than as a file
- Returns:
path to output PNG file
- Return type:
String
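FastQC's `fastqc_data.txt` stores each module as a tab-separated section that starts with a `>>Module name` line, has a `#`-prefixed header row and ends with `>>END_MODULE`. The hypothetical `per_base_quality` function below sketches how the per-base statistics a micro-boxplot needs could be pulled out of that text; it is not this module's own parser, and drawing the boxes is left out.

```python
def per_base_quality(fastqc_data_text):
    """Return rows from the 'Per base sequence quality' module as dicts."""
    rows = []
    header = None
    in_module = False
    for line in fastqc_data_text.splitlines():
        if line.startswith(">>Per base sequence quality"):
            in_module = True
        elif in_module and line.startswith(">>END_MODULE"):
            break
        elif in_module and line.startswith("#"):
            # Column names, e.g. Base, Mean, Median, Lower Quartile, ...
            header = line[1:].split("\t")
        elif in_module and line.strip():
            fields = line.split("\t")
            # 'Base' stays a string since FastQC may group positions ("10-14")
            row = {"Base": fields[0]}
            row.update({k: float(v) for k, v in zip(header[1:], fields[1:])})
            rows.append(row)
    return rows
```

Each returned row corresponds to one column of the plot, so when the number of rows exceeds `max_width` the columns would need to be combined in the same way as other rebinned data.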
- auto_process_ngs.qc.plots.ucoverageprofileplot(data, outfile=None, inline=False)
Return a mini-plot of the Qualimap gene body coverage profile
- Parameters:
data (dict) – dictionary mapping transcript positions (percentile) to associated mean coverage depth (from Qualimap’s RNA-seq analysis)
outfile (str) – path for the output PNG
inline (bool) – if True then returns the PNG as base64 encoded string rather than as a file
- auto_process_ngs.qc.plots.uduplicationplot(total_deduplicated_percentage, height=None, width=None, mode='dup', style='fancy', warn_cutoff=None, fail_cutoff=None, outfile=None, inline=False)
Make a ‘micro’ plot summarising sequence duplication
Given the percentage of reads after deduplication (as calculated by FastQC’s “Sequence Duplication Levels” module), plots a horizontal bar indicating the level of sequence (de)duplication.
Two modes are available:
‘dup’ shows the fraction of sequences removed after deduplication
‘dedup’ shows the fraction of sequences remaining after deduplication
Two styles are supported:
‘fancy’ produces a multi-colour plot where the fraction of unique sequences is coloured according to pass, warn or fail cut-off levels, with the background to the bar striped with colours according to the cut-offs
‘simple’ produces a two-colour plot where the fraction of unique sequences is shown in blue and the remainder in red
- Parameters:
total_deduplicated_percentage (float) – percentage of sequences remaining after deduplication
height (int) – height of the plot in pixels
width (int) – width of the plot in pixels
style (str) – either ‘fancy’ (default) or ‘simple’
mode (str) – either ‘dup’ (default) or ‘dedup’
warn_cutoff (float) – fraction of unique sequences below which the plot should indicate a warning
fail_cutoff (float) – fraction of unique sequences below which the plot should indicate a failure
outfile (str) – path for the output PNG
inline (boolean) – if True then returns the PNG as base64 encoded string rather than as a file
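A minimal sketch of the quantities behind the two modes and the ‘fancy’ colouring, assuming the deduplicated percentage is converted to a fraction and compared directly with `warn_cutoff` and `fail_cutoff` (both described above as fractions of unique sequences). The function names are hypothetical and the cut-offs are required arguments because their defaults are not documented here.

```python
def duplication_fraction(total_deduplicated_percentage, mode="dup"):
    """Return the fraction drawn as the bar for the given mode."""
    remaining = total_deduplicated_percentage / 100.0
    if mode == "dedup":
        # Fraction of sequences remaining after deduplication
        return remaining
    if mode == "dup":
        # Fraction of sequences removed by deduplication
        return 1.0 - remaining
    raise ValueError(f"unknown mode: {mode!r}")


def duplication_status(total_deduplicated_percentage, warn_cutoff, fail_cutoff):
    """Classify the unique fraction as 'PASS', 'WARN' or 'FAIL'."""
    unique = total_deduplicated_percentage / 100.0
    if unique < fail_cutoff:
        return "FAIL"
    if unique < warn_cutoff:
        return "WARN"
    return "PASS"
```

The failure cut-off is checked first, so a value below both thresholds is reported as a failure rather than a warning.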
- auto_process_ngs.qc.plots.ufastqcplot(summary_file, outfile=None, inline=False)
Make a ‘micro’ summary plot of FastQC output
The micro plot is a small PNG which represents the summary results from each FastQC module in a matrix, with rows representing the modules and three columns representing the status (‘PASS’, ‘WARN’ and ‘FAIL’, from left to right).
For example (in text form, with ‘==’ marking each module's status and ‘..’ an empty cell):
....==
==....
==....
..==..
==....
indicates that the status of the first module is ‘FAIL’, the 2nd, 3rd and 5th are ‘PASS’, and the 4th is ‘WARN’.
- Parameters:
summary_file (str) – path to a FastQC ‘summary.txt’ output file
outfile (str) – path for the output PNG
inline (boolean) – if True then returns the PNG as base64 encoded string rather than as a file
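FastQC's `summary.txt` has one tab-separated line per module: the status (PASS, WARN or FAIL), the module name and the input file name. The hypothetical `fastqc_matrix` function below sketches how that file maps onto the three-column matrix described above, rendering each row in text form with ‘==’ for the status and ‘..’ for empty cells; it produces text, not the PNG.

```python
# Column order of the matrix, from left to right
STATUS_COLUMNS = ("PASS", "WARN", "FAIL")


def fastqc_matrix(summary_text):
    """Return (module, row) pairs with '==' in the column for the status."""
    rows = []
    for line in summary_text.splitlines():
        if not line.strip():
            continue
        status, module = line.split("\t")[:2]
        cells = ["==" if status == col else ".." for col in STATUS_COLUMNS]
        rows.append((module, "".join(cells)))
    return rows
```

A FAIL line becomes `....==` and a WARN line `..==..`, so the position of the mark alone encodes the status.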
- auto_process_ngs.qc.plots.ugenomicoriginplot(data, width=100, height=40, outfile=None, inline=False)
Return a mini barplot of the Qualimap genomic origin of reads
- Parameters:
data (dict) – dictionary mapping genomic origin names to the associated percentage of reads (from Qualimap rnaseq ‘Genomic Origin of Reads’)
height (int) – plot height in pixels
width (int) – plot width in pixels
outfile (str) – path for the output PNG
inline (bool) – if True then returns the PNG as base64 encoded string rather than as a file
- auto_process_ngs.qc.plots.uinsertsizeplot(data, outfile=None, inline=False)
Return a mini-plot with the Picard insert size histogram
- Parameters:
data (dict) – dictionary mapping insert sizes to associated number of alignments (from Picard CollectInsertSizeMetrics)
outfile (str) – path for the output PNG
inline (bool) – if True then returns the PNG as base64 encoded string rather than as a file
- auto_process_ngs.qc.plots.ureadcountplot(nreads, nmasked=None, npadded=None, max_reads=None, outfile=None, inline=False, width=50, height=12, bg_color='white', fg_color='green', masked_color='red', padded_color='orange', fill_color='lightgrey')
Make a ‘micro’ plot summarising read counts and masking
Given a total number of reads/sequences in a Fastq file plus the number of those sequences which are masked (i.e. completely composed of Ns) and padded (i.e. have one or more trailing Ns), plots a horizontal bar indicating the sequence composition.
By default the numbers are normalised so that the total number of reads fills the bar; however if a maximum read count is also supplied then the normalisation is relative to that maximum (so the plot also indicates the relative size of the Fastq compared to the maximum read count).
- Parameters:
nreads (int) – number of reads in the Fastq
nmasked (int) – number of masked reads
npadded (int) – number of padded reads
max_reads (int) – maximum number of reads (e.g. in all Fastqs) for normalisation
outfile (str) – path for the output PNG
inline (boolean) – if True then returns the PNG as base64 encoded string rather than as a file
width (int) – width of the plot in pixels
height (int) – height of the plot in pixels
bg_color (str) – name of colour to use for the background
fg_color (str) – name of the colour for plotting unmasked, unpadded reads
masked_color (str) – name of the colour for plotting masked read fraction
padded_color (str) – name of the colour for plotting padded read fraction
fill_color (str) – name of the colour for filling the unoccupied remainder of the bar
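The normalisation described above can be sketched as a split of the bar width into four segments. The hypothetical `readcount_segments` function assumes masked and padded reads are disjoint subsets of `nreads`, and that the unoccupied remainder (drawn in `fill_color`) only appears when `max_reads` exceeds `nreads`.

```python
def readcount_segments(nreads, nmasked=None, npadded=None, max_reads=None,
                       width=50):
    """Return pixel lengths (other, masked, padded, fill) for the bar."""
    nmasked = nmasked or 0
    npadded = npadded or 0
    # Normalise against max_reads if given, otherwise against nreads
    denominator = max_reads or nreads
    if not denominator:
        return 0, 0, 0, width
    scale = width / denominator
    masked = round(nmasked * scale)
    padded = round(npadded * scale)
    used = round(nreads * scale)
    # Assumes masked and padded reads are disjoint subsets of nreads
    other = used - masked - padded
    fill = width - used
    return other, masked, padded, fill
```

For 1000 reads with 200 masked and 400 padded, normalising against `max_reads=2000` on a 50-pixel bar gives `(10, 5, 10, 25)`: half the bar is filled because this Fastq holds half the maximum read count.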
- auto_process_ngs.qc.plots.uscreenplot(screen_files, outfile=None, screen_width=None, inline=None)
Generate ‘micro-plot’ of FastqScreen outputs
- Parameters:
screen_files (list) – list of paths to one or more …screen.txt files from FastqScreen
outfile (str) – path to output file
screen_width (int) – optional, set the width for each screen plot
inline (boolean) – if True then returns the PNG as base64 encoded string rather than as a file
- auto_process_ngs.qc.plots.useqlenplot(dist, masked_dist=None, min_len=None, max_len=None, outfile=None, inline=False, height=None, bg_color='gainsboro', bbox_color='white', seq_color='black', masked_color='red')
Make a ‘micro’ plot of sequence length
Given a sequence length distribution, create a histogram-style plot where the numbers of sequences with different lengths are shown (similar to the ‘Sequence Length Distribution’ plot from FastQC).
Optionally, if a distribution of masked reads is also supplied, then these data will be overlaid on top.
The distributions should be supplied as dictionaries or mappings where the keys are sequence lengths and the corresponding values are the number of sequences.
- Parameters:
dist (mapping) – mapping of sequence lengths to numbers of sequences, giving the distribution of sequence lengths
masked_dist (mapping) – optional, mapping of sequence lengths to numbers of masked reads
min_len (int) – optional, set the lower limit of the plot (otherwise defaults to the lowest length present in the distribution)
max_len (int) – optional, set the upper limit of the plot (otherwise defaults to the highest length present in the distribution)
outfile (str) – path for the output PNG
inline (boolean) – if True then returns the PNG as base64 encoded string rather than as a file
height (int) – height of the plot in pixels
bg_color (str) – name of colour to use for the background
bbox_color (str) – name of colour to use for the bounding box
seq_color (str) – name of colour to use for the sequence distribution
masked_color (str) – name of color to use for masked sequence distribution
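A minimal sketch of turning a length distribution into histogram columns, assuming one pixel column per length between `min_len` and `max_len` and bar heights scaled so the most common length reaches the full plot height. The function name `seqlen_columns` is hypothetical; lengths missing from the distribution get zero-height columns.

```python
def seqlen_columns(dist, height, min_len=None, max_len=None):
    """Return {column: bar height in pixels} for a length distribution."""
    lo = min(dist) if min_len is None else min_len
    hi = max(dist) if max_len is None else max_len
    peak = max(dist.values())
    cols = {}
    for length in range(lo, hi + 1):
        count = dist.get(length, 0)
        # Scale relative to the most common length
        cols[length - lo] = round(height * count / peak) if peak else 0
    return cols
```

Setting `min_len` below the shortest length adds empty columns on the left, which keeps plots for different Fastqs aligned on a common x-axis.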
- auto_process_ngs.qc.plots.ustackedbar(data, outfile=None, inline=False, bbox=True, height=20, length=100, colors=None)
Make a ‘micro’ stacked bar chart
A ‘stacked’ bar consists of a bar divided into sections, each with a length proportional to the corresponding value.
- Parameters:
data (List) – list or tuple of data values
outfile (str) – path for the output PNG
inline (boolean) – if True then returns the PNG as base64 encoded string rather than as a file
bbox (boolean) – if True then draw a bounding box around the plot
height (int) – height of the bar in pixels
length (int) – length of the bar in pixels
colors (List) – list or tuple of color values
- auto_process_ngs.qc.plots.ustrandplot(fastq_strand_out, outfile=None, inline=False, height=25, width=50, fg_color=None, dynamic=False)
Make a ‘micro’ chart for strandedness
This micro plot is a small PNG which summarises the results from fastq_strand.py as two horizontal bars (one for forward, one for reverse) with the lengths representing the pseudo-percentages of each.
If the fastq_strand results included multiple genomes then there will be one pair of bars for each genome.
- Parameters:
fastq_strand_out (str) – path to a fastq_strand output file
outfile (str) – path for the output PNG
inline (boolean) – if True then returns the PNG as base64 encoded string rather than as a file
height (int) – height of the plot in pixels
width (int) – width of the plot in pixels
fg_color (tuple) – tuple of RGB values to use for the foreground colour of the bars
dynamic (boolean) – if True then the height of the plot will be increased for each additional genome in the output fastq_strand file
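A minimal sketch of converting one genome's forward and reverse pseudo-percentages into bar lengths, assuming each bar's length is the percentage of the plot width and that values are clamped to 0 to 100 so a bar never overruns the canvas. The function name `strand_bar_lengths` is hypothetical, and neither parsing of the fastq_strand file nor the `dynamic` height adjustment is modelled.

```python
def strand_bar_lengths(forward_pct, reverse_pct, width=50):
    """Return pixel lengths for the forward and reverse bars."""
    def to_pixels(pct):
        # Clamp to 0-100 so a bar never overruns the plot
        pct = max(0.0, min(100.0, pct))
        return round(width * pct / 100.0)
    return to_pixels(forward_pct), to_pixels(reverse_pct)
```

For results from several genomes, the same conversion would be applied to each genome's pair of values to produce one pair of bars per genome.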