auto_process_ngs.qc.protocols
QC protocol definitions and utility functions for handling protocols and modules for QC of analysis projects.
QC protocols are defined with the QC_PROTOCOLS dictionary, where
each key is a protocol name and the corresponding values are
dictionaries which specify the reads used for sequence data and for
index information, along with a list of QC modules that comprise
the protocol.
For example:
"ExampleProtocol": {
    "description": "Example QC protocol",
    "reads": { "seq_data": ('r1','r3'), "index": ('r2',) },
    "qc_modules": ['fastqc','fastq_screen','sequence_lengths']
}
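As a minimal sketch of how a definition of this shape can be read, the snippet below uses a stand-in dictionary that holds only the example entry above; it is not the library's real QC_PROTOCOLS, and the helper name `describe_protocol` is hypothetical.

```python
# Stand-in for the real QC_PROTOCOLS dictionary: it holds only the
# example entry from the text above, not the library's built-in protocols
EXAMPLE_PROTOCOLS = {
    "ExampleProtocol": {
        "description": "Example QC protocol",
        "reads": {"seq_data": ("r1", "r3"), "index": ("r2",)},
        "qc_modules": ["fastqc", "fastq_screen", "sequence_lengths"],
    }
}


def describe_protocol(name, protocols=EXAMPLE_PROTOCOLS):
    """Hypothetical helper: flatten one definition into a simple tuple."""
    definition = protocols[name]  # raises KeyError for unknown names
    reads = definition["reads"]
    return (
        definition["description"],
        list(reads["seq_data"]),
        list(reads["index"]),
        list(definition["qc_modules"]),
    )


description, seq_data, index, modules = describe_protocol("ExampleProtocol")
print(description, seq_data, index, modules)
```

Note the trailing comma in `("r2",)`: without it Python treats the parentheses as grouping and the value is the plain string `'r2'`, not a one-element tuple.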
The available QC modules are those supported by the QCVerifier
class in the check_outputs method; new modules must be added
there before they can be specified in protocol definitions.
Optional modifiers can also be added to QC module specifications,
using the format NAME(KEY=VALUE;...).
For example:
cellranger_count(cellranger_version=*;cellranger_refdata=*)
The available modifiers are the same as the parameter list for the
check_outputs method in the QCVerifier class.
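To illustrate the NAME(KEY=VALUE;...) format, here is a small sketch that assembles a module specification string from a name and a dictionary of modifiers. The function `format_module_spec` is hypothetical and is not part of this module.

```python
def format_module_spec(name, params=None):
    """Hypothetical helper: build 'NAME' or 'NAME(KEY=VALUE;...)'."""
    if not params:
        return name
    # Modifiers are written as KEY=VALUE pairs separated by semicolons
    pairs = ";".join(f"{key}={value}" for key, value in params.items())
    return f"{name}({pairs})"


spec = format_module_spec(
    "cellranger_count",
    {"cellranger_version": "*", "cellranger_refdata": "*"},
)
print(spec)
```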
This module also provides the following classes and functions:
QCProtocol: class representing a QC protocol
determine_qc_protocol_from_metadata: determine built-in QC protocol
determine_qc_protocol: determine built-in protocol for a project
fetch_protocol_definition: get the definition for a QC protocol
parse_protocol_spec: parse a QC protocol specification string
parse_qc_module_spec: process QC module specification string
- class auto_process_ngs.qc.protocols.QCProtocol(name, description, seq_data_reads, index_reads, qc_modules)
Class defining a QC protocol
Properties:
name: protocol name
description: text description
reads: AttributeDictionary with elements ‘seq_data’, ‘index’, and ‘qc’ (listing sequence data, index reads, and all reads for QC, respectively)
read_numbers: AttributeDictionary with the same elements as ‘reads’, listing non-index read numbers
read_range: AttributeDictionary with normalised read names as elements and range of bases (as a tuple) as the values
qc_modules: list of QC module definitions
Reads are supplied as ‘r1’, ‘i2’ etc; read numbers are integers without the leading ‘r’ (NB index reads are not included).
Read ranges define subsequences within each read which contain the biologically significant data, and can be appended to the supplied reads using the syntax:
READ[:[START]-[END]]
For example: ‘r1:1-50’. These can be accessed via the ‘read_range’ property as e.g. ‘read_range.r1’ and will be returned as either ‘None’ (if no range was supplied), or a tuple ‘(START,END)’ (where either ‘START’ or ‘END’ will be ‘None’ if no limit was supplied).
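The range syntax can be sketched with a regular expression. This is an illustration only, not the parser used by QCProtocol; the helper name `split_read_spec` is hypothetical, and the pattern assumes read names consist of 'r' or 'i' followed by digits.

```python
import re

# READ[:[START]-[END]], e.g. 'r1', 'r1:1-50', 'r2:-30', 'i1:5-'
# Assumption: read names are 'r' or 'i' followed by one or more digits
READ_SPEC = re.compile(
    r"^(?P<read>[ri][0-9]+)(?::(?P<start>[0-9]*)-(?P<end>[0-9]*))?$"
)


def split_read_spec(spec):
    """Hypothetical helper: return (read, range) for a read specification.

    'range' is None when no range was given, otherwise a (START, END)
    tuple in which a missing limit is None.
    """
    match = READ_SPEC.match(spec)
    if match is None:
        raise ValueError(f"invalid read specification: {spec!r}")
    if match.group("start") is None:
        # No ':START-END' part was present at all
        return match.group("read"), None
    start = int(match.group("start")) if match.group("start") else None
    end = int(match.group("end")) if match.group("end") else None
    return match.group("read"), (start, end)


for example in ("r1", "r1:1-50", "r2:-30", "i1:5-"):
    print(example, split_read_spec(example))
```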
The ‘seq_data_reads’ and ‘index_reads’ properties store the original read specifications.
QCProtocol instances can also be created directly from protocol specification strings using the ‘from_specification’ class method. (Specification strings are returned from ‘repr’ on existing QCProtocol instances, or can alternatively be constructed manually.)
- Parameters:
name (str) – name of the protocol
description (str) – protocol description
seq_data_reads (list) – read names associated with sequence data
index_reads (list) – read names associated with index data
qc_modules (list) – list of names of associated QC modules
- property expected_outputs
Return a list of the expected QC outputs
The expected outputs are based on the QC modules plus the reads associated with the protocol.
- classmethod from_specification(s)
Create new QCProtocol instance from specification
Given a specification string (such as that returned by ‘repr(…)’), create a new QCProtocol instance initialised from that specification.
Example usage:
>>> p = QCProtocol.from_specification("custom:...")
- Parameters:
s (str) – QC protocol specification string
- property qc_module_names
Return list of QC module names without parameters
Returns a list of the QC modules associated with the protocol, with any parameter lists (i.e. trailing ‘(…)’) removed.
For example the QC module list:
['cellranger(use_10x_multi_config=true)','fastqc']
will be returned as:
['cellranger','fastqc']
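The same transformation can be sketched in a few lines. The helper `strip_module_params` is hypothetical and only mirrors the behaviour described above; it assumes the module name is everything before the first '('.

```python
def strip_module_params(qc_modules):
    """Hypothetical helper: drop any trailing '(...)' from each module."""
    # Everything before the first '(' is taken to be the module name
    return [module.split("(", 1)[0] for module in qc_modules]


names = strip_module_params(["cellranger(use_10x_multi_config=true)", "fastqc"])
print(names)  # ['cellranger', 'fastqc']
```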
- summarise()
Summarise protocol
Generate plain-text description of the protocol
- update(seq_data_reads=None, index_reads=None, qc_modules=None)
Update the reads and QC modules for the protocol
Allows the sequence data reads, index reads and QC modules associated with the protocol to be updated. Checks that values are valid and that internal data is also correctly updated.
- Parameters:
seq_data_reads (list) – read names associated with sequence data (if not supplied then existing read data will be kept)
index_reads (list) – read names associated with index data (if not supplied then existing read data will be kept)
qc_modules (list) – list of names of associated QC modules (if not supplied then existing modules data will be kept)
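The "keep existing data if not supplied" behaviour follows the usual None-default pattern, sketched below with a stand-in class. `ProtocolSketch` is not the real QCProtocol: it omits the validity checks and the refresh of derived data that the real method is described as performing.

```python
class ProtocolSketch:
    """Stand-in for illustration only; not the real QCProtocol class."""

    def __init__(self, seq_data_reads, index_reads, qc_modules):
        self.seq_data_reads = list(seq_data_reads)
        self.index_reads = list(index_reads)
        self.qc_modules = list(qc_modules)

    def update(self, seq_data_reads=None, index_reads=None, qc_modules=None):
        # None means "not supplied": the existing value is left untouched
        if seq_data_reads is not None:
            self.seq_data_reads = list(seq_data_reads)
        if index_reads is not None:
            self.index_reads = list(index_reads)
        if qc_modules is not None:
            self.qc_modules = list(qc_modules)


p = ProtocolSketch(["r1", "r3"], ["r2"], ["fastqc"])
p.update(qc_modules=["fastqc", "sequence_lengths"])
print(p.seq_data_reads, p.index_reads, p.qc_modules)
```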
- exception auto_process_ngs.qc.protocols.QCProtocolError
Base class for QC protocol-specific exceptions
- exception auto_process_ngs.qc.protocols.QCProtocolParseSpecError(message=None)
Exception raised when a protocol specification can’t be parsed
- Parameters:
message (str) – error message
- auto_process_ngs.qc.protocols.determine_qc_protocol(project)
Determine the QC protocol for a project
- Parameters:
project (AnalysisProject) – project instance
- Returns:
QC protocol for the project
- Return type:
String
- auto_process_ngs.qc.protocols.determine_qc_protocol_from_metadata(library_type, single_cell_platform, paired_end)
Determine the QC protocol from metadata values
- Parameters:
library_type (str) – library or application
single_cell_platform (str) – single cell platform (or None)
paired_end (bool) – whether data are paired end
- Returns:
QC protocol for the project
- Return type:
String
- auto_process_ngs.qc.protocols.fetch_protocol_definition(p)
Return the definition for a QC protocol
Fetches a QCProtocol instance for the supplied protocol, which can either be a protocol definition string or the name of a built-in QC protocol.
- Parameters:
p (str) – name or specification for the QC protocol
- Returns:
QCProtocol object representing the requested protocol.
- Return type:
QCProtocol
- Raises:
KeyError – when a QCProtocol instance can’t be returned for the requested protocol.
- auto_process_ngs.qc.protocols.parse_protocol_spec(s)
Parse QC protocol specification string
Parses a QC protocol specification string (such as one returned by the ‘__repr__’ built-in of an existing QCProtocol instance) and returns an AttributeDictionary with the following elements extracted from the specification:
name
description
seq_data_reads
index_reads
qc_modules
These can then be used to create a new QCProtocol instance which matches the specification using e.g.
>>> p = QCProtocol(**parse_protocol_spec("..."))
- Parameters:
s (string) – QC protocol specification string
- Returns:
AttributeDictionary with keys mapped to values from the supplied specification.
- Return type:
AttributeDictionary
- Raises:
QCProtocolParseSpecError – if the specification string cannot be parsed correctly.
- auto_process_ngs.qc.protocols.parse_qc_module_spec(module_spec)
Parse QC module spec into name and parameters
Parse a QC module specification of the form NAME or NAME(KEY=VALUE;...) and return the module name and any additional parameters in the form of a dictionary.
For example:
>>> parse_qc_module_spec('NAME')
('NAME', {})
>>> parse_qc_module_spec('NAME(K1=V1;K2=V2)')
('NAME', { 'K1':'V1', 'K2':'V2' })
By default values are returned as strings (with surrounding single or double quotes removed); however basic type conversion is also applied to certain values:
True/true and False/false are returned as the appropriate boolean value
- Parameters:
module_spec (str) – QC module specification
- Returns:
tuple of the form (name,params) where ‘name’ is the QC module name and ‘params’ is a dictionary with the extracted key-value pairs.
- Return type:
Tuple
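For readers who want to see the parsing rules for parse_qc_module_spec spelled out, the following is an illustrative re-implementation under stated assumptions, not the library's own code. The name `parse_module_spec_sketch` is hypothetical. It assumes that quoted values are left as strings after the quotes are removed (so `'true'` in quotes stays a string), and that empty segments between semicolons are ignored; the real function may differ on both points.

```python
def parse_module_spec_sketch(module_spec):
    """Illustrative re-implementation; not the library's own code."""
    name, sep, rest = module_spec.partition("(")
    if not sep:
        # Plain 'NAME' with no parameter list
        return name, {}
    if not rest.endswith(")"):
        raise ValueError(f"unterminated parameter list: {module_spec!r}")
    params = {}
    for item in rest[:-1].split(";"):
        if not item:
            continue  # assumption: tolerate empty segments (trailing ';')
        key, _, value = item.partition("=")
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        elif value in ("True", "true"):
            value = True
        elif value in ("False", "false"):
            value = False
        params[key.strip()] = value
    return name, params


print(parse_module_spec_sketch("NAME"))
print(parse_module_spec_sketch("NAME(K1=V1;K2=V2)"))
print(parse_module_spec_sketch("cellranger(use_10x_multi_config=true)"))
```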