auto_process_ngs.qc.protocols
QC protocol definitions and utility functions for handling protocols and modules for QC of analysis projects.
QC protocols are defined in the QC_PROTOCOLS dictionary, where each key is a protocol name and the corresponding value is a dictionary which specifies the reads used for sequence data and for index information, along with the list of QC modules that comprise the protocol.
For example:
"ExampleProtocol": {
    "description": "Example QC protocol",
    "reads": { "seq_data": ('r1','r3'), "index": ('r2',) },
    "qc_modules": ['fastqc','fastq_screen','sequence_lengths']
}
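As a minimal sketch of how a definition in this shape can be read, the snippet below builds a stand-in dictionary with the same layout and summarises one entry. The `describe_protocol` helper is hypothetical and not part of auto_process_ngs; the stand-in dictionary only mirrors the example above. Note that in Python a single-element tuple needs a trailing comma (`('r2',)`), otherwise the parentheses just wrap a string.

```python
# Illustrative stand-in with the same layout as the example definition;
# the real QC_PROTOCOLS dictionary lives in auto_process_ngs.qc.protocols
QC_PROTOCOLS = {
    "ExampleProtocol": {
        "description": "Example QC protocol",
        "reads": {"seq_data": ("r1", "r3"), "index": ("r2",)},
        "qc_modules": ["fastqc", "fastq_screen", "sequence_lengths"],
    },
}


def describe_protocol(name, protocols=QC_PROTOCOLS):
    """Hypothetical helper: one-line summary of a protocol definition."""
    defn = protocols[name]  # unknown names raise KeyError
    seq_reads = ",".join(defn["reads"]["seq_data"])
    index_reads = ",".join(defn["reads"]["index"])
    modules = ",".join(defn["qc_modules"])
    return f"{name}: seq_data={seq_reads} index={index_reads} modules={modules}"


summary = describe_protocol("ExampleProtocol")
print(summary)
```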
The available QC modules are those supported by the check_outputs method of the QCVerifier class; new modules must be added there before they can be specified in protocol definitions.
Optional modifiers can also be added to QC module specifications, using the format NAME(KEY=VALUE;...).
For example:
cellranger_count(cellranger_version=*;cellranger_refdata=*)
The available modifiers are the same as the parameters of the check_outputs method of the QCVerifier class.
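The NAME(KEY=VALUE;...) format can be split into a module name and a dictionary of modifiers with a few string operations. The following is a rough sketch only: `parse_qc_module_spec` is a hypothetical helper written for illustration, and the library's own parsing may differ (for example in how it treats whitespace or malformed input).

```python
def parse_qc_module_spec(spec):
    """Hypothetical helper: split 'NAME(KEY=VALUE;...)' into (name, params)."""
    spec = spec.strip()
    if "(" not in spec:
        # Bare module name with no modifiers
        return spec, {}
    if not spec.endswith(")"):
        raise ValueError(f"unterminated modifier list: {spec!r}")
    # Drop the closing bracket, then split at the first opening bracket
    name, _, body = spec[:-1].partition("(")
    params = {}
    for item in body.split(";"):
        if not item:
            continue  # tolerate empty entries such as a trailing ';'
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got {item!r}")
        params[key.strip()] = value.strip()
    return name, params


name, params = parse_qc_module_spec(
    "cellranger_count(cellranger_version=*;cellranger_refdata=*)"
)
print(name, params)
```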
This module also provides the following classes and functions:
QCProtocol: class representing a QC protocol
determine_qc_protocol: determine built-in protocol for a project
fetch_protocol_definition: get the definition for a QC protocol
parse_protocol_spec: parse a QC protocol specification string
- class auto_process_ngs.qc.protocols.QCProtocol(name, description, seq_data_reads, index_reads, qc_modules)
Class defining a QC protocol
Properties:
name: protocol name
description: text description
reads: AttributeDictionary with elements ‘seq_data’, ‘index’, and ‘qc’ (listing sequence data, index reads, and all reads for QC, respectively)
read_numbers: AttributeDictionary with the same elements as ‘reads’, listing non-index read numbers
read_range: AttributeDictionary with normalised read names as elements and range of bases (as a tuple) as the values
qc_modules: list of QC module definitions
Reads are supplied as ‘r1’, ‘i2’ etc; read numbers are integers without the leading ‘r’ (NB index reads are not included).
Read ranges define subsequences within each read which contain the biologically significant data, and can be appended to the supplied reads using the syntax:
READ[:[START]-[END]]
For example: ‘r1:1-50’. These can be accessed via the ‘read_range’ property as e.g. ‘read_range.r1’ and will be returned as either ‘None’ (if no range was supplied), or a tuple ‘(START,END)’ (where either ‘START’ or ‘END’ will be ‘None’ if no limit was supplied).
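To make the READ[:[START]-[END]] syntax concrete, here is a small sketch that splits a read specification into a read name and an optional range, and derives read numbers for non-index reads. Both helpers (`parse_read_spec`, `read_numbers`) are hypothetical and written for illustration; lowercasing the read name is an assumption about what "normalised" means, not confirmed behaviour of QCProtocol.

```python
import re

# READ is 'r' or 'i' plus a number; the ':START-END' part is optional,
# and START and END may each be left empty
READ_SPEC = re.compile(
    r"^(?P<read>[ri][0-9]+)(?::(?P<start>[0-9]*)-(?P<end>[0-9]*))?$"
)


def parse_read_spec(spec):
    """Hypothetical helper: return (read, range) for 'READ[:[START]-[END]]'."""
    m = READ_SPEC.match(spec.lower())  # lowercasing is an assumption
    if m is None:
        raise ValueError(f"invalid read specification: {spec!r}")
    if m.group("start") is None:
        # No ':...' part was supplied, so there is no range
        return m.group("read"), None
    start = int(m.group("start")) if m.group("start") else None
    end = int(m.group("end")) if m.group("end") else None
    return m.group("read"), (start, end)


def read_numbers(reads):
    """Hypothetical helper: integer read numbers, skipping index reads."""
    return [int(r[1:]) for r in reads if r.startswith("r")]


print(parse_read_spec("r1:1-50"))
print(parse_read_spec("r2:-50"))
print(parse_read_spec("r3"))
print(read_numbers(["r1", "i1", "r3"]))
```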
The ‘seq_data_reads’ and ‘index_reads’ properties store the original read specifications.
QCProtocol instances can also be created directly from protocol specification strings using the ‘from_specification’ class method. (Specification strings are returned from ‘repr’ on existing QCProtocol instances, or can alternatively be constructed manually.)
- Parameters:
name (str) – name of the protocol
description (str) – protocol description
seq_data_reads (list) – read names associated with sequence data
index_reads (list) – read names associated with index data
qc_modules (list) – list of names of associated QC modules
- classmethod from_specification(s)
Create new QCProtocol instance from specification
Given a specification string (such as that returned by ‘repr(…)’), create a new QCProtocol instance initialised from that specification.
Example usage:
>>> p = QCProtocol.from_specification("custom:...")
- Parameters:
s (str) – QC protocol specification string
- summarise()
Summarise protocol
Generate plain-text description of the protocol
- exception auto_process_ngs.qc.protocols.QCProtocolError
Base class for QC protocol-specific exceptions
- exception auto_process_ngs.qc.protocols.QCProtocolParseSpecError(message=None)
Exception raised when a protocol specification can’t be parsed
- Parameters:
message (str) – error message
- auto_process_ngs.qc.protocols.determine_qc_protocol(project)
Determine the QC protocol for a project
- Parameters:
project (AnalysisProject) – project instance
- Returns:
QC protocol for the project
- Return type:
String
- auto_process_ngs.qc.protocols.fetch_protocol_definition(p)
Return the definition for a QC protocol
Fetches a QCProtocol instance for the supplied protocol, which can either be a protocol definition string or the name of a built-in QC protocol.
- Parameters:
p (str) – name or specification for the QC protocol
- Returns:
QCProtocol object representing the requested protocol.
- Return type:
QCProtocol
- Raises:
KeyError – when a QCProtocol instance can’t be returned for the requested protocol.
- auto_process_ngs.qc.protocols.parse_protocol_spec(s)
Parse QC protocol specification string
Parses a QC protocol specification string (such as one returned by the ‘__repr__’ built-in of an existing QCProtocol instance) and returns an AttributeDictionary with the following elements extracted from the specification:
name
description
seq_data_reads
index_reads
qc_modules
These can then be used to create a new QCProtocol instance which matches the specification using e.g.
>>> p = QCProtocol(**parse_protocol_spec("..."))
- Parameters:
s (str) – QC protocol specification string
- Returns:
AttributeDictionary with keys mapped to values from the supplied specification.
- Return type:
AttributeDictionary
- Raises:
QCProtocolParseSpecError – if the specification string cannot be parsed correctly.
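The `QCProtocol(**parse_protocol_spec("..."))` pattern relies on Python keyword unpacking, which only works when the keys of the mapping match the constructor's parameter names. The sketch below shows the mechanism with a stand-in class: `ProtocolSketch` is hypothetical and merely copies the parameter names from the QCProtocol signature, and the `parsed` dictionary is hand-written rather than produced by the real parser.

```python
from dataclasses import dataclass


@dataclass
class ProtocolSketch:
    """Stand-in class using the same constructor parameter names as QCProtocol."""
    name: str
    description: str
    seq_data_reads: list
    index_reads: list
    qc_modules: list


# Hand-written mapping standing in for the parser's return value
parsed = {
    "name": "custom",
    "description": "Example QC protocol",
    "seq_data_reads": ["r1", "r3"],
    "index_reads": ["r2"],
    "qc_modules": ["fastqc", "fastq_screen"],
}

# Each key is passed as the keyword argument of the same name
protocol = ProtocolSketch(**parsed)
print(protocol.name, protocol.seq_data_reads, protocol.qc_modules)
```

A missing or misspelled key makes the call fail with a TypeError, which is a quick way to spot a mismatch between a parsed mapping and the constructor.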