auto_process_ngs.qc.protocols

QC protocol definitions and utility functions for handling protocols and modules for QC of analysis projects.

QC protocols are defined in the QC_PROTOCOLS dictionary. Each key is a protocol name and each value is a dictionary that specifies a description, the reads used for sequence data and for index information, and the list of QC modules that make up the protocol.

For example:

"ExampleProtocol": {
    "description": "Example QC protocol",
    "reads": { "seq_data": ('r1','r3'), "index": ('r2',) },
    "qc_modules": ['fastqc','fastq_screen','sequence_lengths']
}
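As a rough illustration of this layout, the sketch below builds a small stand-in dictionary and reads the fields back. The names EXAMPLE_PROTOCOLS and describe_protocol are hypothetical and are not part of auto_process_ngs; only the key names are taken from the description above.

```python
# Hypothetical stand-in for QC_PROTOCOLS (not the real dictionary)
EXAMPLE_PROTOCOLS = {
    "ExampleProtocol": {
        "description": "Example QC protocol",
        "reads": {
            "seq_data": ('r1', 'r3'),
            # The trailing comma makes this a one-element tuple
            # rather than a plain string
            "index": ('r2',),
        },
        "qc_modules": ['fastqc', 'fastq_screen', 'sequence_lengths'],
    },
}


def describe_protocol(name, protocols=EXAMPLE_PROTOCOLS):
    """Return a one-line summary of a protocol definition (illustrative only)."""
    definition = protocols[name]  # raises KeyError if the name is not defined
    reads = definition["reads"]
    return "%s: seq_data=%s index=%s modules=%s" % (
        name,
        ",".join(reads["seq_data"]),
        ",".join(reads["index"]),
        ",".join(definition["qc_modules"]))


print(describe_protocol("ExampleProtocol"))
```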

The available QC modules are those supported by the check_outputs method of the QCVerifier class; new modules must be added there before they can be specified in protocol definitions.

Optional modifiers can also be added to QC module specifications, using the format NAME(KEY=VALUE;...).

For example:

cellranger_count(cellranger_version=*;cellranger_refdata=*)

The available modifiers are the same as the parameters of the check_outputs method of the QCVerifier class.
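To make the NAME(KEY=VALUE;...) format concrete, here is a minimal sketch of how such a string could be split into a module name and its modifiers. The function split_module_spec and its regular expression are assumptions written for illustration; they are not the parser used by auto_process_ngs, and they do not check modifier names against QCVerifier.

```python
import re

# Matches NAME or NAME(KEY=VALUE;...). This pattern is an assumption
# based on the format described above, not the library's own parser.
_MODULE_SPEC = re.compile(r"^(?P<name>[^()]+?)(?:\((?P<mods>.*)\))?$")


def split_module_spec(spec):
    """Split 'NAME(KEY=VALUE;...)' into (name, dict of modifiers)."""
    match = _MODULE_SPEC.match(spec.strip())
    if match is None:
        raise ValueError("Bad QC module specification: %r" % spec)
    modifiers = {}
    if match.group("mods"):
        # Modifiers are separated by ';' and written as KEY=VALUE
        for item in match.group("mods").split(";"):
            key, sep, value = item.partition("=")
            if not sep:
                raise ValueError("Bad modifier %r in %r" % (item, spec))
            modifiers[key.strip()] = value.strip()
    return match.group("name"), modifiers


name, mods = split_module_spec(
    "cellranger_count(cellranger_version=*;cellranger_refdata=*)")
print(name, mods)
```

A module given without modifiers, such as 'fastqc', comes back with an empty dictionary.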

This module also provides the following classes and functions:

  • QCProtocol: class representing a QC protocol

  • determine_qc_protocol: determine built-in protocol for a project

  • fetch_protocol_definition: get the definition for a QC protocol

  • parse_protocol_spec: parse a QC protocol specification string

class auto_process_ngs.qc.protocols.QCProtocol(name, description, seq_data_reads, index_reads, qc_modules)

Class defining a QC protocol

Properties:

  • name: protocol name

  • description: text description

  • reads: AttributeDictionary with elements ‘seq_data’, ‘index’, and ‘qc’ (listing sequence data, index reads, and all reads for QC, respectively)

  • read_numbers: AttributeDictionary with the same elements as ‘reads’, listing non-index read numbers

  • read_range: AttributeDictionary with normalised read names as elements and range of bases (as a tuple) as the values

  • qc_modules: list of QC module definitions

Reads are supplied as ‘r1’, ‘i2’ etc; read numbers are integers without the leading ‘r’ (NB index reads are not included).
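The distinction between read names and read numbers can be sketched as follows. The helper non_index_read_numbers is hypothetical (it is not part of auto_process_ngs) and assumes plain read names such as 'r1' or 'i2' with nothing appended.

```python
def non_index_read_numbers(reads):
    """Return integer read numbers for 'rN' reads, skipping index reads."""
    # Assumption: names starting with 'r' are non-index reads and
    # anything else (e.g. 'i2') is an index read
    return [int(read[1:]) for read in reads if read.lower().startswith("r")]


print(non_index_read_numbers(['r1', 'i2', 'r3']))  # prints [1, 3]
```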

Read ranges define subsequences within each read which contain the biologically significant data, and can be appended to the supplied reads using the syntax:

READ[:[START]-[END]]

For example: ‘r1:1-50’. These can be accessed via the ‘read_range’ property as e.g. ‘read_range.r1’ and will be returned as either ‘None’ (if no range was supplied), or a tuple ‘(START,END)’ (where either ‘START’ or ‘END’ will be ‘None’ if no limit was supplied).
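The READ[:[START]-[END]] syntax and the values described for 'read_range' can be illustrated with a small stand-alone parser. The function split_read_range below is a hypothetical sketch, not the code used by QCProtocol; it assumes read names of the form 'rN' or 'iN' and integer limits.

```python
import re

# READ, optionally followed by ':' START '-' END where START and END
# may each be empty (an assumed reading of the syntax above)
_READ_SPEC = re.compile(
    r"^(?P<read>[ri][0-9]+)(?::(?P<start>[0-9]*)-(?P<end>[0-9]*))?$")


def split_read_range(spec):
    """Split 'READ[:[START]-[END]]' into (read, range).

    The range is None if no range was supplied, otherwise a tuple
    (START, END) where a missing limit is None.
    """
    match = _READ_SPEC.match(spec.strip().lower())
    if match is None:
        raise ValueError("Bad read specification: %r" % spec)
    if match.group("start") is None:
        # The optional ':START-END' part was absent
        return match.group("read"), None
    start = int(match.group("start")) if match.group("start") else None
    end = int(match.group("end")) if match.group("end") else None
    return match.group("read"), (start, end)


for spec in ("r1:1-50", "r2", "r1:-50", "r1:10-"):
    print(spec, "->", split_read_range(spec))
```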

The ‘seq_data_reads’ and ‘index_reads’ properties store the original read specifications.

QCProtocol instances can also be created directly from protocol specification strings using the ‘from_specification’ class method. (Specification strings are returned from ‘repr’ on existing QCProtocol instances, or can alternatively be constructed manually.)

Parameters:
  • name (str) – name of the protocol

  • description (str) – protocol description

  • seq_data_reads (list) – read names associated with sequence data

  • index_reads (list) – read names associated with index data

  • qc_modules (list) – list of names of associated QC modules

classmethod from_specification(s)

Create new QCProtocol instance from specification

Given a specification string (such as that returned by ‘repr(…)’), create a new QCProtocol instance initialised from that specification.

Example usage:

>>> p = QCProtocol.from_specification("custom:...")

Parameters:

s (str) – QC protocol specification string

summarise()

Summarise protocol

Generate plain-text description of the protocol

exception auto_process_ngs.qc.protocols.QCProtocolError

Base class for QC protocol-specific exceptions

exception auto_process_ngs.qc.protocols.QCProtocolParseSpecError(message=None)

Exception raised when a protocol specification can’t be parsed

Parameters:

message (str) – error message

auto_process_ngs.qc.protocols.determine_qc_protocol(project)

Determine the QC protocol for a project

Parameters:

project (AnalysisProject) – project instance

Returns:

QC protocol for the project

Return type:

str

auto_process_ngs.qc.protocols.fetch_protocol_definition(p)

Return the definition for a QC protocol

Fetches a QCProtocol instance for the supplied protocol, which can either be a protocol definition string or the name of a built-in QC protocol.

Parameters:

p (str) – name or specification for the QC protocol

Returns:

QCProtocol object representing the requested protocol.

Return type:

QCProtocol

Raises:

KeyError – when a QCProtocol instance can’t be returned for the requested protocol.

auto_process_ngs.qc.protocols.parse_protocol_spec(s)

Parse QC protocol specification string

Parses a QC protocol specification string (such as one returned by the ‘__repr__’ built-in of an existing QCProtocol instance) and returns an AttributeDictionary with the following elements extracted from the specification:

  • name

  • description

  • seq_data_reads

  • index_reads

  • qc_modules

These can then be used to create a new QCProtocol instance which matches the specification using e.g.

>>> p = QCProtocol(**parse_protocol_spec("..."))

Parameters:

s (str) – QC protocol specification string

Returns:

AttributeDictionary with keys mapped to values from the supplied specification.

Return type:

AttributeDictionary

Raises:

QCProtocolParseSpecError – if the specification string cannot be parsed correctly.