auto_process_ngs.applications

Collects information about sequencing applications (combinations of platform and library type).

Defines the APPLICATIONS list, which contains dictionaries defining known applications, and the identify_application() function, which can be used to identify the appropriate application for a given platform and library type.

Each application is defined as a dictionary with the following keys:

  • platforms: list of platform names (with optional wildcards) which correspond to the application; use [“*”] to indicate all platforms, or an empty list to indicate that the platform must not be set

  • libraries: list of library types (with optional wildcards) which correspond to the application; use [“*”] to indicate all library types, or an empty list to indicate that the library type must not be set

  • extensions: list of library type extensions which can be appended to the library type via the plus symbol (“+”)

  • alternative_extensions: dictionary of alternative names for library type extensions mapped to the “canonical” names in the ‘extensions’ list

  • fastq_generation: name of the Fastq generation protocol to use for this application

  • qc_protocol: name of the QC protocol to use for this application

  • setup: dictionary with information about actions that should be performed as part of setting up analysis project directories for this application; contains the following keys:

    • templates: list of template names to use for this application; can be one or more of: “10x_multi_config”, “10x_multiome_libraries”.

    • directories: list of subdirectory names which will be created in the analysis project directory (for example “Visium_images”)

  • assays: optional list of assays to associate with this application

  • tags: optional list of tags to associate with this application; tags can be one or more of: “10x”, “bio_rad”, “parse”, “single_cell”, “spatial”, “legacy”. Tags are used for automated documentation generation.

The minimum required keys for each application are platforms, libraries, fastq_generation, and qc_protocol.
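As an illustration of the structure described above, the following sketch builds one definition and checks it for the required keys. Every value in the dictionary (library names, protocol names, extension names) is a made-up placeholder and is not taken from the real APPLICATIONS list; the helper missing_required_keys is hypothetical and is not part of the module.

```python
# Hypothetical application definition: all values are illustrative
# placeholders, not entries from the real APPLICATIONS list.
example_application = {
    "platforms": ["*"],                    # any platform
    "libraries": ["example_library*"],     # wildcard on the library type
    "extensions": ["EXT"],
    "alternative_extensions": {"EXTENSION": "EXT"},  # alias -> canonical
    "fastq_generation": "example_fastq_protocol",
    "qc_protocol": "example_qc_protocol",
    "setup": {
        "templates": ["10x_multi_config"],
        "directories": ["Visium_images"],
    },
    "assays": [],
    "tags": ["10x", "single_cell"],
}

# The four keys the documentation lists as the required minimum
REQUIRED_KEYS = ("platforms", "libraries", "fastq_generation", "qc_protocol")


def missing_required_keys(application):
    """Return the required keys absent from a definition (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if key not in application]


print(missing_required_keys(example_application))
print(missing_required_keys({"platforms": ["*"]}))
```

A check of this kind could be run over a custom list of definitions before passing it to other functions, although whether the module itself performs such validation is not stated here.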

The module also defines the following user-facing functions:

  • identify_application: returns the dictionary defining the application which matches a given platform and library type

  • fetch_application_data: returns a list of application definitions matching specified tags

The following functions are also defined for internal use:

  • match_application: determines whether a given platform and library type match a given application definition

  • score_match: returns a score for a platform/library combination

auto_process_ngs.applications.fetch_application_data(tags, applications=None, expand=False)

Fetch application data matching specified tags

Parameters:
  • tags (list) – list of tags to match; tags starting with ‘!’ are treated as negative tags

  • applications (list) – list of application definitions to filter; if None then the default APPLICATIONS list is used

  • expand (bool) – if True then applications with multiple platforms/libraries are expanded so there is one entry per platform/library combination

Returns:

list of application definitions matching the specified tags.

Return type:

list
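The tag filtering and expansion described above might be sketched as follows. This is not the library's implementation: it assumes that an application must carry every positive tag and none of the negative ones, and that expansion produces one copy of the definition per platform/library pair. The function name fetch_by_tags_sketch and the demo data are hypothetical.

```python
from itertools import product


def fetch_by_tags_sketch(tags, applications, expand=False):
    """Filter definitions by tags (assumed semantics, see lead-in)."""
    wanted = {t for t in tags if not t.startswith("!")}
    unwanted = {t[1:] for t in tags if t.startswith("!")}
    matches = []
    for app in applications:
        app_tags = set(app.get("tags", []))
        # Keep the definition if it has all wanted tags and no unwanted ones
        if wanted <= app_tags and not (unwanted & app_tags):
            matches.append(app)
    if not expand:
        return matches
    expanded = []
    for app in matches:
        # One shallow copy per platform/library combination; definitions
        # with an empty platform or library list are not handled here
        for platform, library in product(app["platforms"], app["libraries"]):
            entry = dict(app)
            entry["platforms"] = [platform]
            entry["libraries"] = [library]
            expanded.append(entry)
    return expanded


# Made-up definitions, reduced to the keys this sketch needs
demo_applications = [
    {"platforms": ["platform_a", "platform_b"], "libraries": ["lib_x"],
     "tags": ["10x", "single_cell"]},
    {"platforms": ["*"], "libraries": ["lib_y"], "tags": ["10x", "legacy"]},
    {"platforms": ["*"], "libraries": ["lib_z"]},
]

current = fetch_by_tags_sketch(["10x", "!legacy"], demo_applications)
expanded = fetch_by_tags_sketch(["10x", "!legacy"], demo_applications,
                                expand=True)
print(len(current), len(expanded))
```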

auto_process_ngs.applications.identify_application(platform_name, library_type)

Returns information about an application

Applications are combinations of platforms and libraries.

Parameters:
  • platform_name (str) – name of the platform

  • library_type (str) – name of the library

Returns:

application-specific information

Return type:

dict

auto_process_ngs.applications.match_application(application_info, platform_name, library_type)

Determine if platform and library type match the supplied application

Given information about an application (supplied as a dictionary with elements platforms and libraries), determines whether the supplied platform and library match that information.

FIXME: does not currently include matching against the library extensions

Parameters:
  • application_info (dict) – information about the application

  • platform_name (str) – name of the platform

  • library_type (str) – name of the library

Returns:

if the platform and library match the application then returns a tuple of the form (application, list of platform matches, list of library matches); otherwise returns None

Return type:

tuple
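A minimal sketch of this kind of wildcard matching, using the standard library's fnmatch module, is shown below. It is an approximation under stated assumptions, not the actual implementation: an unset platform or library is represented by None, an unset value only matches an empty list, and library extensions are ignored (as the FIXME above notes for the real function). The names match_application_sketch and _matching_patterns are hypothetical.

```python
from fnmatch import fnmatchcase


def _matching_patterns(patterns, value):
    """Return the patterns matching value, or None if there is no match.

    Assumption: an empty pattern list matches only an unset (None) value,
    and an unset value matches nothing else.
    """
    if not patterns:
        return [] if value is None else None
    if value is None:
        return None
    hits = [p for p in patterns if fnmatchcase(value, p)]
    return hits or None


def match_application_sketch(application_info, platform_name, library_type):
    """Return (application, platform matches, library matches) or None."""
    platform_hits = _matching_patterns(application_info["platforms"],
                                       platform_name)
    library_hits = _matching_patterns(application_info["libraries"],
                                      library_type)
    if platform_hits is None or library_hits is None:
        return None
    return (application_info, platform_hits, library_hits)


# Made-up definition for demonstration only
demo = {"platforms": ["platform_a*", "*"], "libraries": ["lib_x"]}
print(match_application_sketch(demo, "platform_a1", "lib_x"))
print(match_application_sketch(demo, "platform_a1", "lib_y"))
```

Returning the lists of matching patterns, rather than a plain boolean, lets a caller judge how specific the match was.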

auto_process_ngs.applications.score_match(platforms, libraries)

Return a score for a platform/library combination

The score is the sum of two values: the smallest number of wildcard characters (‘*’) found in any entry of the platform list, and the smallest number found in any entry of the library list. A lower score indicates a more specific match.

Parameters:
  • platforms (list) – list of platforms

  • libraries (list) – list of libraries
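One way to read the scoring rule is sketched below: count the wildcards in each entry, take the minimum per list, and add the two minima. Treating an empty list as contributing zero is an assumption, as is the use of the score to choose between candidates at the end. The name score_match_sketch and the candidate data are hypothetical.

```python
def score_match_sketch(platforms, libraries):
    """Sum of the per-list minimum wildcard counts (assumed reading)."""
    # default=0 covers empty lists, which min() would otherwise reject
    platform_score = min((p.count("*") for p in platforms), default=0)
    library_score = min((lib.count("*") for lib in libraries), default=0)
    return platform_score + library_score


# Made-up (platforms, libraries) pairs, from least to most specific
candidates = [
    (["*"], ["*"]),
    (["platform_a*"], ["lib_x"]),
    (["platform_a"], ["lib_x"]),
]

scores = [score_match_sketch(p, libs) for p, libs in candidates]
# A lower score means a more specific match, so take the minimum
most_specific = min(candidates, key=lambda c: score_match_sketch(*c))
print(scores, most_specific)
```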

auto_process_ngs.applications.split_library_type(library_type)

Splits a library type into its components

Library types are expected to consist of a “base” library type followed by zero or more optional “extensions”, each of which is identified by a preceding ‘+’ character.

For example: “GEX” is a base library type with no extensions; “GEX+CSP” has the base type “GEX” with extensions [“CSP”]; and “GEX+CSP+VDJ” has the base type “GEX” with extensions [“CSP”, “VDJ”].

This function returns a tuple of the form:

(BASE, EXTENSIONS)

Parameters:

library_type (str) – name of the library

Returns:

tuple with two elements: the first is the base library type and the second is a list of the extensions.

Return type:

tuple
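The behaviour described above can be reproduced with a short sketch that splits on the ‘+’ character. This is an illustrative stand-in named split_library_type_sketch (a hypothetical name), and it may differ from the real function in how it treats edge cases such as empty strings or repeated ‘+’ characters.

```python
def split_library_type_sketch(library_type):
    """Split 'BASE+EXT1+EXT2' into (BASE, [EXT1, EXT2])."""
    # The first field is the base type; any remaining fields are extensions
    base, *extensions = library_type.split("+")
    return (base, extensions)


for name in ("GEX", "GEX+CSP", "GEX+CSP+VDJ"):
    print(name, "->", split_library_type_sketch(name))
```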