auto_process_ngs.auto_processor

class auto_process_ngs.auto_processor.AutoProcess(analysis_dir=None, settings=None, allow_save_params=True)

Class implementing an automatic fastq generation and QC processing procedure for Illumina sequencing data

check_metadata(items)

Check that metadata items are set

For metadata items supplied as an iterable in ‘items’, check that each is set to a non-null value. Report those that are null.

Return False if one or more are null; otherwise return True.
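
The check described above can be sketched with a plain dictionary standing in for the metadata store. This is only an illustration of the documented behaviour, not the class's actual implementation; the helper name `check_items_set` and the sample keys are hypothetical.

```python
def check_items_set(metadata, items):
    """Return True only if every named item has a non-null value."""
    # Collect the items whose value is missing or None
    null_items = [item for item in items if metadata.get(item) is None]
    for item in null_items:
        # Report each null item, as the documentation describes
        print(f"Metadata item '{item}' is not set")
    return not null_items


# Hypothetical metadata values used only for this illustration
example = {"run_name": "230419_RUN", "platform": None}
```

With these values, checking only `run_name` passes, while checking both keys reports `platform` and fails.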

edit_readme()

Bring up README in an editor

get_analysis_projects(pattern=None)

Return the analysis projects in a list

By default returns all projects within the analysis directory which are listed in the ‘projects.info’ metadata file (and ‘undetermined’, which is not).

If the ‘pattern’ is not None then it should be a simple pattern used to match against available names to select a subset of projects (see bcf_utils.name_matches).

If any project in ‘projects.info’ doesn’t have an associated analysis directory then it will be omitted from the results.

Parameters:

pattern (str) – optional pattern to select a subset of projects (default: select all projects)

Returns:

list of AnalysisProject instances.

Return type:

List
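
To illustrate how a simple pattern narrows the list of project names, the sketch below uses Python's standard `fnmatch` module as a stand-in. The real matching rules are those of `bcf_utils.name_matches` and may differ; the helper name `select_names` and the project names are hypothetical.

```python
from fnmatch import fnmatchcase


def select_names(names, pattern=None):
    """Return names matching a glob-style pattern (all names if None)."""
    if pattern is None:
        # No pattern supplied: select everything
        return list(names)
    # fnmatchcase gives case-sensitive matching on every platform
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical project names used only for this illustration
projects = ["PJB", "PJB2", "IJD", "undetermined"]
```

Here `select_names(projects, "PJ*")` keeps only the two names starting with `PJ`, and omitting the pattern returns every name.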

get_analysis_projects_from_dirs(pattern=None, strict=False)

Return a list of AnalysisProjects in the analysis directory

Tests each of the subdirectories in the top level of the analysis directory and rejects any that appear to be CASAVA/bcl2fastq outputs or that don’t successfully load as AnalysisProject instances.

Unlike the get_analysis_projects method, no checking against the project metadata (typically in ‘projects.info’) is performed.

If the ‘pattern’ is not None then it should be a simple pattern used to match against available names to select a subset of projects (see bcf_utils.name_matches).

Parameters:
  • pattern (str) – optional pattern to select a subset of projects (default: select all projects)

  • strict (bool) – if True then apply strict checks on each discovered project directory before adding it to the list (default: don’t apply strict checks)

Returns:

list of AnalysisProject instances.

Return type:

List

get_log_subdir(name)

Return the name for a new log subdirectory

Subdirectories are named as NNN_<name> e.g. 001_setup, 002_make_fastqs etc

Parameters:

name (str) – name for the subdirectory (typically the name of the processing stage that will produce logs to be written to the subdirectory)

Returns:

name for the new log subdirectory (nb not the full path).

Return type:

String
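
The NNN_<name> naming scheme can be sketched as below. The sketch assumes that the index is one greater than the highest index already in use, which the description above does not state explicitly; the helper name `next_log_subdir` is hypothetical.

```python
import re


def next_log_subdir(existing, name):
    """Return 'NNN_<name>' using the next free three-digit index."""
    indices = []
    for entry in existing:
        # Only entries that start with three digits and an underscore count
        match = re.match(r"^(\d{3})_", entry)
        if match:
            indices.append(int(match.group(1)))
    # Assumption: numbering continues from the highest existing index
    next_index = max(indices, default=0) + 1
    return f"{next_index:03d}_{name}"
```

With no existing subdirectories this gives `001_setup` for the name `setup`; once `001_setup` exists, the name `make_fastqs` gives `002_make_fastqs`.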

property has_parameter_file

Indicate if there is a parameter file (typically auto_process.info)

init_readme()

Create a new README file

load_metadata(allow_save=True)

Load metadata values from file

Parameters:

allow_save (boolean) – if True then allow metadata items to be saved back to the metadata file (the default); otherwise don’t allow save.

load_parameters(allow_save=True)

Load parameter values from file

Parameters:

allow_save (boolean) – if True then allow params to be saved back to the parameter file (the default); otherwise don’t allow save.

load_project_metadata(project_metadata_file=None)

Load data from projects metadata file

Loads data from the projects metadata file, which lists projects in the auto-process directory along with information on samples, associated organism and library etc.

Parameters:

project_metadata_file (str) – name of the metadata file relative to the analysis directory (default: ‘projects.info’)

Returns:

project metadata loaded from the file in the analysis directory.

Return type:

ProjectMetadataFile

make_project_metadata_file(project_metadata_file='projects.info')

Create a new project metadata file

Parameters:

project_metadata_file (str) – name of the metadata file; relative paths are created under the analysis directory (default: ‘projects.info’)

property metadata_file

Return name of metadata file (‘metadata.info’)

property paired_end

Check if run is paired end

The endedness of the run is checked as follows:

  • If there are analysis project directories then the endedness is determined by checking the contents of these directories

  • If there are no project directories then the endedness is determined from the contents of the ‘unaligned’ directory

Returns:

True if run is paired end, False if single end, None if endedness cannot be determined

Return type:

Boolean
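
The three possible return values (True, False, None) can be illustrated with a sketch that inspects Fastq file names. It assumes Illumina-style names containing `_R1` or `_R2`, which is not stated above; the property's actual logic may differ, and the helper name `is_paired_end` is hypothetical.

```python
import re


def is_paired_end(fastq_names):
    """Return True (paired end), False (single end) or None (unknown)."""
    reads = set()
    for name in fastq_names:
        # Assumption: read number appears as _R1 or _R2 before the extension
        match = re.search(r"_R([12])(?:_\d+)?\.fastq(?:\.gz)?$", name)
        if match:
            reads.add(match.group(1))
    if not reads:
        # No recognisable Fastq files: endedness cannot be determined
        return None
    return "2" in reads
```

A list containing both an R1 and an R2 file gives True, a list with only R1 files gives False, and an empty list gives None.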

property parameter_file

Return name of parameter file (‘auto_process.info’)

print_metadata()

Print the metadata items and associated values

print_params()

Print the current parameter settings

print_values(data)

Print key/value pairs from a dictionary

remove_tmp_dir(ignore_errors=False)

Remove the associated temporary directory

Parameters:

ignore_errors (bool) – if True then don’t raise an exception on error

property run_id

Return the run ID (e.g. ‘HISEQ_140701/242#22’)

property run_reference_id

Return the run reference (e.g. ‘NOVASEQ6000_230419/74#22_SP’)

The run reference is the run ID plus the following additional items (if defined):

  • flow cell mode
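
Going by the example above, the reference appears to be the run ID with the flow cell mode appended after an underscore. The sketch below illustrates that composition under this assumption; the helper name `make_run_reference` is hypothetical.

```python
def make_run_reference(run_id, flow_cell_mode=None):
    """Append the flow cell mode to the run ID when one is defined."""
    if flow_cell_mode:
        # Assumption: the separator is an underscore, as in the example
        return f"{run_id}_{flow_cell_mode}"
    # No additional items defined: the reference is just the run ID
    return run_id
```

For the run ID `NOVASEQ6000_230419/74#22` and flow cell mode `SP` this gives `NOVASEQ6000_230419/74#22_SP`.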

save_data(ignore_errors=False)

Save parameters and metadata to file

Parameters:

ignore_errors (bool) – if True then don’t raise an exception on error

save_metadata(alt_metadata_file=None, force=False)

Save metadata to file

Parameters:
  • alt_metadata_file (str) – optional, path to an ‘alternative’ metadata file; otherwise metadata are saved to the default file for the processing directory.

  • force (boolean) – if True then force the metadata to be saved even if saving was previously turned off (default is False i.e. don’t force save).

save_parameters(alt_parameter_file=None, force=False)

Save parameters to file

Parameters:
  • alt_parameter_file (str) – optional, path to an ‘alternative’ parameter file; otherwise parameters are saved to the default file for the processing directory.

  • force (boolean) – if True then force the parameters to be saved even if saving was previously turned off (default is False i.e. don’t force save).
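
The interaction between the `allow_save` setting used on loading and the `force` argument used on saving can be sketched as below. The class `SaveGuard` is hypothetical and stores values in memory only; it illustrates the documented behaviour and is not how the library implements it.

```python
class SaveGuard:
    """Toy store that refuses to save unless allowed or forced."""

    def __init__(self, allow_save=True):
        self.allow_save = allow_save
        self.saved = []

    def save(self, values, force=False):
        """Record the values and return True, or return False if blocked."""
        if not (self.allow_save or force):
            # Saving was turned off and not forced: do nothing
            return False
        self.saved.append(dict(values))
        return True
```

With `allow_save=False`, a plain `save` call is ignored, while the same call with `force=True` goes through.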

set_log_dir(path)

(Re)set the path for the log directory

If the supplied path is relative then a subdirectory is made in the existing log directory.

Parameters:

path (str) – path for the log directory

Returns:

Full path for the new log directory.

Return type:

String
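
How a relative path is resolved against the existing log directory can be sketched as below. The sketch only computes the resulting path and does not create any directory; the helper name `resolve_log_dir` and the example paths are hypothetical.

```python
import posixpath


def resolve_log_dir(current_log_dir, path):
    """Return the full log directory path for an absolute or relative path."""
    # posixpath is used so the example behaves the same on any platform
    if posixpath.isabs(path):
        # Absolute paths replace the log directory outright
        return posixpath.normpath(path)
    # Relative paths become subdirectories of the existing log directory
    return posixpath.normpath(posixpath.join(current_log_dir, path))
```

For an existing log directory `/data/run/logs`, the relative path `002_make_fastqs` resolves to `/data/run/logs/002_make_fastqs`, whereas `/tmp/other_logs` is returned unchanged.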

set_metadata(key, value)

Set an analysis directory metadata item

Parameters:
  • key (str) – metadata item name

  • value (object) – value to assign to the metadata item

set_param(key, value)

Set an analysis directory parameter

Parameters:
  • key (str) – parameter name

  • value (object) – value to assign to the parameter

update_metadata()

Updates and synchronises metadata in the analysis dir

The updates include: migrating relevant values across from other files (for older runs); setting the run name

update_project_metadata_file(unaligned_dir=None, project_metadata_file='projects.info')

Update project metadata file from bcl2fastq outputs

Updates the contents of the project metadata file (default: “projects.info”) from a bcl-to-fastq output directory, by adding new entries for projects in the bcl-to-fastq outputs which don’t currently appear.

Parameters:
  • unaligned_dir (str) – path to the bcl-to-fastq output directory relative to the analysis dir. Defaults to the unaligned dir stored in the analysis directory parameter file.

  • project_metadata_file (str) – optional, path to the project metadata file to update
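
The update rule described above (add entries only for projects that don't currently appear) can be sketched with plain dictionaries. The data layout and the helper name `add_missing_projects` are hypothetical; the real method works on the project metadata file and the bcl-to-fastq outputs.

```python
def add_missing_projects(existing, discovered):
    """Return existing entries plus new ones for unlisted projects."""
    # Copy so the caller's mapping is left unmodified
    updated = dict(existing)
    for name, samples in discovered.items():
        if name not in updated:
            # Only projects not already listed get a new entry
            updated[name] = {"samples": samples}
    return updated


# Hypothetical data used only for this illustration
existing = {"PJB": {"samples": ["PJB1"], "organism": "Human"}}
discovered = {"PJB": ["PJB1", "PJB2"], "IJD": ["IJD1"]}
updated = add_missing_projects(existing, discovered)
```

In this illustration the existing `PJB` entry is left untouched and a new `IJD` entry is added.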