auto_process_ngs.auto_processor
- class auto_process_ngs.auto_processor.AutoProcess(analysis_dir=None, settings=None, allow_save_params=True)
Class implementing an automatic fastq generation and QC processing procedure for Illumina sequencing data
The ‘AutoProcess’ class provides an interface to a directory being used for processing Illumina sequencing data, including Fastq generation and QC operations.
The auto_process data processing and QC pipelines are constructed around this class, which allows the current state of the processing directory (and associated metadata) to be accessed and modified.
- Parameters:
analysis_dir (str) – name/path for existing analysis directory
settings (Settings) – optional, if supplied then should be a Settings instance; otherwise use a default instance populated from the installation-specific ‘auto_process.ini’ file
allow_save_params (bool) – if True then allow updates to parameters to be saved back to the parameter file (this is the default)
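As a rough illustration of how the interface described here might be queried, the sketch below collects a few of the documented properties and method results into a dictionary. It takes any object exposing those members, so it runs without the library installed; in real use `ap` would be an `AutoProcess` instance created with the constructor shown above. The function name `report_state` and the metadata item names in `required` are hypothetical.

```python
def report_state(ap, required=("run_name", "platform")):
    """Summarise the state of an AutoProcess-like object 'ap'.

    The item names in 'required' are placeholders chosen for
    illustration, not a list taken from the library.
    """
    return {
        # Property: is there a parameter file in the directory?
        "has_parameter_file": ap.has_parameter_file,
        # True only if every required metadata item is non-null
        "metadata_ok": ap.check_metadata(required),
        # True, False or None (endedness cannot be determined)
        "paired_end": ap.paired_end,
        # Number of analysis projects found
        "n_projects": len(ap.get_analysis_projects()),
    }
```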
- check_metadata(items)
Check that metadata items are set
For metadata items supplied as an iterable in ‘items’, check that each is set to a non-null value. Report those that are null.
Return False if one or more are null; otherwise return True.
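The check described above can be sketched as follows. This is a minimal standalone illustration, not the library's implementation: the plain `metadata` dictionary and the function name `check_items_set` are assumptions.

```python
def check_items_set(metadata, items):
    """Return True if every item in 'items' has a non-null value."""
    # Collect the items that are missing or set to None
    missing = [item for item in items if metadata.get(item) is None]
    for item in missing:
        # Report each null item
        print(f"Metadata item '{item}' is not set")
    # False if one or more items are null, True otherwise
    return not missing
```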
- edit_readme()
Bring up README in an editor
- get_analysis_projects(pattern=None)
Return the analysis projects in a list
By default returns all projects within the analysis directory which are listed in the ‘projects.info’ metadata file (and ‘undetermined’, which is not).
If the ‘pattern’ is not None then it should be a simple pattern used to match against available names to select a subset of projects (see bcf_utils.name_matches).
If any project in ‘projects.info’ doesn’t have an associated analysis directory then it will be omitted from the results.
- Parameters:
pattern (str) – optional pattern to select a subset of projects (default: select all projects)
- Returns:
list of AnalysisProject instances.
- Return type:
List
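The effect of the 'pattern' argument can be illustrated with a small standalone sketch. It assumes glob-style matching via the standard library's `fnmatch` module; the actual matching rules are those of bcf_utils.name_matches and may differ. The function name `select_projects` is hypothetical.

```python
from fnmatch import fnmatchcase

def select_projects(names, pattern=None):
    """Return the names matching 'pattern' (all names if pattern is None)."""
    if pattern is None:
        # Default: select all projects
        return list(names)
    # Assumption: glob-style, case-sensitive matching
    return [name for name in names if fnmatchcase(name, pattern)]
```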
- get_analysis_projects_from_dirs(pattern=None, strict=False)
Return a list of AnalysisProjects in the analysis directory
Tests each of the subdirectories in the top-level of the analysis directory and rejects any that appear to be CASAVA/bcl2fastq outputs or which don’t successfully load as AnalysisProject instances.
Unlike the get_analysis_projects method, no checking against the project metadata (typically in ‘projects.info’) is performed.
If the ‘pattern’ is not None then it should be a simple pattern used to match against available names to select a subset of projects (see bcf_utils.name_matches).
- Parameters:
pattern (str) – optional pattern to select a subset of projects (default: select all projects)
strict (bool) – if True then apply strict checks on each discovered project directory before adding it to the list (default: don’t apply strict checks)
- Returns:
list of AnalysisProject instances.
- Return type:
List
- get_log_subdir(name)
Return the name for a new log subdirectory
Subdirectories are named as NNN_<name> e.g. 001_setup, 002_make_fastqs etc
- Parameters:
name (str) – name for the subdirectory (typically the name of the processing stage that will produce logs to be written to the subdirs)
- Returns:
name for the new log subdirectory (nb not the full path).
- Return type:
String
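The NNN_<name> naming scheme can be sketched as below. How the library chooses the next index is not stated here, so this sketch assumes "highest existing index plus one"; the function name `next_log_subdir` is hypothetical.

```python
import os
import re

def next_log_subdir(log_dir, name):
    """Return 'NNN_<name>' using the next free index in 'log_dir'."""
    indices = []
    for entry in os.listdir(log_dir):
        # Only consider existing subdirectories named NNN_<something>
        match = re.match(r"^(\d{3})_", entry)
        if match and os.path.isdir(os.path.join(log_dir, entry)):
            indices.append(int(match.group(1)))
    # Assumption: next index is one more than the highest in use
    next_index = max(indices, default=0) + 1
    # Return the name only, not the full path
    return f"{next_index:03d}_{name}"
```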
- property has_parameter_file
Indicate if there is a parameter file (typically auto_process.info)
- init_readme()
Create a new README file
- load_metadata(allow_save=True)
Load metadata values from file
- Parameters:
allow_save (boolean) – if True then allow metadata items to be saved back to the metadata file (the default); otherwise don’t allow save.
- load_parameters(allow_save=True)
Load parameter values from file
- Parameters:
allow_save (boolean) – if True then allow params to be saved back to the parameter file (the default); otherwise don’t allow save.
- load_project_metadata(project_metadata_file=None)
Load data from projects metadata file
Loads data from the projects metadata file, which lists projects in the auto-process directory along with information on samples, associated organism and library etc.
- Parameters:
project_metadata_file (str) – name of the metadata file relative to the analysis directory (default: ‘projects.info’)
- Returns:
project metadata loaded from the file in the analysis directory.
- Return type:
- make_project_metadata_file(project_metadata_file='projects.info')
Create a new project metadata file
- Parameters:
project_metadata_file (str) – name of the metadata file; relative paths are created under the analysis directory (default: ‘projects.info’)
- property metadata_file
Return name of metadata file (‘metadata.info’)
- property paired_end
Check if run is paired end
The endedness of the run is checked as follows:
If there are analysis project directories then the endedness is determined by checking the contents of these directories
If there are no project directories then the endedness is determined from the contents of the ‘unaligned’ directory
- Returns:
True if run is paired end, False if single end, None if endedness cannot be determined
- Return type:
Boolean
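The three possible return values can be illustrated with a sketch that inspects a list of Fastq file names. It assumes bcl2fastq-style names containing `_R1_` or `_R2_`; how the library itself inspects the project or ‘unaligned’ directories is not described here, and the function name `guess_paired_end` is hypothetical.

```python
import re

def guess_paired_end(fastq_names):
    """Return True (paired end), False (single end) or None (unknown)."""
    reads = set()
    for name in fastq_names:
        # Assumption: names end like '..._R1_001.fastq.gz'
        match = re.search(r"_R([12])_\d+\.fastq(\.gz)?$", name)
        if match:
            reads.add(match.group(1))
    if not reads:
        # No recognisable Fastqs: endedness cannot be determined
        return None
    # Paired end if any R2 file is present
    return "2" in reads
```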
- property parameter_file
Return name of parameter file (‘auto_process.info’)
- print_metadata()
Print the metadata items and associated values
- print_params()
Print the current parameter settings
- print_values(data)
Print key/value pairs from a dictionary
- remove_tmp_dir(ignore_errors=False)
Remove the associated temporary directory
- Parameters:
ignore_errors (bool) – if True then don’t raise an exception on error
- property run_id
Return the run ID (e.g. ‘HISEQ_140701/242#22’)
If a run ID is explicitly stored then return that, otherwise construct the ID from the run name, platform, run number and analysis number.
- property run_reference_id
Return the run reference (e.g. ‘NOVASEQ6000_230419/74#22_SP’)
If a run reference is explicitly stored then return that, otherwise construct the reference from the run ID plus the following additional items (if defined):
flow cell mode
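The fallback logic can be sketched as follows. The underscore separator is inferred from the example reference above and is an assumption, as are the function and argument names.

```python
def build_run_reference(run_id, flow_cell_mode=None, stored_reference=None):
    """Return the stored reference if set, else build one from the run ID."""
    if stored_reference:
        # An explicitly stored reference takes precedence
        return stored_reference
    parts = [run_id]
    if flow_cell_mode:
        # Append additional items only if they are defined
        parts.append(flow_cell_mode)
    # Assumption: items are joined with underscores
    return "_".join(parts)
```

For example, `build_run_reference("NOVASEQ6000_230419/74#22", "SP")` returns `"NOVASEQ6000_230419/74#22_SP"`, and with no flow cell mode the run ID is returned unchanged.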
- save_data(ignore_errors=False)
Save parameters and metadata to file
- Parameters:
ignore_errors (bool) – if True then don’t raise an exception on error
- save_metadata(alt_metadata_file=None, force=False)
Save metadata to file
- Parameters:
alt_metadata_file (str) – optional, path to an ‘alternative’ metadata file; otherwise metadata are saved to the default file for the processing directory.
force (boolean) – if True then force the metadata to be saved even if saving was previously turned off (default is False i.e. don’t force save).
- save_parameters(alt_parameter_file=None, force=False)
Save parameters to file
- Parameters:
alt_parameter_file (str) – optional, path to an ‘alternative’ parameter file; otherwise parameters are saved to the default file for the processing directory.
force (boolean) – if True then force the parameters to be saved even if saving was previously turned off (default is False i.e. don’t force save).
- set_log_dir(path)
(Re)set the path for the log directory
If the supplied path is relative then make a subdirectory in the existing log directory
- Parameters:
path (str) – path for the log directory
- Returns:
Full path for the new log directory.
- Return type:
String
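The handling of relative versus absolute paths can be sketched as below. This illustration only computes the resulting path and does not create any directory; the function name `resolve_log_dir` and its `current_log_dir` argument are hypothetical.

```python
import os

def resolve_log_dir(current_log_dir, path):
    """Return the full path that the log directory would be set to."""
    if os.path.isabs(path):
        # Absolute paths are used as supplied
        return path
    # Relative paths become subdirectories of the existing log dir
    return os.path.join(current_log_dir, path)
```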
- set_metadata(key, value)
Set an analysis directory metadata item
- Parameters:
key (str) – metadata item name
value (object) – value to assign to the metadata item
- set_param(key, value)
Set an analysis directory parameter
- Parameters:
key (str) – parameter name
value (object) – value to assign to the parameter
- sync_project_metadata()
Update metadata stored in project dirs with ‘projects.info’
- sync_project_metadata_file()
Synchronise ‘projects.info’ file with directory contents
- update_metadata()
Updates and synchronises metadata in the analysis dir
The updates include: migrating relevant values across from other files (for older runs); setting the run name
- update_paths(base_path=None, new_path=None)
Update the paths stored in the analysis directory
Checks the paths stored in the analysis directory metadata and parameter files, and updates them if they’re inconsistent with the current location.
By default the original ‘base’ path is taken from the path stored in the analysis directory parameters, and the new ‘base’ path is assumed to be the current path for the analysis directory (these settings are sensible if the directory has been relocated or copied).
- Parameters:
base_path (str) – current ‘base’ directory path (defaults to path stored in analysis directory parameters)
new_path (str) – new ‘base’ directory path (defaults to the current path of the analysis directory)
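The kind of rewrite applied to each stored path can be sketched as a prefix swap. This is an assumption about the general approach, not the library's code; the function name `rebase_path` is hypothetical.

```python
import os

def rebase_path(path, base_path, new_path):
    """Swap the leading 'base_path' of 'path' for 'new_path'."""
    base = os.path.normpath(base_path)
    target = os.path.normpath(path)
    if target == base:
        # The stored path is the base directory itself
        return os.path.normpath(new_path)
    if target.startswith(base + os.sep):
        # The stored path lies under the old base: move it across
        return os.path.join(new_path, os.path.relpath(target, base))
    # Paths outside the old base are left unchanged
    return path
```

Matching on `base + os.sep` rather than on the bare prefix avoids treating a sibling such as `/old/running` as if it were inside `/old/run`.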
- update_project_metadata_file(unaligned_dir=None, project_metadata_file='projects.info')
Update project metadata file from bcl2fastq outputs
Updates the contents of the project metadata file (default: “projects.info”) from a bcl-to-fastq output directory, by adding new entries for projects in the bcl-to-fastq outputs which don’t currently appear.
- Parameters:
unaligned_dir (str) – path to the bcl-to-fastq output directory relative to the analysis dir. Defaults to the unaligned dir stored in the analysis directory parameter file.
project_metadata_file (str) – optional, path to the project metadata file to update
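The update rule, adding entries only for projects that don’t currently appear, can be sketched on plain lists of project names. The real method works on the project metadata file and the bcl-to-fastq outputs; the function name `add_missing_projects` and the use of name lists are assumptions for illustration.

```python
def add_missing_projects(existing, discovered):
    """Return 'existing' plus any names in 'discovered' not already listed."""
    updated = list(existing)
    for name in discovered:
        if name not in updated:
            # New project found in the outputs: add an entry for it
            updated.append(name)
    # Existing entries are kept as they are
    return updated
```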