auto_process_ngs.auto_processor
- class auto_process_ngs.auto_processor.AutoProcess(analysis_dir=None, settings=None, allow_save_params=True)
Class implementing an automatic fastq generation and QC processing procedure for Illumina sequencing data
- check_metadata(items)
Check that metadata items are set
For metadata items supplied as an iterable in ‘items’, check that each is set to a non-null value. Report those that are null.
Return False if one or more are null; otherwise return True.
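A minimal sketch of the documented check, assuming the metadata are held in a plain dict and that "null" means `None`. The function name and the report format are hypothetical and do not reproduce the library's own implementation.

```python
def check_metadata(metadata, items):
    """Return True if every named item in 'metadata' is non-null.

    Hypothetical stand-in: 'metadata' is a plain dict here, not the
    AutoProcess metadata object; missing keys count as null.
    """
    missing = [item for item in items if metadata.get(item) is None]
    for item in missing:
        # Report each null item, mirroring the documented behaviour
        print(f"Metadata item '{item}' is not set")
    return not missing
```

For example, `check_metadata({"run_name": "RUN1", "platform": None}, ["run_name", "platform"])` reports `platform` and returns `False`.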
- edit_readme()
Bring up README in an editor
- get_analysis_projects(pattern=None)
Return the analysis projects in a list
By default returns all projects within the analysis directory which are listed in the ‘projects.info’ metadata file, plus the ‘undetermined’ project (which is not listed there).
If the ‘pattern’ is not None then it should be a simple pattern used to match against available names to select a subset of projects (see bcf_utils.name_matches).
If any project in ‘projects.info’ doesn’t have an associated analysis directory then it will be omitted from the results.
- Parameters:
pattern (str) – optional pattern to select a subset of projects (default: select all projects)
- Returns:
list of AnalysisProject instances.
- Return type:
List
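The selection rules above can be sketched as follows. This is an illustration under stated assumptions, not the method itself: `select_projects` is a hypothetical helper that works on project names rather than `AnalysisProject` instances, `existing_dirs` stands in for the subdirectories actually present, and `fnmatch` is assumed as a substitute for `bcf_utils.name_matches`.

```python
import fnmatch

def select_projects(listed_projects, existing_dirs, pattern=None):
    """Sketch of the documented selection rules for get_analysis_projects.

    Hypothetical inputs: 'listed_projects' stands in for the names read from
    'projects.info', 'existing_dirs' for the subdirectory names present in
    the analysis directory. fnmatch is an assumed substitute for
    bcf_utils.name_matches.
    """
    # 'undetermined' is considered even though projects.info does not list it
    candidates = list(listed_projects) + ["undetermined"]
    selected = []
    for name in candidates:
        # Optional pattern selects a subset of projects
        if pattern is not None and not fnmatch.fnmatch(name, pattern):
            continue
        # Projects without an analysis directory are omitted
        if name in existing_dirs:
            selected.append(name)
    return selected
```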
- get_analysis_projects_from_dirs(pattern=None, strict=False)
Return a list of AnalysisProjects in the analysis directory
Tests each of the subdirectories in the top level of the analysis directory and rejects any that appear to be CASAVA/bcl2fastq outputs or which don’t successfully load as AnalysisProject instances.
Unlike the get_analysis_projects method, no checking against the project metadata (typically in ‘projects.info’) is performed.
If the ‘pattern’ is not None then it should be a simple pattern used to match against available names to select a subset of projects (see bcf_utils.name_matches).
- Parameters:
pattern (str) – optional pattern to select a subset of projects (default: select all projects)
strict (bool) – if True then apply strict checks on each discovered project directory before adding it to the list (default: don’t apply strict checks)
- Returns:
list of AnalysisProject instances.
- Return type:
List
- get_log_subdir(name)
Return the name for a new log subdirectory
Subdirectories are named as NNN_<name> e.g. 001_setup, 002_make_fastqs etc
- Parameters:
name (str) – name for the subdirectory (typically the name of the processing stage that will produce logs to be written to the subdirectory)
- Returns:
Name for the new log subdirectory (nb not the full path).
- Return type:
String
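A sketch of how an `NNN_<name>` subdirectory name might be generated. It assumes (not confirmed by the documentation above) that numbering continues from the highest three-digit prefix already present; `next_log_subdir` and its `existing` argument are hypothetical.

```python
import re

def next_log_subdir(existing, name):
    """Return 'NNN_<name>' numbered one past the highest existing index.

    Assumption: 'existing' lists the subdirectory names already in the log
    directory; names without an NNN_ prefix are ignored.
    """
    indices = []
    for subdir in existing:
        match = re.match(r"^(\d{3})_", subdir)
        if match:
            indices.append(int(match.group(1)))
    # Zero-pad to three digits, starting at 001 when nothing exists yet
    return f"{max(indices, default=0) + 1:03d}_{name}"
```

Like the method, this returns only the subdirectory name, not the full path.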
- property has_parameter_file
Indicate if there is a parameter file (typically auto_process.info)
- init_readme()
Create a new README file
- load_metadata(allow_save=True)
Load metadata values from file
- Parameters:
allow_save (boolean) – if True then allow metadata items to be saved back to the metadata file (the default); otherwise don’t allow save.
- load_parameters(allow_save=True)
Load parameter values from file
- Parameters:
allow_save (boolean) – if True then allow params to be saved back to the parameter file (the default); otherwise don’t allow save.
- load_project_metadata(project_metadata_file=None)
Load data from projects metadata file
Loads data from the projects metadata file, which lists projects in the auto-process directory along with information on samples, associated organism and library etc.
- Parameters:
project_metadata_file (str) – name of the metadata file relative to the analysis directory (default: ‘projects.info’)
- Returns:
Project metadata loaded from the file in the analysis directory.
- Return type:
- make_project_metadata_file(project_metadata_file='projects.info')
Create a new project metadata file
- Parameters:
project_metadata_file (str) – name of the metadata file; relative paths are created under the analysis directory (default: ‘projects.info’)
- property metadata_file
Return name of metadata file (‘metadata.info’)
- property paired_end
Check if run is paired end
The endedness of the run is checked as follows:
If there are analysis project directories then the endedness is determined by checking the contents of these directories
If there are no project directories then the endedness is determined from the contents of the ‘unaligned’ directory
- Returns:
True if run is paired end, False if single end, None if endedness cannot be determined.
- Return type:
Boolean
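The two-step rule above can be sketched as a small decision function. The inputs are hypothetical pre-computed flags rather than real directory checks, and the handling of multiple projects (paired end only if every project is, undetermined if any project is unknown) is an assumption, not documented behaviour.

```python
def determine_paired_end(project_flags, unaligned_flag=None):
    """Sketch of the documented endedness rules (hypothetical inputs).

    'project_flags' holds one entry per analysis project directory
    (True = paired end, False = single end, None = unknown);
    'unaligned_flag' is the equivalent result for the 'unaligned' directory.
    """
    if project_flags:
        # Project directories take precedence over the 'unaligned' directory
        if any(flag is None for flag in project_flags):
            return None
        # Assumption: the run is paired end only if every project is
        return all(project_flags)
    # No project directories: fall back to the 'unaligned' directory
    return unaligned_flag
```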
- property parameter_file
Return name of parameter file (‘auto_process.info’)
- print_metadata()
Print the metadata items and associated values
- print_params()
Print the current parameter settings
- print_values(data)
Print key/value pairs from a dictionary
- remove_tmp_dir(ignore_errors=False)
Remove the associated temporary directory
- Parameters:
ignore_errors (bool) – if True then don’t raise an exception on error
- property run_id
Return the run ID (e.g. ‘HISEQ_140701/242#22’)
- property run_reference_id
Return the run reference (e.g. ‘NOVASEQ6000_230419/74#22_SP’)
The run reference is the run ID plus the following additional items (if defined):
flow cell mode
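A minimal sketch of how the run reference relates to the run ID. The underscore separator is inferred from the example ‘NOVASEQ6000_230419/74#22_SP’ and is an assumption; `run_reference_id` here is a standalone hypothetical function, not the property itself.

```python
def run_reference_id(run_id, flow_cell_mode=None):
    """Append the flow cell mode (if defined) to the run ID.

    Assumption: items are joined with an underscore, as suggested by the
    documented example.
    """
    parts = [run_id]
    if flow_cell_mode:
        # Only defined items are appended
        parts.append(flow_cell_mode)
    return "_".join(parts)
```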
- save_data(ignore_errors=False)
Save parameters and metadata to file
- Parameters:
ignore_errors (bool) – if True then don’t raise an exception on error
- save_metadata(alt_metadata_file=None, force=False)
Save metadata to file
- Parameters:
alt_metadata_file (str) – optional, path to an ‘alternative’ metadata file; otherwise metadata are saved to the default file for the processing directory.
force (boolean) – if True then force the metadata to be saved even if saving was previously turned off (default is False i.e. don’t force save).
- save_parameters(alt_parameter_file=None, force=False)
Save parameters to file
- Parameters:
alt_parameter_file (str) – optional, path to an ‘alternative’ parameter file; otherwise parameters are saved to the default file for the processing directory.
force (boolean) – if True then force the parameters to be saved even if saving was previously turned off (default is False i.e. don’t force save).
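The interaction between `allow_save` (set when loading) and `force` (passed when saving) can be modelled as below. `SaveGuard` is a hypothetical class that records the target path instead of writing a file; it is a sketch of the documented semantics only.

```python
class SaveGuard:
    """Hypothetical model of the allow_save / force semantics."""

    def __init__(self, allow_save=True):
        self.allow_save = allow_save
        self.saved_to = []

    def save(self, default_file, alt_file=None, force=False):
        # Saving is skipped when it has been turned off, unless force=True
        if not (self.allow_save or force):
            return None
        # An 'alternative' file overrides the default for the directory
        target = alt_file if alt_file is not None else default_file
        self.saved_to.append(target)  # stand-in for writing the file
        return target
```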
- set_log_dir(path)
(Re)set the path for the log directory
If the supplied path is relative then make a subdirectory in the existing log directory.
- Parameters:
path (str) – path for the log directory
- Returns:
Full path for the new log directory.
- Return type:
String
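A sketch of the path resolution described above, assuming absolute paths are used unchanged. `resolve_log_dir` is hypothetical and only computes the path; it does not create the directory.

```python
import os

def resolve_log_dir(current_log_dir, path):
    """Return the full path for a new log directory.

    Relative paths become subdirectories of the existing log directory;
    absolute paths are assumed to be used as-is.
    """
    if not os.path.isabs(path):
        path = os.path.join(current_log_dir, path)
    # abspath also normalises the result
    return os.path.abspath(path)
```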
- set_metadata(key, value)
Set an analysis directory metadata item
- Parameters:
key (str) – metadata item name
value (object) – value to assign to the metadata item
- set_param(key, value)
Set an analysis directory parameter
- Parameters:
key (str) – parameter name
value (object) – value to assign to the parameter
- update_metadata()
Updates and synchronises metadata in the analysis dir
The updates include migrating relevant values across from other files (for older runs) and setting the run name.
- update_project_metadata_file(unaligned_dir=None, project_metadata_file='projects.info')
Update project metadata file from bcl2fastq outputs
Updates the contents of the project metadata file (default: “projects.info”) from a bcl-to-fastq output directory, by adding new entries for projects in the bcl-to-fastq outputs which don’t currently appear.
- Parameters:
unaligned_dir (str) – path to the bcl-to-fastq output directory relative to the analysis dir. Defaults to the unaligned dir stored in the analysis directory parameter file.
project_metadata_file (str) – optional, path to the project metadata file to update (default: ‘projects.info’)
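The update only adds projects that are missing, so it can be sketched as an order-preserving merge. Both inputs are hypothetical plain lists of project names (real entries also carry sample, organism and library information), and `add_missing_projects` is not part of the library.

```python
def add_missing_projects(listed, found):
    """Return 'listed' plus any names in 'found' not already present.

    'listed' stands in for the project names in projects.info and 'found'
    for the projects in the bcl-to-fastq output directory. Existing entries
    are kept unchanged and in their original order.
    """
    updated = list(listed)
    for name in found:
        if name not in updated:
            updated.append(name)
    return updated
```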