auto_process_ngs.qc.modules.picard_insert_size_metrics
Implements the ‘picard_insert_size_metrics’ QC module:
PicardInsertSizeMetrics: core QCModule class
RunPicardCollectInsertSizeMetrics: pipeline task to run Picard ‘CollectInsertSizeMetrics’
CollateInsertSizes: pipeline task to collate insert sizes
- class auto_process_ngs.qc.modules.picard_insert_size_metrics.CollateInsertSizes(_name, *args, **kws)
Collate insert size metrics data from multiple BAMs
Gathers the Picard insert size data for a set of BAM files into a single TSV file.
- init(bam_files, picard_out_dir, out_file, delimiter='\t')
Initialise the CollateInsertSizes task
- Parameters:
bam_files (list) – list of paths to the BAM files whose insert size data should be collated
picard_out_dir (str) – path to the directory containing the Picard CollectInsertSizeMetrics output files
out_file (str) – path to the output TSV file
delimiter (str) – delimiter to use in the output file (default: tab)
- setup()
Set up commands to be performed by the task
Must be implemented by the subclass
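The collation step described for CollateInsertSizes can be sketched as follows. This is a minimal illustration and not the task's actual implementation: the function names `read_insert_size_metrics` and `collate_insert_sizes`, the `bam_file` column header and the choice of output columns are all assumptions. The parsing relies on the usual layout of a Picard metrics file, where a `## METRICS CLASS` line is followed by a tab-separated header row and a row of values.

```python
import csv

# Hypothetical default columns; the real task may write a different set.
DEFAULT_FIELDS = (
    "MEAN_INSERT_SIZE",
    "STANDARD_DEVIATION",
    "MEDIAN_INSERT_SIZE",
    "MEDIAN_ABSOLUTE_DEVIATION",
)


def read_insert_size_metrics(fp):
    """Return the first metrics row of a Picard metrics file as a dict.

    'fp' is any iterable of text lines (e.g. an open file).
    """
    lines = iter(fp)
    # Skip everything up to the start of the metrics section
    for line in lines:
        if line.startswith("## METRICS CLASS"):
            break
    else:
        raise ValueError("no '## METRICS CLASS' section found")
    # The next two lines are the column names and the first row of values
    header = next(lines).rstrip("\n").split("\t")
    values = next(lines).rstrip("\n").split("\t")
    return dict(zip(header, values))


def collate_insert_sizes(metrics_by_bam, out_fp, fields=DEFAULT_FIELDS,
                         delimiter="\t"):
    """Write one row per BAM file containing the selected metrics."""
    writer = csv.writer(out_fp, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["bam_file", *fields])
    for bam_file, metrics in metrics_by_bam.items():
        writer.writerow([bam_file] + [metrics[f] for f in fields])
```

In use, each BAM file's metrics file would be opened and passed to `read_insert_size_metrics`, and the resulting dictionaries handed to `collate_insert_sizes` keyed by BAM file name.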
- class auto_process_ngs.qc.modules.picard_insert_size_metrics.PicardInsertSizeMetrics
Class for handling the ‘picard_insert_size_metrics’ QC module
- classmethod add_to_pipeline(p, project_name, project, qc_dir, bam_files, organism_name, required_tasks=[], compute_runner=None)
Adds tasks for ‘picard_insert_size_metrics’ module to pipeline
- Parameters:
p (Pipeline) – pipeline to extend
project_name (str) – name of project
project (AnalysisProject) – project to run module on
qc_dir (str) – path to QC directory
bam_files (list) – BAM files to run the module on
organism_name (str) – normalised name for organism that BAMs are aligned to
required_tasks (list) – list of tasks that the module needs to wait for
compute_runner (JobRunner) – runner to use for computation
- classmethod collect_qc_outputs(qc_dir)
Collect information on picard_insert_size_metrics outputs
Returns an AttributeDictionary with the following attributes:
name: set to ‘picard_collect_insert_size_metrics’
software: dictionary of software and versions
organisms: list of organisms with associated outputs
bam_files: list of associated BAM file names
output_files: list of associated output files
tags: list of associated output classes
- Parameters:
qc_dir (QCDir) – QC directory to examine
- classmethod verify(params, qc_outputs)
Verify ‘picard_insert_size_metrics’ QC module against outputs
Returns one of three values:
True: outputs verified ok
False: outputs failed to verify
None: verification not possible
- Parameters:
params (AttributeDictionary) – values of parameters used as inputs
qc_outputs (AttributeDictionary) – QC outputs returned from the ‘collect_qc_outputs’ method
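Because both False and None are falsy in Python, code that consumes the result of verify needs to check for None explicitly before testing truthiness. The helper below is a hypothetical sketch of how a caller might do this; `verification_status` and its status strings are not part of the module.

```python
def verification_status(result):
    """Map a three-valued verification result onto a status string.

    'result' is expected to be True, False or None, as returned by
    the 'verify' class method.
    """
    # Check for None first: 'if not result' would conflate it with False
    if result is None:
        return "unverifiable"
    return "ok" if result else "failed"
```

Testing with `is None` rather than `not result` keeps "verification not possible" distinct from "outputs failed to verify".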
- class auto_process_ngs.qc.modules.picard_insert_size_metrics.RunPicardCollectInsertSizeMetrics(_name, *args, **kws)
Run Picard ‘CollectInsertSizeMetrics’ on BAM files
For each BAM file in the supplied list, first runs the Picard ‘CleanSam’ utility (to remove alignments that would otherwise cause problems for the insert size calculations) and then runs ‘CollectInsertSizeMetrics’ on the cleaned file to generate the insert size metrics.
Note that this task should only be run on BAM files with paired-end data.
- finish()
Perform actions on task completion
Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.
Must be implemented by the subclass
- init(bam_files, out_dir)
Initialise the RunPicardCollectInsertSizeMetrics task
- Parameters:
bam_files (list) – list of paths to BAM files to run CollectInsertSizeMetrics on
out_dir (str) – path to a directory where the output files will be written
- setup()
Set up commands to be performed by the task
Must be implemented by the subclass
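The two-step procedure described for RunPicardCollectInsertSizeMetrics can be sketched as a pair of command lines per BAM file. This is an illustration under stated assumptions, not the task's actual code: the function name `picard_insert_size_commands`, the `picard` wrapper executable and all output file names are hypothetical. The `-I`, `-O` and `-H` options are the short forms of Picard's input, output and histogram file arguments.

```python
import os


def picard_insert_size_commands(bam_file, out_dir, picard_exe="picard"):
    """Build the CleanSam and CollectInsertSizeMetrics commands for a BAM.

    Returns a list of two argument lists, to be run in order.
    """
    base = os.path.splitext(os.path.basename(bam_file))[0]
    # Output names below are illustrative only
    cleaned_bam = os.path.join(out_dir, f"{base}.cleaned.bam")
    metrics_file = os.path.join(out_dir, f"{base}.insert_size_metrics.txt")
    histogram_pdf = os.path.join(out_dir, f"{base}.insert_size_histogram.pdf")
    # Step 1: clean the alignments
    clean_sam = [picard_exe, "CleanSam", "-I", bam_file, "-O", cleaned_bam]
    # Step 2: collect insert size metrics from the cleaned BAM
    collect_metrics = [
        picard_exe, "CollectInsertSizeMetrics",
        "-I", cleaned_bam,
        "-O", metrics_file,
        "-H", histogram_pdf,
    ]
    return [clean_sam, collect_metrics]
```

Each argument list could then be passed to `subprocess.run(cmd, check=True)` or wrapped in whatever command abstraction the pipeline uses.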