auto_process_ngs.qc.modules.picard_insert_size_metrics

Implements the ‘picard_insert_size_metrics’ QC module:

  • PicardInsertSizeMetrics: core QCModule class

  • RunPicardCollectInsertSizeMetrics: pipeline task to run Picard ‘CollectInsertSizeMetrics’

  • CollateInsertSizes: pipeline task to collate insert sizes

class auto_process_ngs.qc.modules.picard_insert_size_metrics.CollateInsertSizes(_name, *args, **kws)

Collate insert size metrics data from multiple BAMs

Gathers the Picard insert size data for a set of BAM files and writes it to a single TSV file.

init(bam_files, picard_out_dir, out_file, delimiter='\t')

Initialise the CollateInsertSizes task

Parameters:
  • bam_files (list) – list of paths to the BAM files whose insert size data should be collated

  • picard_out_dir (str) – path to the directory containing the Picard CollectInsertSizeMetrics output files

  • out_file (str) – path to the output TSV file

  • delimiter (str) – delimiter to use in the output file (defaults to tab)
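The collation step can be pictured with the minimal sketch below. It is not the task's actual implementation: the function names, the choice of reported columns, and the `<bam name>.insert_size_metrics.txt` naming scheme are all assumptions made for illustration. The only detail taken from Picard itself is the layout of the metrics file, where a line starting with `## METRICS CLASS` is followed by a tab-separated header row and a row of values.

```python
import csv
import os


def read_picard_metrics(metrics_file):
    """Return the first row of the Picard metrics table as a dict."""
    with open(metrics_file) as fp:
        lines = [line.rstrip("\n") for line in fp]
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            # Header row and first data row follow the marker line
            header = lines[i + 1].split("\t")
            values = lines[i + 2].split("\t")
            return dict(zip(header, values))
    raise ValueError(f"no metrics table found in {metrics_file}")


def collate_insert_sizes(bam_files, picard_out_dir, out_file, delimiter="\t"):
    """Write one row of insert size metrics per BAM file (illustrative only)."""
    # Hypothetical selection of columns to report
    fields = (
        "MEAN_INSERT_SIZE",
        "STANDARD_DEVIATION",
        "MEDIAN_INSERT_SIZE",
        "MEDIAN_ABSOLUTE_DEVIATION",
    )
    with open(out_file, "w", newline="") as fp:
        writer = csv.writer(fp, delimiter=delimiter)
        writer.writerow(("Bam file",) + fields)
        for bam in bam_files:
            name = os.path.splitext(os.path.basename(bam))[0]
            # Assumed naming scheme for the Picard output files
            metrics_file = os.path.join(
                picard_out_dir, name + ".insert_size_metrics.txt"
            )
            metrics = read_picard_metrics(metrics_file)
            writer.writerow(
                [os.path.basename(bam)] + [metrics.get(f, "") for f in fields]
            )
```

Passing a different `delimiter` (for example `","`) would produce a CSV file instead of a TSV file with the same rows.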

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.qc.modules.picard_insert_size_metrics.PicardInsertSizeMetrics

Class for handling the ‘picard_insert_size_metrics’ QC module

classmethod add_to_pipeline(p, project_name, project, qc_dir, bam_files, organism_name, required_tasks=[], compute_runner=None)

Adds tasks for ‘picard_insert_size_metrics’ module to pipeline

Parameters:
  • p (Pipeline) – pipeline to extend

  • project_name (str) – name of project

  • project (AnalysisProject) – project to run module on

  • qc_dir (str) – path to QC directory

  • bam_files (list) – BAM files to run the module on

  • organism_name (str) – normalised name for organism that BAMs are aligned to

  • required_tasks (list) – list of tasks that the module needs to wait for

  • compute_runner (JobRunner) – runner to use for computation

classmethod collect_qc_outputs(qc_dir)

Collect information on ‘picard_insert_size_metrics’ outputs

Returns an AttributeDictionary with the following attributes:

  • name: set to ‘picard_collect_insert_size_metrics’

  • software: dictionary of software and versions

  • organisms: list of organisms with associated outputs

  • bam_files: list of associated BAM file names

  • output_files: list of associated output files

  • tags: list of associated output classes

Parameters:
  • qc_dir (QCDir) – QC directory to examine

classmethod verify(params, qc_outputs)

Verify ‘picard_insert_size_metrics’ QC module against outputs

Returns one of three values:

  • True: outputs verified ok

  • False: outputs failed to verify

  • None: verification not possible

Parameters:
  • params (AttributeDictionary) – values of parameters used as inputs

  • qc_outputs (AttributeDictionary) – QC outputs returned from the ‘collect_qc_outputs’ method
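Because the result is three-valued, callers need to distinguish False from None explicitly rather than relying on truthiness. The sketch below illustrates the idea under stated assumptions: the check itself (every expected BAM file appears in the collected outputs) and the `bam_files` attribute on `params` are hypothetical, and `SimpleNamespace` stands in for AttributeDictionary. It is not the module's actual verification logic.

```python
from types import SimpleNamespace


def verify_insert_size_outputs(params, qc_outputs):
    """Return True, False or None (hypothetical verification rule)."""
    if not params.bam_files:
        # Nothing to check against: verification not possible
        return None
    found = set(qc_outputs.bam_files)
    # True only if every expected BAM file has associated outputs
    return all(bam in found for bam in params.bam_files)


def describe(result):
    """Map the three-valued result onto a message."""
    # Compare with 'is None' first: 'not result' would conflate
    # a failed verification with one that could not be performed
    if result is None:
        return "verification not possible"
    return "outputs verified ok" if result else "outputs failed to verify"


params = SimpleNamespace(bam_files=["s1.bam", "s2.bam"])
qc_outputs = SimpleNamespace(bam_files=["s1.bam"])
print(describe(verify_insert_size_outputs(params, qc_outputs)))
```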

class auto_process_ngs.qc.modules.picard_insert_size_metrics.RunPicardCollectInsertSizeMetrics(_name, *args, **kws)

Run Picard ‘CollectInsertSizeMetrics’ on BAM files

For each BAM file in the supplied list, first runs the Picard ‘CleanSam’ utility (to remove alignments that would otherwise cause problems for the insert size calculations) and then runs ‘CollectInsertSizeMetrics’ to generate the insert size metrics.

Note that this task should only be run on BAM files with paired-end data.
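The two-step sequence described above can be sketched as a pair of command lines per BAM file. This is an illustration rather than the task's actual code: the helper name, the `picard` wrapper executable, and the output file names are assumptions. The arguments use Picard's legacy `KEY=value` syntax (`I` for input, `O` for output, `H` for the histogram file); newer Picard releases also accept the `-I`, `-O` and `-H` forms.

```python
import os


def picard_insert_size_commands(bam_file, out_dir, picard_exe="picard"):
    """Build the CleanSam and CollectInsertSizeMetrics commands for one BAM.

    Returns a list of two argument lists; nothing is executed here.
    """
    name = os.path.splitext(os.path.basename(bam_file))[0]
    # Hypothetical output file names
    cleaned_bam = os.path.join(out_dir, name + ".cleaned.bam")
    metrics = os.path.join(out_dir, name + ".insert_size_metrics.txt")
    histogram = os.path.join(out_dir, name + ".insert_size_histogram.pdf")
    # Step 1: clean the alignments
    clean_cmd = [picard_exe, "CleanSam", f"I={bam_file}", f"O={cleaned_bam}"]
    # Step 2: collect insert size metrics from the cleaned BAM
    metrics_cmd = [
        picard_exe,
        "CollectInsertSizeMetrics",
        f"I={cleaned_bam}",
        f"O={metrics}",
        f"H={histogram}",
    ]
    return [clean_cmd, metrics_cmd]
```

Each returned list could be handed to `subprocess.run(cmd, check=True)` in order, so that metrics collection only starts once the cleaning step has succeeded.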

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(bam_files, out_dir)

Initialise the RunPicardCollectInsertSizeMetrics task

Parameters:
  • bam_files (list) – list of paths to BAM files to run CollectInsertSizeMetrics on

  • out_dir (str) – path to a directory where the output files will be written

setup()

Set up commands to be performed by the task

Must be implemented by the subclass