auto_process_ngs.qc.modules.rseqc_infer_experiment

Implements the ‘rseqc_infer_experiment’ QC module:

  • RseqcInferExperiment: core QCModule class

  • RunRSeQCInferExperiment: pipeline task to run ‘infer_experiment.py’

class auto_process_ngs.qc.modules.rseqc_infer_experiment.RseqcInferExperiment

Class for handling the ‘rseqc_infer_experiment’ QC module

classmethod add_to_pipeline(p, project_name, qc_dir, bam_files, reference_gene_model, organism_name, required_tasks=[], rseqc_runner=None)

Adds tasks for ‘rseqc_infer_experiment’ module to pipeline

Parameters:
  • p (Pipeline) – pipeline to extend

  • project_name (str) – name of project

  • qc_dir (str) – path to QC directory

  • bam_files (list) – BAM files to run the module on

  • reference_gene_model (str) – path to reference gene model BED file

  • organism_name (str) – normalised name for organism that BAMs are aligned to

  • required_tasks (list) – list of tasks that the module needs to wait for

  • rseqc_runner (JobRunner) – runner to use for RSeQC

classmethod collect_qc_outputs(qc_dir)

Collect information on RSeQC infer_experiment.py outputs

Returns an AttributeDictionary with the following attributes:

  • name: set to ‘rseqc_infer_experiment’

  • software: dictionary of software and versions

  • organisms: list of organisms with associated outputs

  • bam_files: list of associated BAM file names

  • output_files: list of associated output files

  • tags: list of associated output classes

Parameters:
  • qc_dir (QCDir) – QC directory to examine

classmethod verify(params, qc_outputs)

Verify ‘rseqc_infer_experiment’ QC module against outputs

Returns one of 3 values:

  • True: outputs verified ok

  • False: outputs failed to verify

  • None: verification not possible

Parameters:
  • params (AttributeDictionary) – values of parameters used as inputs

  • qc_outputs (AttributeDictionary) – QC outputs returned from the ‘collect_qc_outputs’ method
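
Because ‘verify’ can return None as well as True or False, callers need to test for None explicitly: both None and False are falsy in Python. The following is a minimal sketch of how calling code might interpret the result; ‘describe_verification’ is a hypothetical helper written for illustration and is not part of auto_process_ngs.

```python
def describe_verification(status):
    """Hypothetical helper: turn a tri-state verification result into text.

    'status' is expected to be True, False or None, as described for
    the 'verify' class method above.
    """
    # Check for None first: 'if not status' would also match None and
    # wrongly report "verification not possible" as a failure
    if status is None:
        return "verification not possible"
    return "outputs verified ok" if status else "outputs failed to verify"


# Illustrative calls covering the three documented return values
messages = [describe_verification(s) for s in (True, False, None)]
```
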

class auto_process_ngs.qc.modules.rseqc_infer_experiment.RunRSeQCInferExperiment(_name, *args, **kws)

Run RSeQC’s ‘infer_experiment.py’ on BAM files

Given a list of BAM files, for each file runs the RSeQC ‘infer_experiment.py’ utility (http://rseqc.sourceforge.net/#infer-experiment-py).

The log for each run is written to a file called ‘<BASENAME>.infer_experiment.log’; the data are also extracted and put into an output parameter for direct consumption by downstream tasks.
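
As an illustration of the extraction step, the sketch below parses the text that ‘infer_experiment.py’ typically prints. It is not the task’s actual implementation: ‘parse_infer_experiment_log’ is a hypothetical function, the sample log values are made up, and the mapping of the RSeQC strand patterns onto ‘forward’ and ‘reverse’ (and of “failed to determine” onto ‘unstranded’) is an assumption that should be checked against the module source.

```python
# Made-up example of infer_experiment.py output for paired-end data
SAMPLE_LOG = '''This is PairEnd Data
Fraction of reads failed to determine: 0.0172
Fraction of reads explained by "1++,1--,2+-,2-+": 0.4903
Fraction of reads explained by "1+-,1-+,2++,2--": 0.4925
'''

# Assumed mapping of RSeQC strand patterns (paired-end and single-end
# forms) onto the keys used in the task's output dictionary
FORWARD_PATTERNS = ("1++,1--,2+-,2-+", "++,--")
REVERSE_PATTERNS = ("1+-,1-+,2++,2--", "+-,-+")


def parse_infer_experiment_log(text):
    """Hypothetical parser for the text output of infer_experiment.py."""
    result = {"paired_end": None, "forward": None,
              "reverse": None, "unstranded": None}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("This is "):
            # "This is PairEnd Data" or "This is SingleEnd Data"
            result["paired_end"] = "PairEnd" in line
        elif line.startswith("Fraction of reads failed to determine:"):
            result["unstranded"] = float(line.split(":")[-1])
        elif line.startswith("Fraction of reads explained by"):
            pattern = line.split('"')[1]
            value = float(line.split(":")[-1])
            if pattern in FORWARD_PATTERNS:
                result["forward"] = value
            elif pattern in REVERSE_PATTERNS:
                result["reverse"] = value
    return result


parsed = parse_infer_experiment_log(SAMPLE_LOG)
```

Keys that cannot be found in the text are left as None, so a truncated or empty log is distinguishable from a genuine zero fraction.
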

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(bam_files, reference_gene_model, out_dir)

Initialise the RunRSeQCInferExperiment task

Parameters:
  • bam_files (list) – list of paths to BAM files to run infer_experiment.py on

  • reference_gene_model (str) – path to BED file with the reference gene model data

  • out_dir (str) – path to a directory where the output files will be written

Outputs:
  • experiments: a dictionary with BAM files as keys; each value is another dictionary with keys ‘paired_end’ (True for paired-end data, False for single-end), ‘reverse’, ‘forward’ and ‘unstranded’ (fractions of reads mapped in each configuration).
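
A downstream task might consume the ‘experiments’ output along the following lines. This is a sketch under stated assumptions: the dictionary contents are invented for illustration, ‘classify_strandedness’ is a hypothetical helper, and the 0.8 threshold is an arbitrary choice rather than a value taken from auto_process_ngs or RSeQC.

```python
# Invented example with the documented structure of 'experiments'
experiments = {
    "sample1.bam": {"paired_end": True, "forward": 0.02,
                    "reverse": 0.96, "unstranded": 0.02},
    "sample2.bam": {"paired_end": False, "forward": 0.49,
                    "reverse": 0.49, "unstranded": 0.02},
}


def classify_strandedness(fractions, threshold=0.8):
    """Hypothetical helper: name the dominant strand configuration.

    Returns "forward" or "reverse" when that fraction reaches the
    (arbitrary) threshold, and "unstranded" otherwise.
    """
    if fractions["forward"] >= threshold:
        return "forward"
    if fractions["reverse"] >= threshold:
        return "reverse"
    return "unstranded"


# One strandedness call per BAM file
calls = {bam: classify_strandedness(data)
         for bam, data in experiments.items()}
```

Note that a library with roughly equal ‘forward’ and ‘reverse’ fractions is treated as unstranded here even though its ‘unstranded’ fraction is small; this follows from the assumed threshold rule, not from anything stated in the module documentation.
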

setup()

Set up commands to be performed by the task

Must be implemented by the subclass