auto_process_ngs.applications

Utility classes and functions for generating and executing command lines to run various command line applications.

Static classes provide methods for building command lines for various NGS applications, in the form of ‘Command’ instances.

For example, to create a Command object representing the command line for a simple mirroring ‘rsync’ job:

>>> from auto_process_ngs import applications
>>> rsync = applications.general.rsync('source','target',mirror=True)
>>> rsync
rsync -av --delete-after source target
>>> rsync.command_line
['rsync', '-av', '--delete-after', 'source', 'target']

The resulting command line string or list can be passed to another function or class, or it can be executed directly via the subprocess module using the ‘run_subprocess’ method of the Command object, e.g.:

>>> rsync.run_subprocess()
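
As a rough illustration of what executing such an argument list involves, the sketch below passes a list of the same shape as ‘command_line’ to the standard subprocess module. It does not use the Command class: ‘run_args’ is a hypothetical helper, and the Python interpreter stands in for rsync so that the example runs on any system.

```python
import subprocess
import sys

def run_args(args):
    """Run an argument list and return (exit code, captured stdout)."""
    # Passing a list (rather than a single string) avoids shell quoting issues
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, result.stdout

# Stand-in command: the current Python interpreter printing a message
status, output = run_args([sys.executable, "-c", "print('done')"])
print(status, output.strip())
```

An exit code of zero conventionally indicates success; how ‘run_subprocess’ itself reports status is not covered by this sketch.
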

class auto_process_ngs.applications.bcl2fastq

Bcl to fastq conversion command line applications

Provides static methods to create Command instances for command line applications used in bcl to fastq conversion:

  • configureBclToFastq

  • bcl2fastq2

  • bclconvert

static bcl2fastq2(run_dir, sample_sheet, output_dir='Unaligned', mismatches=None, bases_mask=None, ignore_missing_bcl=False, no_lane_splitting=False, minimum_trimmed_read_length=None, mask_short_adapter_reads=None, create_fastq_for_index_reads=False, find_adapters_with_sliding_window=False, loading_threads=None, demultiplexing_threads=None, processing_threads=None, writing_threads=None, bcl2fastq_exe=None)

Generate Command instance for ‘bcl2fastq’ program (v2.*)

Creates a Command instance to run the Illumina ‘bcl2fastq’ program (for versions 2.*).

Parameters:
  • run_dir – path to the top-level directory for the run

  • sample_sheet – path to the sample sheet file to use

  • output_dir – optional, path to the output directory. Defaults to ‘Unaligned’

  • mismatches – optional, specify maximum number of mismatched bases allowed when matching index sequences during demultiplexing. Recommended values are zero for indexes shorter than 6 base pairs, 1 for indexes of 6 or longer. If not specified and bases_mask is supplied then mismatches will be derived automatically from the bases mask string

  • bases_mask – optional, specify string indicating how to treat each cycle within each read e.g. ‘y101,I6,y101’

  • ignore_missing_bcl – optional, if True then interpret missing bcl files as no call (default is False)

  • no_lane_splitting – optional, if True then don’t split FASTQ files by lane (--no-lane-splitting) (default is False)

  • minimum_trimmed_read_length – optional, specify minimum length for reads after adapter trimming (shorter reads will be padded with Ns to make them long enough)

  • mask_short_adapter_reads – optional, specify the minimum length of ACGT bases that must be present in a read after adapter trimming for it not to be masked completely with Ns.

  • create_fastq_for_index_reads – optional, if True then also create Fastq files for index reads (--create-fastq-for-index-reads) (default is False)

  • find_adapters_with_sliding_window – optional, if True then use the sliding window algorithm rather than string matching when identifying adapter sequences for trimming (--find-adapters-with-sliding-window) (default is False)

  • loading_threads – optional, specify number of threads to use for loading bcl data (--loading-threads)

  • demultiplexing_threads – optional, specify number of threads to use for demultiplexing (--demultiplexing-threads)

  • processing_threads – optional, specify number of threads to use for processing (--processing-threads)

  • writing_threads – optional, specify number of threads to use for writing FASTQ data (--writing-threads)

  • bcl2fastq_exe – optional, if set then specifies the name/path of the bcl2fastq executable to use

Returns:

Command object.
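
The rule quoted for ‘mismatches’ can be sketched as a small function. This is a minimal illustration only, not the library's own routine: ‘derive_mismatches’ is a hypothetical name, and the sketch assumes that the rule is applied to the combined length of all index reads in the bases mask.

```python
import re

def derive_mismatches(bases_mask):
    """Suggest a mismatch count from a bases mask such as 'y101,I6,y101'."""
    index_length = 0
    for read in bases_mask.split(","):
        # Each read is a series of <letter><optional count> groups
        for letter, count in re.findall(r"([A-Za-z])(\d*)", read):
            if letter.upper() == "I":
                # A letter with no count stands for a single cycle
                index_length += int(count) if count else 1
    # Rule from the parameter description: zero mismatches for indexes
    # shorter than 6 bases, one mismatch for indexes of 6 or longer
    # (assumption: judged on the combined index length)
    return 1 if index_length >= 6 else 0

print(derive_mismatches("y101,I6,y101"))  # 1
print(derive_mismatches("y50,I4nn"))      # 0
```
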

static bclconvert(run_dir, output_dir, sample_sheet=None, lane=None, no_lane_splitting=False, sampleproject_subdirectories=False, num_parallel_tiles=None, num_conversion_threads=None, num_compression_threads=None, num_decompression_threads=None, bclconvert_exe=None)

Generate Command instance for ‘bcl-convert’ program (v3.*)

Creates a Command instance to run the Illumina ‘bcl-convert’ program (for versions 3.*).

Parameters:
  • run_dir – path to the top-level directory for the run

  • output_dir – path to the output directory

  • sample_sheet – optional, path to the sample sheet file to use (must be present in top-level of input directory if not specified here)

  • lane (integer) – optional, restrict processing to a single lane (sample sheet must only contain this lane) (--bcl-only-lane)

  • no_lane_splitting – optional, if True then don’t split FASTQ files by lane (--no-lane-splitting) (default is False)

  • sampleproject_subdirectories – optional, if True then create subdirectories with project names in output (--bcl-sampleproject-subdirectories) (default is False)

  • num_parallel_tiles – optional, specify the number of tiles being converted to Fastqs in parallel (--bcl-num-parallel-tiles)

  • num_conversion_threads – optional, specify the number of threads to use for conversion per tile (--bcl-num-conversion-threads)

  • num_compression_threads – optional, specify the number of threads for compressing output Fastq files (--bcl-num-compression-threads)

  • num_decompression_threads – optional, specify the number of threads for decompressing input bcl files (--bcl-num-decompression-threads)

  • bclconvert_exe – optional, if set then specifies the name/path of the bcl-convert executable to use

Returns:

Command object.
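
To show how the keyword arguments above might map onto command line options, here is a minimal sketch that assembles a plain argument list. It is not the library's implementation: ‘bclconvert_args’ is a hypothetical function, only a subset of the parameters is handled, and the ‘--bcl-input-directory’, ‘--output-directory’ and ‘--sample-sheet’ option names (as well as the explicit ‘true’ value for the boolean option) are assumptions about the bcl-convert command line that are not stated in this documentation.

```python
def bclconvert_args(run_dir, output_dir, sample_sheet=None, lane=None,
                    no_lane_splitting=False, num_parallel_tiles=None,
                    bclconvert_exe=None):
    """Build an illustrative argument list for bcl-convert."""
    # Fall back to the default executable name if none is given
    args = [bclconvert_exe or "bcl-convert",
            "--bcl-input-directory", run_dir,   # assumed option name
            "--output-directory", output_dir]   # assumed option name
    if sample_sheet is not None:
        args += ["--sample-sheet", sample_sheet]  # assumed option name
    if lane is not None:
        args += ["--bcl-only-lane", str(lane)]
    if no_lane_splitting:
        # Assumption: boolean options take an explicit 'true' value
        args += ["--no-lane-splitting", "true"]
    if num_parallel_tiles is not None:
        args += ["--bcl-num-parallel-tiles", str(num_parallel_tiles)]
    return args

print(" ".join(bclconvert_args("run_dir", "bclconvert_out", lane=1)))
```

Optional arguments are only appended when they are set, so the program's own defaults apply otherwise.
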

static configureBclToFastq(basecalls_dir, sample_sheet, output_dir='Unaligned', mismatches=None, bases_mask=None, force=False, ignore_missing_bcl=False, ignore_missing_stats=False, ignore_missing_control=False, configureBclToFastq_exe=None)

Generate Command instance for ‘configureBclToFastq.pl’ script

Creates a Command instance to run the CASAVA ‘configureBclToFastq.pl’ script (which generates a Makefile to perform the bcl to fastq conversion).

Parameters:
  • basecalls_dir – path to the top-level directory holding the bcl files (typically ‘Data/Intensities/Basecalls/’ subdirectory)

  • sample_sheet – path to the sample sheet file to use

  • output_dir – optional, path to the output directory. Defaults to ‘Unaligned’. If this directory already exists then the conversion will fail unless the force option is set to True

  • mismatches – optional, specify maximum number of mismatched bases allowed when matching index sequences during demultiplexing. Recommended values are zero for indexes shorter than 6 base pairs, 1 for indexes of 6 or longer. If not specified and bases_mask is supplied then mismatches will be derived automatically from the bases mask string

  • bases_mask – optional, specify string indicating how to treat each cycle within each read e.g. ‘y101,I6,y101’

  • force – optional, if True then force overwrite of an existing output directory (default is False)

  • ignore_missing_bcl – optional, if True then interpret missing bcl files as no call (default is False)

  • ignore_missing_stats – optional, if True then fill in with zeroes when *.stats files are missing (default is False)

  • ignore_missing_control – optional, if True then interpret missing control files as not-set control bits (default is False)

  • configureBclToFastq_exe – optional, if set then will be taken as the name/path for the ‘configureBclToFastq.pl’ script

Returns:

Command object.

class auto_process_ngs.applications.general

General command line applications (e.g. rsync, make)

Provides static methods to create Command instances for a class of ‘general’ command line applications:

  • rsync

  • make

  • scp

  • ssh_command

static make(makefile=None, working_dir=None, nprocessors=None)

Generate Command instance for ‘make’ command

Creates a Command instance to run ‘make’.

Parameters:
  • makefile – optional, name of input Makefile (-f)

  • working_dir – optional, specify the working directory to change to (-C)

  • nprocessors – optional, specify number of processors to use (-j)

Returns:

Command object.
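
The mapping from the three parameters to the ‘-f’, ‘-C’ and ‘-j’ options can be sketched as follows. ‘make_args’ is a hypothetical stand-in that returns a plain list rather than a Command instance, and the order in which the options are emitted is an arbitrary choice made for this example.

```python
def make_args(makefile=None, working_dir=None, nprocessors=None):
    """Build an illustrative argument list for 'make'."""
    args = ["make"]
    if working_dir is not None:
        # -C: change to this directory before doing anything else
        args += ["-C", working_dir]
    if nprocessors is not None:
        # -j: number of jobs to run in parallel
        args += ["-j", str(nprocessors)]
    if makefile is not None:
        # -f: name of the Makefile to read
        args += ["-f", makefile]
    return args

print(" ".join(make_args(working_dir="Unaligned", nprocessors=4)))
```
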

static rsync(source, target, dry_run=False, mirror=False, chmod=None, prune_empty_dirs=False, extra_options=None)

Generate Command instance for ‘rsync’ command

Create a Command instance to run the ‘rsync’ command line, to recursively copy/sync one directory (the ‘source’) into another (the ‘target’).

The target can be a local directory or on a remote system (in which case it should be qualified with a user and hostname i.e. ‘user@hostname:target’).

Parameters:
  • source – the directory being copied/sync’ed

  • target – the directory the source will be copied into

  • dry_run – optional, if True then run rsync using the --dry-run option i.e. no files will be copied/sync’ed, just reported (default is False)

  • mirror – optional, if True then run rsync in ‘mirror’ mode i.e. with the --delete-after option (to remove files from the target that have also been removed from the source) (default is False)

  • chmod – optional, mode specification to be applied to the copied files e.g. chmod='u+rwX,g+rwX,o-w'

  • prune_empty_dirs – optional, if True then don’t include empty target directories, i.e. use the -m option (default is False)

  • extra_options – optional, a list of additional rsync options to be added to the command (e.g. --include and --exclude filter patterns)

Returns:

Command object.
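
The way these options combine can be sketched with a plain function that returns an argument list. This is an approximation under stated assumptions rather than the library's code: ‘rsync_args’ is a hypothetical name, the base ‘-av’ options are taken from the mirroring example at the top of this page, and the ordering of the remaining options is a guess.

```python
def rsync_args(source, target, dry_run=False, mirror=False, chmod=None,
               prune_empty_dirs=False, extra_options=None):
    """Build an illustrative argument list for 'rsync'."""
    args = ["rsync", "-av"]
    if dry_run:
        args.append("--dry-run")        # report only, copy nothing
    if mirror:
        args.append("--delete-after")   # remove files absent from the source
    if chmod is not None:
        args.append("--chmod=%s" % chmod)
    if prune_empty_dirs:
        args.append("-m")
    if extra_options:
        args.extend(extra_options)      # e.g. --include/--exclude filters
    # Source and target always come last
    args += [source, target]
    return args

print(rsync_args("source", "target", mirror=True))
```

With ‘mirror=True’ the sketch reproduces the list shown for ‘rsync.command_line’ in the introductory example.
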

static scp(user, server, source, target, recursive=False)

Generate Command instance for ‘scp’

Creates a Command instance to run ‘scp’ to copy to another system.

Parameters:
  • user – name of the remote user

  • server – name of the server

  • source – source file on local system

  • target – target destination on remote system

  • recursive – optional, if True then copy source recursively (i.e. specify the ‘-r’ option)

Returns:

Command object.

static ssh_command(user, server, cmd)

Generate Command instance for ‘ssh’ to execute a remote command

Creates a Command instance to run ‘ssh … COMMAND’.

Parameters:
  • user – name of the remote user

  • server – name of the server

  • cmd – command to execute on the server via ssh

Returns:

Command object.
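
For the two remote helpers (‘scp’ and ‘ssh_command’), the sketch below shows one plausible way the ‘user’ and ‘server’ arguments could be combined into the ‘user@server’ form. ‘scp_args’ and ‘ssh_args’ are hypothetical functions returning plain lists, and treating ‘cmd’ as a sequence of arguments (rather than a single string) is an assumption.

```python
def scp_args(user, server, source, target, recursive=False):
    """Build an illustrative argument list for 'scp'."""
    args = ["scp"]
    if recursive:
        args.append("-r")
    # The remote destination is qualified with the user and server names
    args += [source, "%s@%s:%s" % (user, server, target)]
    return args

def ssh_args(user, server, cmd):
    """Build an illustrative argument list for 'ssh ... COMMAND'."""
    # Assumption: cmd is a sequence of arguments, e.g. ['ls', '-l']
    return ["ssh", "%s@%s" % (user, server)] + list(cmd)

print(" ".join(scp_args("user", "hostname", "data", "/remote/dir", recursive=True)))
print(" ".join(ssh_args("user", "hostname", ["ls", "-l"])))
```
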