auto_process_ngs.icell8.pipeline

icell8.pipeline.py

Pipeline components for processing the outputs from the ICELL8 platform.

Pipeline classes:

  • ICell8QCFilter

  • ICell8FinalReporting

Pipeline command classes:

  • ICell8Statistics

  • SplitAndFilterFastqPair

  • BatchFastqs

  • ConcatFastqs

  • TrimFastqPair

  • FilterPolyGReads

  • ContaminantFilterFastqPair

Pipeline task classes:

  • SetupDirectories

  • CollectFiles

  • PairFastqs

  • GetICell8Stats

  • GetICell8PolyGStats

  • SplitFastqsIntoBatches

  • FilterICell8Fastqs

  • TrimReads

  • GetReadsWithPolyGRegions

  • FilterContaminatedReads

  • SplitByBarcodes

  • GroupFastqsByBarcode

  • GroupFastqsBySample

  • MergeBarcodeFastqs

  • MergeSampleFastqs

  • CheckICell8Barcodes

  • ConvertStatsToXLSX

  • ReportProcessing

  • UpdateProjectData

  • CleanupDirectory

Functions:

  • tmp_dir

  • convert_to_xlsx

class auto_process_ngs.icell8.pipeline.BatchFastqs(*args, **kws)

Split reads from Fastqs into batches using (z)cat/split

Given a list of Fastq files, combines them and then splits into batches of a specified number of reads by running a combination of ‘(z)cat’ and ‘split’ commands.

Fastqs can be gzipped, but must have the same read number (i.e. R1 or R2).
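
The batching idea can be sketched in pure Python, independently of the ‘(z)cat’ and ‘split’ commands that the class builds. This is only an illustration: the helper name batch_reads is hypothetical, and the sketch assumes the standard four-lines-per-read Fastq layout.

```python
from itertools import islice

def batch_reads(lines, batch_size):
    """Yield lists of Fastq lines, each holding at most batch_size reads."""
    lines = iter(lines)
    while True:
        # One Fastq record is assumed to span exactly four lines
        batch = list(islice(lines, 4 * batch_size))
        if not batch:
            return
        yield batch

# Three made-up reads, pooled as if concatenated from several input files
reads = []
for i in range(3):
    reads.extend(["@read%d" % i, "ACGT", "+", "IIII"])

batches = list(batch_reads(reads, batch_size=2))
```

With three reads and a batch size of two, the last batch is smaller than the others.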

cmd()

Build the command

Must be implemented by the subclass and return a Command instance

init(fastqs, batch_dir, basename, batch_size=5000000)

Create a new BatchFastqs instance

Parameters:
  • fastqs (list) – list of input Fastq files

  • batch_dir (str) – destination directory to write output files to

  • basename (str) – basename for output Fastqs

  • batch_size (int) – number of reads per output FASTQ (in batch mode) (optional)

class auto_process_ngs.icell8.pipeline.CheckICell8Barcodes(_name, *args, **kws)

Check the barcodes are consistent

This is a sanity check: ensure that the inline barcodes for all reads in the R1 Fastq of each barcode Fastq pair match the assigned barcode.

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(fastqs)

Initialise the CheckICell8Barcodes task

Parameters:

fastqs (list) – Fastq files to check

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.icell8.pipeline.CleanupDirectory(_name, *args, **kws)

Remove a directory and all its contents

init(dirn)

Initialise the CleanupDirectory task

Parameters:

dirn (str) – path to the directory to remove

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.icell8.pipeline.CollectFiles(_name, *args, **kws)

Collect list of files matching glob pattern

This is a utility task that can be used to collect a list of files in a directory which matches a ‘glob’-style pattern.

It is intended to offer an alternative to the FileCollector class, when it is desirable to farm out the file collection to an external process (e.g. when there are very large numbers of files to examine).

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(dirn, pattern)

Initialise the CollectFiles task

Parameters:
  • dirn (str) – path to the directory holding the files to be collected

  • pattern (str) – glob-style pattern to match to file names

Outputs:

files: list of collected files
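
As a rough illustration of glob-style collection (not the task's own code), the standard library can do the same matching in-process. The function name collect_files and the file names below are made up for the example.

```python
import glob
import os
import tempfile

def collect_files(dirn, pattern):
    """Return a sorted list of paths in dirn matching a glob-style pattern."""
    return sorted(glob.glob(os.path.join(dirn, pattern)))

with tempfile.TemporaryDirectory() as d:
    # Create a few empty placeholder files to match against
    for name in ("x.r1.fastq", "x.r2.fastq", "stats.tsv"):
        open(os.path.join(d, name), "w").close()
    collected = [os.path.basename(f) for f in collect_files(d, "*.fastq")]
```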

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.icell8.pipeline.ConcatFastqs(*args, **kws)

Concatenate reads from multiple Fastqs into a single file

Given a list of Fastq files, combines them into a single Fastq using the ‘cat’ utility.

If the output FASTQ name ends with .gz then the file will be automatically compressed with gzip after concatenation.

Input FASTQs cannot be gzipped, and must all have the same read number (i.e. R1 or R2).

cmd()

Build the command

Must be implemented by the subclass and return a Command instance

init(fastqs, concat_dir, fastq_out)

Create a new ConcatFastqs instance

Parameters:
  • fastqs (list) – list of input FASTQ files

  • concat_dir (str) – destination directory to write output file to

  • fastq_out (str) – name of output Fastq file

class auto_process_ngs.icell8.pipeline.ContaminantFilterFastqPair(*args, **kws)

Build command to run ‘icell8_contamination_filter.py’ utility

cmd()

Build the command

Must be implemented by the subclass and return a Command instance

init(fastq_pair, filter_dir, mammalian_conf, contaminants_conf, aligner=None, threads=None)

Create a new ContaminantFilterFastqPair instance

Parameters:
  • fastq_pair (list) – R1/R2 FASTQ file pair

  • filter_dir (str) – destination directory to write output files to

  • mammalian_conf (str) – path to FastqScreen .conf file with mammalian genome indexes

  • contaminants_conf (str) – path to FastqScreen .conf file with contaminant genome indexes

  • aligner (str) – explicitly specify name of aligner to use with FastqScreen (e.g. ‘bowtie2’) (optional)

  • threads (int) – explicitly specify number of threads to run FastqScreen using (optional)

class auto_process_ngs.icell8.pipeline.ConvertStatsToXLSX(_name, *args, **kws)

Convert the stats file to XLSX format

init(stats_file, xlsx_file)

Initialise the ConvertStatsToXLSX task

Parameters:
  • stats_file (str) – path to input stats file

  • xlsx_file (str) – path to output XLSX file

Outputs:

xlsx_file: path to the output XLSX file

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.icell8.pipeline.FilterContaminatedReads(_name, *args, **kws)

Filter ‘contaminated’ reads from Fastq files

Given a set of Fastqs, arrange into R1/R2 file pairs and run ‘fastq_screen’ on the R2 reads against panels of ‘mammalian’ and ‘contaminant’ organisms.

Read pairs where there is an exclusive match to the contaminants (i.e. without any match to the mammalian genomes) are excluded.
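
The exclusion rule can be written as a small predicate. This is a sketch of the rule as stated above, not of the filtering utility itself; the function name keep_read_pair and its boolean inputs are hypothetical.

```python
def keep_read_pair(matches_mammalian, matches_contaminant):
    """Return False only when the read matches contaminants exclusively."""
    exclusive_contaminant = matches_contaminant and not matches_mammalian
    return not exclusive_contaminant

# Enumerate the four possible screening outcomes
decisions = {
    (mammalian, contaminant): keep_read_pair(mammalian, contaminant)
    for mammalian in (True, False)
    for contaminant in (True, False)
}
```

Under this reading, read pairs that match neither panel are kept, since only an exclusive contaminant match triggers exclusion.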

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(fastq_pairs, filter_dir, mammalian_conf, contaminants_conf, aligner=None, threads=None)

Initialise the FilterContaminatedReads task

Parameters:
  • fastq_pairs (list) – input Fastq R1/R2 pairs

  • filter_dir (str) – destination directory to write output files to

  • mammalian_conf (str) – path to FastqScreen .conf file with mammalian genome indexes

  • contaminants_conf (str) – path to FastqScreen .conf file with contaminant genome indexes

  • aligner (str) – explicitly specify name of aligner to use with FastqScreen (e.g. ‘bowtie2’) (optional)

  • threads (int) – explicitly specify number of threads to run FastqScreen using (default: taken from job runner)

Outputs:
  • pattern (str): glob-style pattern matching output Fastq file names

  • fastqs (FileCollector): output Fastq files

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.icell8.pipeline.FilterICell8Fastqs(_name, *args, **kws)

Perform read assignment and optional quality filtering

For each input R1/R2 Fastq file pair:

  • if a well list is supplied then check that the ICell8 barcode matches one in the well list

  • if filtering is turned on then remove reads where the ICell8 barcode and/or UMI fail to meet the minimum quality standard across all bases
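
A quality check "across all bases" might look like the sketch below. It assumes Phred+33 encoded quality strings and uses an arbitrary threshold of 10; neither the encoding nor the threshold is stated in this documentation, and the name passes_quality is hypothetical.

```python
def passes_quality(quality_string, min_quality, offset=33):
    """Return True if every base meets the minimum Phred quality.

    Assumes Phred+offset encoding (offset=33 is an assumption here).
    """
    return all(ord(c) - offset >= min_quality for c in quality_string)

# 'I' encodes Q40 and '#' encodes Q2 under Phred+33
good_barcode = passes_quality("IIIIIIIIIII", min_quality=10)
bad_umi = passes_quality("IIII#IIIII", min_quality=10)
```

A single low-quality base is enough to fail the whole barcode or UMI in this sketch.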

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(fastq_pairs, filter_dir, well_list=None, mode='none', discard_unknown_barcodes=False, quality_filter=False)

Initialise the FilterICell8Fastqs task

Parameters:
  • fastq_pairs (list) – input FASTQ R1/R2 file pairs

  • filter_dir (str) – destination directory to write output files to

  • well_list (str) – ‘well list’ file to use (optional)

  • mode (str) – mode to run the utility in

  • discard_unknown_barcodes (bool) – if True then discard read pairs where the barcode doesn’t match one of those in the well list file (nb well list file must also be supplied in this case) (all reads are kept by default)

  • quality_filter (bool) – if True then also do filtering based on barcode- and UMI-quality (no filtering is performed by default)

Outputs:

The returned object has the following properties:

  • fastqs: object with properties which point to iterators listing output Fastqs (see below)

  • patterns: object with properties which are glob-style patterns matching output Fastqs (see below)

The output Fastqs are:

  • assigned: Fastqs with reads assigned to known barcodes

  • unassigned: Fastqs with reads not assigned to known barcodes

  • failed_barcodes: Fastqs with reads which failed the barcode quality check

  • failed_umis: Fastqs with reads which failed the UMI quality check

For example:

  • output.patterns.assigned = glob pattern to match Fastqs with assigned reads

  • output.fastqs.unassigned = iterator listing Fastqs with unassigned reads

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.icell8.pipeline.FilterPolyGReads(*args, **kws)

Run ‘cutadapt’ to fetch reads with poly-G regions

cmd()

Build the command

Must be implemented by the subclass and return a Command instance

init(fastq_pair, out_dir)

Create a new FilterPolyGReads instance

Parameters:
  • fastq_pair (list) – R1/R2 FASTQ file pair

  • out_dir (str) – destination directory to write output files to

class auto_process_ngs.icell8.pipeline.GetICell8PolyGStats(_name, *args, **kws)

Generate statistics for ICell8 poly-G detection

Subclass of GetICell8Stats task; generates and appends additional column expressing poly-G read counts as a percentage of total filtered read counts for each barcode.
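
The additional column amounts to a simple ratio per barcode. A minimal sketch follows, with a hypothetical function name; returning 0.0 for barcodes with no filtered reads is an assumption made for the example, not documented behaviour.

```python
def poly_g_percentage(poly_g_reads, filtered_reads):
    """Express poly-G read count as a percentage of filtered read count."""
    if filtered_reads == 0:
        # Assumption: report 0.0 rather than failing on empty barcodes
        return 0.0
    return 100.0 * poly_g_reads / filtered_reads

pct = poly_g_percentage(5, 200)
```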

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

class auto_process_ngs.icell8.pipeline.GetICell8Stats(_name, *args, **kws)

Generate statistics for ICell8 processing stage

Counts the reads and distinct UMIs per barcode for reads pooled from the set of supplied Fastqs and writes these to columns in a tab-delimited output file.

If the output file doesn’t exist then it will be created. If ‘append’ isn’t specified then an existing file will be deleted and its contents lost.

The barcodes are either taken from the supplied well list file, or from the first column of the output file (if it exists).

If ‘unassigned’ is specified then stats will also be collected on reads which don’t match any barcode.

By default the counts are written to columns called Nreads and Distinct_UMIs; a suffix can be specified to distinguish the counts from those from different stages.

If the columns already exist in the file when appending then they will be overwritten.
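
The per-barcode counting can be illustrated as follows. The sketch starts from (barcode, UMI) tuples that are assumed to have been extracted from the reads already; how they are extracted is not covered here, and the function name is hypothetical.

```python
from collections import defaultdict

def count_reads_and_umis(barcode_umi_pairs):
    """Return {barcode: (number of reads, number of distinct UMIs)}."""
    nreads = defaultdict(int)
    umis = defaultdict(set)
    for barcode, umi in barcode_umi_pairs:
        nreads[barcode] += 1
        umis[barcode].add(umi)
    return {b: (nreads[b], len(umis[b])) for b in nreads}

# Made-up barcodes and UMIs for illustration
pairs = [("AAC", "T1"), ("AAC", "T1"), ("AAC", "T2"), ("GGT", "T9")]
stats = count_reads_and_umis(pairs)
```

The two values per barcode correspond to what the task writes to the Nreads and Distinct_UMIs columns.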

init(fastqs, stats_file, well_list=None, suffix=None, unassigned=False, append=False, nprocs=None, temp_dir=None)

Initialise the GetICell8Stats task

Parameters:
  • fastqs (list) – list of Fastqs to get stats from

  • stats_file (str) – path to stats file

  • well_list (str) – path to a well list file to take the barcodes from (optional)

  • suffix (str) – suffix to append to the output column names (optional)

  • unassigned (bool) – if True then also collect stats for read pairs that don’t match any of the expected barcodes from the well list or existing stats file (by default unassigned stats are not collected)

  • append (bool) – if True then append columns to existing output file (by default creates new output file)

  • nprocs (int) – number of cores available for stats (default: taken from job runner)

Outputs:

stats_file (str): path to the output stats file

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.icell8.pipeline.GetReadsWithPolyGRegions(_name, *args, **kws)

Run ‘cutadapt’ to identify reads with poly-G regions

Given a set of Fastq R1/R2 pairs, identifies read pairs for which R2 appears to contain poly-G regions (all other read pairs are discarded).

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(fastq_pairs, poly_g_regions_dir)

Initialise the GetReadsWithPolyGRegions task

Parameters:
  • fastq_pairs (list) – input Fastq R1/R2 pairs

  • poly_g_regions_dir (str) – destination directory to write output files to

Outputs:
  • pattern (str): glob-style pattern matching output Fastq file names

  • fastqs (FileCollector): output Fastq files

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.icell8.pipeline.GroupFastqsByBarcode(_name, *args, **kws)

Group a list of Fastqs by associated barcode

Given a set of Fastqs, groups them into lists where each list belongs to the same barcode.
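
One way to picture the grouping is the sketch below. It assumes the file names follow the <BASENAME>.<BARCODE>.r[1|2].fastq convention used for the SplitByBarcodes outputs and takes the barcode from the file name; whether the task itself works this way is not stated here, and the function name is hypothetical.

```python
import os
from collections import defaultdict

def group_by_barcode(fastqs):
    """Group Fastq paths by the barcode embedded in their file names.

    Assumes names of the form <BASENAME>.<BARCODE>.r1.fastq (or .r2.fastq).
    """
    groups = defaultdict(list)
    for fq in fastqs:
        # Third field from the end is the barcode under the assumed naming
        barcode = os.path.basename(fq).split(".")[-3]
        groups[barcode].append(fq)
    return dict(groups)

fastq_groups = group_by_barcode([
    "b1.AACC.r1.fastq",
    "b1.AACC.r2.fastq",
    "b2.AACC.r1.fastq",
    "b1.GGTT.r1.fastq",
])
```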

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(fastqs)

Initialise the GroupFastqsByBarcode task

Parameters:

fastqs (list) – input Fastq files

Outputs:

fastq_groups (dict): dictionary where keys are barcodes and values are lists of associated Fastqs

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.icell8.pipeline.GroupFastqsBySample(_name, *args, **kws)

Group a list of Fastqs by associated sample

Given a set of Fastqs, groups them into lists where each list belongs to the same sample.

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(fastqs, well_list_file)

Initialise the GroupFastqsBySample task

Parameters:
  • fastqs (list) – input Fastq files

  • well_list_file (str) – ‘well list’ file to get sample names and barcodes from

Outputs:

fastq_groups (dict): dictionary where keys are samples and values are lists of associated Fastq pairs as (R1,R2) tuples

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.icell8.pipeline.ICell8FinalReporting(outdir, project=None)

Perform final reporting from ICELL8 pipeline

Pipeline to perform final reporting of the ICELL8 processing:

  • Write the final processing report

  • Set the primary fastq dir to ‘fastqs.samples’ (if in an analysis project directory)

class auto_process_ngs.icell8.pipeline.ICell8QCFilter(outdir, fastqs, well_list_file, mammalian_conf, contaminants_conf, batch_size, stats_dir='stats', barcode_fastqs_dir='fastqs.barcodes', sample_fastqs_dir='fastqs.samples', basename=None, aligner=None, do_contaminant_filter=True, do_quality_filter=False, do_clean_up=True, nprocessors=None)

Run QC filtering on ICELL8 data

Pipeline to perform QC filtering on ICELL8 Fastq data:

  • Split reads into batches

  • Filter out reads which don’t match any of the barcode sequences

  • Optionally: filter out reads which don’t meet the barcode or UMI quality criteria

  • Trim and quality filter reads with cutadapt

  • Estimate numbers of reads with poly-G regions

  • Optionally: perform contaminant filtering

  • Assemble reads into Fastqs by barcode and by sample name

Also generates statistics for numbers of reads and UMIs for each barcode at each stage.

run(*args, **kws)

Run the tasks in the pipeline

Takes the same arguments as the Pipeline base class and performs post-termination clean up of temporary directory.

class auto_process_ngs.icell8.pipeline.ICell8Statistics(*args, **kws)

Build command to run the ‘icell8_stats.py’ utility

cmd()

Build the command

Must be implemented by the subclass and return a Command instance

init(fastqs, stats_file, well_list=None, suffix=None, unassigned=False, append=False, nprocs=1, temp_dir=None)

Create new ICell8Statistics instance

Parameters:
  • fastqs (list) – list of FASTQ file names

  • stats_file (str) – path to output file

  • well_list (str) – path to ‘well list’ file (optional)

  • suffix (str) – suffix to append to columns with read and UMI counts (optional)

  • unassigned (bool) – if True then also collect stats for read pairs that don’t match any of the expected barcodes from the well list or existing stats file (by default unassigned stats are not collected)

  • append (bool) – if True then append columns to existing output file (by default creates new output file)

  • nprocs (int) – number of cores available for stats (default: 1)

class auto_process_ngs.icell8.pipeline.MergeBarcodeFastqs(_name, *args, **kws)

Given a set of Fastq files with filtered reads, arrange into R1/R2 pairs then pool read pairs belonging to the same ICell8 barcode.

Also concatenate R1/R2 Fastq pairs for unassigned read pairs, and read pairs which failed the barcode and UMI quality filters.

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(fastq_groups, unassigned_fastq_pairs, failed_barcode_fastq_pairs, failed_umi_fastq_pairs, merge_dir, basename, batch_size=25)

Initialise the MergeBarcodeFastqs task

Parameters:
  • fastq_groups (dict) – input groups of Fastq files (grouped by barcode)

  • unassigned_fastq_pairs (list) – Fastq R1/R2 pairs with reads not assigned to ICELL8 barcodes

  • failed_barcode_fastq_pairs (list) – Fastq R1/R2 pairs with reads failing barcode quality check

  • failed_umi_fastq_pairs (list) – Fastq R1/R2 pairs with reads failing UMI quality check

  • merge_dir (str) – destination directory to write output files to

  • basename (str) – basename to use for output FASTQ files

  • batch_size (int) – number of barcodes to group together into one command for merging (larger batches = fewer jobs, but each job takes longer) (default=25)
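
The trade-off behind batch_size can be seen by chunking a list of barcodes. This is only a sketch: the function name is hypothetical and sorting the barcodes first is an assumption made to keep the example deterministic.

```python
def batch_barcodes(barcodes, batch_size=25):
    """Split barcodes into consecutive groups of at most batch_size."""
    barcodes = sorted(barcodes)
    return [barcodes[i:i + batch_size]
            for i in range(0, len(barcodes), batch_size)]

# 60 placeholder barcode names
barcodes = ["BC%03d" % i for i in range(60)]
batches = batch_barcodes(barcodes, batch_size=25)
sizes = [len(b) for b in batches]
```

Here 60 barcodes become three merge commands rather than 60, at the cost of each command doing more work.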

Outputs:

The returned object has the following properties:

  • fastqs: object with properties which point to iterators listing output Fastqs (see below)

  • patterns: object with properties which are glob-style patterns matching output Fastqs (see below)

The output Fastqs are:

  • assigned: Fastqs with reads assigned to known barcodes

  • unassigned: Fastqs with reads not assigned to known barcodes

  • failed_barcodes: Fastqs with reads which failed the barcode quality check

  • failed_umis: Fastqs with reads which failed the UMI quality check

For example:

  • output.patterns.assigned = glob pattern to match Fastqs with assigned reads

  • output.fastqs.unassigned = iterator listing Fastqs with unassigned reads

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.icell8.pipeline.MergeSampleFastqs(_name, *args, **kws)

Given a set of Fastq files grouped by ICELL8 sample and arranged into R1/R2 file pairs, pool reads into new Fastq files according to the sample names.

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(fastq_groups, merge_dir)

Initialise the MergeSampleFastqs task

Parameters:
  • fastq_groups (dict) – input groups of Fastq R1/R2 file pairs (grouped by sample)

  • merge_dir (str) – destination directory to write output files to

Outputs:

  • pattern: glob-style pattern matching output Fastq file names

  • fastqs: FileCollector listing output Fastq files

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.icell8.pipeline.PairFastqs(_name, *args, **kws)

Arrange Fastqs into R1/R2 pairs

This is a utility task that can be used to arrange a list of Fastq files into R1/R2 pairs according to their contents. Essentially it is a wrapper for the ‘pair_fastqs’ function.
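
Pairing "according to their contents" could, for instance, compare the header of the first read in each file. The sketch below is not the ‘pair_fastqs’ implementation; it assumes Illumina-style headers in which the read number follows the first space, and the function name and inputs are hypothetical.

```python
def pair_by_header(first_headers):
    """Pair Fastqs whose first read IDs match and read numbers are 1 and 2.

    first_headers maps each Fastq name to the header line of its first read.
    """
    by_id = {}
    for fastq, header in first_headers.items():
        read_id, _, description = header.partition(" ")
        read_number = description.split(":")[0]
        by_id.setdefault(read_id, {})[read_number] = fastq
    pairs, unpaired = [], []
    for reads in by_id.values():
        if "1" in reads and "2" in reads:
            pairs.append((reads["1"], reads["2"]))
        else:
            unpaired.extend(reads.values())
    return sorted(pairs), sorted(unpaired)

# Made-up file names and headers
fastq_pairs, unpaired = pair_by_header({
    "a_R1.fastq": "@M1:1:FC:1:1101:100:200 1:N:0:1",
    "a_R2.fastq": "@M1:1:FC:1:1101:100:200 2:N:0:1",
    "b_R1.fastq": "@M1:1:FC:1:1101:300:400 1:N:0:1",
})
```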

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(fastqs)

Initialise the PairFastqs task

Parameters:

fastqs (list) – list of paths to the Fastq files to be paired

Outputs:
  • fastq_pairs: list of tuples with R1/R2 Fastq pairs

  • unpaired: list of unpaired Fastqs

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.icell8.pipeline.ReportProcessing(_name, *args, **kws)

Generate an HTML report on the processing

Runs the icell8_report.py script to generate the report.

init(dirn, stats_file=None, out_file=None, name=None)

Initialise the ReportProcessing task

Parameters:
  • dirn (str) – directory with the ICell8 pipeline outputs

  • stats_file (str) – name of stats file

  • out_file (str) – name of output report file (default: ‘icell8_processing.html’)

  • name (str) – title of report

Outputs:

report_html: path to the output HTML report

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.icell8.pipeline.SetupDirectories(_name, *args, **kws)

Create directories

Given a list of directories, check for and if necessary create each one

init(dirs)

Initialise the SetupDirectories task

Parameters:

dirs (list) – list of directories to ensure are present

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.icell8.pipeline.SplitAndFilterFastqPair(*args, **kws)

Build command to run the ‘split_icell8_fastqs.py’ utility

cmd()

Build the command

Must be implemented by the subclass and return a Command instance

init(fastq_pair, out_dir, well_list=None, basename=None, mode='none', discard_unknown_barcodes=False, quality_filter=False, compress=False)

Create a new SplitAndFilterFastqPair instance

Parameters:
  • fastq_pair (list) – R1/R2 FASTQ file pair

  • out_dir (str) – destination directory to write output files to

  • well_list (str) – ‘well list’ file to use (optional)

  • basename (str) – basename to use for output FASTQ files (optional)

  • mode (str) – mode to run the utility in

  • discard_unknown_barcodes (bool) – if True then discard read pairs where the barcode doesn’t match one of those in the well list file (nb well list file must also be supplied in this case) (all reads are kept by default)

  • quality_filter (bool) – if True then also do filtering based on barcode- and UMI-quality (no filtering is performed by default)

  • compress (bool) – if True then gzip the output files (FASTQs are uncompressed by default)

class auto_process_ngs.icell8.pipeline.SplitByBarcodes(_name, *args, **kws)

Given a set of Fastq R1/R2 file pairs, group reads into new Fastq file pairs by ICell8 barcode.

Output Fastqs are named: <BASENAME>.<BARCODE>.r[1|2].fastq.

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(fastq_pairs, barcodes_dir)

Initialise the SplitByBarcodes task

Parameters:
  • fastq_pairs (list) – input Fastq R1/R2 pairs

  • barcodes_dir (str) – destination directory to write output files to

Outputs:
  • pattern (str): glob-style pattern matching output Fastq file names

  • fastqs (FileCollector): output Fastq files

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.icell8.pipeline.SplitFastqsIntoBatches(_name, *args, **kws)

Split reads from Fastq pairs into batches

Divides the reads from the supplied Fastq R1/R2 file pairs into new Fastq pairs consisting of “batches”, each with a specified number of read pairs.

The output Fastqs will be named <BASENAME>.B###.r[1|2].fastq (where ### is the batch number)
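
The naming scheme can be mimicked with a format string. The sketch assumes that ### means a zero-padded three-digit batch number; the padding width, the starting index and the function name are assumptions rather than documented facts.

```python
def batch_fastq_name(basename, batch_number, read_number):
    """Build a name of the form <BASENAME>.B###.r[1|2].fastq."""
    # Assumption: the batch number is zero-padded to three digits
    return "%s.B%03d.r%d.fastq" % (basename, batch_number, read_number)

names = [batch_fastq_name("icell8", 7, r) for r in (1, 2)]
```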

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(fastq_pairs, batch_dir, basename, batch_size=5000000)

Initialise the SplitFastqsIntoBatches task

Parameters:
  • fastq_pairs (list) – list of input Fastq R1/R2 file pairs

  • batch_dir (str) – destination directory to write output files to

  • basename (str) – basename for output Fastqs

  • batch_size (int) – number of reads per output FASTQ (in batch mode) (optional)

Outputs:
  • pattern: glob-style pattern matching output Fastq file names

  • fastqs: FileCollector listing output Fastq files

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.icell8.pipeline.TrimFastqPair(*args, **kws)

Build command to run ‘cutadapt’ with ICell8 settings

cmd()

Build the command

Must be implemented by the subclass and return a Command instance

init(fastq_pair, trim_dir)

Create a new TrimFastqPair instance

Parameters:
  • fastq_pair (list) – R1/R2 FASTQ file pair

  • trim_dir (str) – destination directory to write output files to

class auto_process_ngs.icell8.pipeline.TrimReads(_name, *args, **kws)

Run ‘cutadapt’ with ICell8 settings

Given a set of Fastq R1/R2 file pairs, performs the following operations on the R2 reads:

  • Remove sequencing primers

  • Remove poly-A/T and poly-N sequences

  • Apply quality filter of Q <= 25

  • Remove short reads (<= 20 bases) post-trimming

If an R2 read fails any of the filters then the read pair is rejected.

Output Fastqs contain the filtered and trimmed reads only.

finish()

Perform actions on task completion

Performs any actions that are required on completion of the task, such as moving or copying data, and setting the values of any output parameters.

Must be implemented by the subclass

init(fastq_pairs, trim_dir)

Initialise the TrimReads task

Parameters:
  • fastq_pairs (list) – input Fastq R1/R2 pairs

  • trim_dir (str) – destination directory to write output files to

Outputs:
  • pattern (str): glob-style pattern matching output Fastq file names

  • fastqs (FileCollector): output Fastq files

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

class auto_process_ngs.icell8.pipeline.UpdateProjectData(_name, *args, **kws)

Updates data (e.g. primary Fastq set) in a project

init(project_dir, primary_fastq_dir)

Initialise the UpdateProjectData task

Parameters:
  • project_dir (str) – path to the project directory

  • primary_fastq_dir (str) – name of the Fastq subdirectory to make the primary Fastq set

setup()

Set up commands to be performed by the task

Must be implemented by the subclass

auto_process_ngs.icell8.pipeline.convert_to_xlsx(tsv_file, xlsx_file, title=None, freeze_header=False)

Convert a tab-delimited file to an XLSX file

Parameters:
  • tsv_file (str) – path to the input TSV file

  • xlsx_file (str) – path to the output XLSX file

  • title (str) – optional, name to give the worksheet in the output XLSX file (defaults to the input file name)

  • freeze_header (bool) – optional, if True then ‘freezes’ the first line of the XLSX file (default is not to freeze the first line)

auto_process_ngs.icell8.pipeline.tmp_dir(d)

Create a temp dir for directory ‘d’