auto_process_ngs.icell8.atac

Utility functions for handling single-cell ATAC-seq data from the ICELL8 platform.

Functions:

  • report: write a timestamped message

  • reverse_complement: get reverse complement of a sequence

  • update_fastq_read_index: rewrite index sequence in Fastq read header

  • split_fastq: split Fastq into batches

  • assign_reads: assign reads to samples from batched ICELL8 ATAC Fastqs

  • concat_fastqs: concatenate Fastqs for a sample across batches

auto_process_ngs.icell8.atac.assign_reads(args)

Assign reads to samples from batched ICELL8 ATAC Fastqs

Intended to be invoked via ‘map’ or similar function

Arguments are supplied in a single list which should contain the following items:

  • R1 Fastq: path to R1 Fastq file

  • R2 Fastq: path to R2 Fastq file

  • I1 Fastq: path to I1 Fastq file

  • I2 Fastq: path to I2 Fastq file

  • well list: path to the well list file

  • mode: either ‘samples’ or ‘barcodes’

  • swap_i1_and_i2: boolean indicating whether I1 and I2 Fastqs should be swapped for matching

  • reverse_complement: one of None, ‘i1’, ‘i2’ or ‘both’

  • rewrite_fastq_headers: boolean indicating whether to write the matching ICELL8 barcodes into the Fastq read headers on output

  • working_dir: working directory to write batches to

  • unassigned: ‘sample name’ to associate with unassigned reads (used as a basename for output file)

In ‘samples’ mode assignment is done to samples only; in ‘barcodes’ mode assignment is done to samples and barcodes.

Parameters:

args (list) – list containing the arguments supplied to the read assigner

Returns:

tuple consisting of (batch id, barcode_counts, unassigned_barcodes_file).

Return type:

Tuple
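A minimal sketch of packing one argument list per batch for use with ‘map’. It assumes the items are positional and appear in the order listed above. The helper make_assign_reads_args, the file names and the ‘Undetermined’ basename are hypothetical and not part of the module.

```python
def make_assign_reads_args(r1, r2, i1, i2, well_list, working_dir):
    # Hypothetical helper: packs the eleven items in the documented order
    return [
        r1, r2, i1, i2,   # R1, R2, I1 and I2 Fastqs for one batch
        well_list,        # path to the well list file
        "samples",        # mode: 'samples' or 'barcodes'
        False,            # swap_i1_and_i2
        None,             # reverse_complement
        True,             # rewrite_fastq_headers
        working_dir,      # working directory
        "Undetermined",   # unassigned (assumed basename)
    ]

# Illustrative batch file names only; real names depend on how the
# batches were produced
batch_ids = ["B000", "B001"]
args_list = [
    make_assign_reads_args(
        "work/R1.%s.fastq" % b, "work/R2.%s.fastq" % b,
        "work/I1.%s.fastq" % b, "work/I2.%s.fastq" % b,
        "well_list.txt", "work")
    for b in batch_ids
]

# With the package installed, the call would then look like:
# from auto_process_ngs.icell8.atac import assign_reads
# for batch_id, barcode_counts, unassigned_file in map(assign_reads, args_list):
#     ...
```

Packing everything into a single list is what lets the function be passed to ‘map’ (or a process pool's map), which supplies exactly one argument per call.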

auto_process_ngs.icell8.atac.concat_fastqs(args)

Concatenate Fastqs for a sample across batches

Intended to be invoked via ‘map’ or similar function

Arguments are supplied in a single list which should contain the following items:

  • sample: name of sample to concatenate Fastqs for

  • index: integer index to assign to the sample in output file name

  • barcode: (optional) barcode to concatenate Fastqs for (set to None when concatenating across samples)

  • lane: (optional) lane number for output Fastq (set to None to stop lane number appearing)

  • read: read identifier e.g. ‘R1’ or ‘I2’

  • batches: list of batch IDs to concatenate across

  • working_dir: working directory where batches are located

  • final_dir: directory to write concatenated Fastq to

Parameters:

args (list) – list containing the arguments supplied to the concatenator

Returns:

path of concatenated Fastq.

Return type:

String
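The core operation can be sketched as appending each batch file to a single output file. This is a simplified stand-in, not the module's implementation: concat_fastqs_sketch takes explicit paths rather than the packed argument list, and all file names below are made up.

```python
import os
import shutil
import tempfile

def concat_fastqs_sketch(batch_fastqs, final_fastq):
    # Hypothetical stand-in: append each batch Fastq, in order, to the output
    with open(final_fastq, "wb") as out:
        for fastq in batch_fastqs:
            with open(fastq, "rb") as fp:
                shutil.copyfileobj(fp, out)
    return final_fastq

with tempfile.TemporaryDirectory() as tmp:
    # Two single-read batches with made-up names
    batch_fastqs = []
    for n in range(2):
        path = os.path.join(tmp, "SampleA.B%03d.R1.fastq" % n)
        with open(path, "wt") as fp:
            fp.write("@read%d\nACGT\n+\nFFFF\n" % n)
        batch_fastqs.append(path)
    final = concat_fastqs_sketch(
        batch_fastqs, os.path.join(tmp, "SampleA_S1_R1_001.fastq"))
    final_name = os.path.basename(final)
    with open(final, "rt") as fp:
        concatenated = fp.read()
```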

auto_process_ngs.icell8.atac.report(msg, fp=sys.stdout)

Write timestamped message

Parameters:
  • msg (string) – text to be reported

  • fp (file) – stream to report to (defaults to stdout)
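A minimal sketch of a reporter of this kind. The timestamp format and the name report_sketch are assumptions; the module's actual output format is not documented here.

```python
import io
import sys
import time

def report_sketch(msg, fp=sys.stdout):
    # Hypothetical stand-in; the timestamp layout is assumed
    timestamp = time.strftime("%Y-%m-%d %H:%M:%S")
    fp.write("[%s] %s\n" % (timestamp, msg))

# Any writable text stream can be passed as 'fp'
buf = io.StringIO()
report_sketch("Assigning reads", fp=buf)
line = buf.getvalue()
```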

auto_process_ngs.icell8.atac.reverse_complement(s)

Return reverse complement of a sequence

Parameters:

s (str) – sequence to be reverse complemented

Returns:

reverse complement of input sequence

Return type:

String
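For illustration, a reverse complement can be computed by complementing each base and reversing the result. The sketch below is a stand-in rather than the module's code; its handling of ‘N’ and of lower-case bases is an assumption.

```python
def reverse_complement_sketch(s):
    # Hypothetical stand-in: complement each base, then reverse the string
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return s.translate(complement)[::-1]

example = reverse_complement_sketch("AACGT")  # "ACGTT"
```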

auto_process_ngs.icell8.atac.split_fastq(args)

Split Fastq into batches

Intended to be invoked via ‘map’ or similar function

Arguments are supplied in a single list which should contain the following items:

  • Fastq: path to Fastq file to split

  • batch_size: size of each batch

  • working_dir: working directory to write batches to

Parameters:

args (list) – list containing the arguments supplied to the splitter

Returns:

list of batched Fastqs

Return type:

List
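A rough sketch of the batching idea, called through ‘map’ with a packed argument list as described above. It assumes that batch_size counts reads (four lines each) and that the input is uncompressed; the output naming scheme and the name split_fastq_sketch are hypothetical.

```python
import os
import tempfile

def split_fastq_sketch(args):
    # Hypothetical stand-in: unpack the single packed argument list
    fastq, batch_size, working_dir = args
    basename = os.path.basename(fastq)
    if basename.endswith(".fastq"):
        basename = basename[:-len(".fastq")]
    batches = []
    out = None
    with open(fastq, "rt") as fp:
        for i, line in enumerate(fp):
            # Start a new batch file every batch_size reads (4 lines per read)
            if i % (4 * batch_size) == 0:
                if out is not None:
                    out.close()
                path = os.path.join(
                    working_dir, "%s.B%03d.fastq" % (basename, len(batches)))
                out = open(path, "wt")
                batches.append(path)
            out.write(line)
    if out is not None:
        out.close()
    return batches

def count_lines(path):
    with open(path, "rt") as fp:
        return sum(1 for _ in fp)

with tempfile.TemporaryDirectory() as tmp:
    # Five made-up reads split into batches of two
    fastq = os.path.join(tmp, "example_R1.fastq")
    with open(fastq, "wt") as fp:
        for n in range(5):
            fp.write("@read%d\nACGT\n+\nFFFF\n" % n)
    batches = list(map(split_fastq_sketch, [(fastq, 2, tmp)]))[0]
    batch_names = [os.path.basename(b) for b in batches]
    lines_per_batch = [count_lines(b) for b in batches]
```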

auto_process_ngs.icell8.atac.update_fastq_read_index(read, index_sequence)

Update the index sequence (aka barcode) in a Fastq read

Parameters:
  • read (list) – Fastq read to be updated, as a list of lines (with the first element/line being the sequence identifier line)

  • index_sequence (str) – the index sequence to put into the read header

Returns:

the updated Fastq read, as a list of lines.

Return type:

List
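A minimal sketch of the header rewrite, assuming an Illumina-style identifier line in which the index sequence is the text after the final ‘:’. The function name update_index_sketch and the example read are made up, and the module's actual parsing may differ.

```python
def update_index_sketch(read, index_sequence):
    # Hypothetical stand-in: replace whatever follows the last ':' in the header
    header = read[0].rstrip("\n")
    prefix, sep, _ = header.rpartition(":")
    if not sep:
        raise ValueError("no ':' found in header: %r" % header)
    updated = list(read)  # copy so the input read is left unmodified
    updated[0] = "%s:%s" % (prefix, index_sequence)
    return updated

read = [
    "@INSTR:1:FLOWCELL:1:1101:1000:2000 1:N:0:TAAGGCGA+CTCTCTAT",
    "ACGT",
    "+",
    "FFFF",
]
updated = update_index_sketch(read, "AAAAAAAA+CCCCCCCC")
```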