

Alignment Functions

# NextGenSeqUtils.nw_align — Function.

nw_align(s1::String, s2::String; edge_reduction = 0.99)

Returns aligned strings using the Needleman-Wunsch algorithm (quadratic time), with end gaps penalized slightly less. edge_reduction is a multiplier (usually less than one) applied to the penalty for gaps at the ends of the strings.
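To illustrate how an edge_reduction multiplier can enter the dynamic program, here is a minimal sketch. It is not the package's implementation: the name nw_sketch and the match_score, mismatch_score, and gap_penalty values are assumptions chosen for the example. Only the idea of scaling end-gap penalties by edge_reduction comes from the description above.

```julia
# Minimal Needleman-Wunsch sketch with cheaper end gaps.
# Hypothetical helper: the scores are illustrative, not those of NextGenSeqUtils.
function nw_sketch(s1::String, s2::String; edge_reduction = 0.99,
                   match_score = 1.0, mismatch_score = -1.0, gap_penalty = -1.0)
    a, b = collect(s1), collect(s2)
    n, m = length(a), length(b)
    egap = gap_penalty * edge_reduction      # reduced penalty for gaps at string ends
    F = zeros(n + 1, m + 1)
    for i in 1:n
        F[i + 1, 1] = i * egap               # leading gaps in s2
    end
    for j in 1:m
        F[1, j + 1] = j * egap               # leading gaps in s1
    end
    for i in 1:n, j in 1:m
        diag = F[i, j] + (a[i] == b[j] ? match_score : mismatch_score)
        up   = F[i, j + 1] + (j == m ? egap : gap_penalty)   # gap in s2
        left = F[i + 1, j] + (i == n ? egap : gap_penalty)   # gap in s1
        F[i + 1, j + 1] = max(diag, up, left)
    end
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out1, out2 = Char[], Char[]
    i, j = n, m
    while i > 0 || j > 0
        if i > 0 && j > 0 &&
           F[i + 1, j + 1] == F[i, j] + (a[i] == b[j] ? match_score : mismatch_score)
            push!(out1, a[i]); push!(out2, b[j]); i -= 1; j -= 1
        elseif i > 0 && (j == 0 ||
               F[i + 1, j + 1] == F[i, j + 1] + (j == m ? egap : gap_penalty))
            push!(out1, a[i]); push!(out2, '-'); i -= 1
        else
            push!(out1, '-'); push!(out2, b[j]); j -= 1
        end
    end
    return join(reverse(out1)), join(reverse(out2))
end
```

In this sketch a trailing unmatched base is pushed into an end gap, because that gap costs slightly less than an interior one.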

nw_align(s1::String, s2::String, banded::Float64)

Wrapper for nw_align and banded_nw_align. A larger banded value makes alignment slower but more accurate.

# NextGenSeqUtils.banded_nw_align — Function.

banded_nw_align(s1::String, s2::String; edge_reduction = 0.99, band_coeff = 1)

Like nw_align, but subquadratic: it computes values only within a band around the center diagonal. One 'band' of radius 3 = (4,1), (3,1), (2,1), (1,1), (1,2), (1,3), (1,4), i.e. an upside-down L shape. band_coeff = 1 is sufficient to get the same alignments as nw_align for 10% diverged sequences ~97% of the time; increase this value for a more conservative alignment at the cost of longer computation time. Radius of band = bandwidth = band_coeff * sqrt(avg seq length).
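The band radius formula and the L-shaped band can be sketched as follows. The helper names band_radius and band_cells are hypothetical, and rounding the radius up to an integer is an assumption; the source only gives the formula band_coeff * sqrt(avg seq length).

```julia
# Band radius from the formula above; rounding up is an assumption.
band_radius(s1::String, s2::String; band_coeff = 1) =
    ceil(Int, band_coeff * sqrt((length(s1) + length(s2)) / 2))

# Cells of one upside-down-L "band" anchored at diagonal cell (d, d):
# down the column first, then the corner, then along the row.
band_cells(d::Int, radius::Int) = vcat(
    [(d + k, d) for k in radius:-1:1],
    [(d, d)],
    [(d, d + k) for k in 1:radius],
)
```

For example, band_cells(1, 3) reproduces the radius-3 band listed above, and two sequences of length 100 give a radius of 10 with band_coeff = 1.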

# NextGenSeqUtils.triplet_nw_align — Function.

triplet_nw_align(s1::String, s2::String; edge_reduction = 0.99, boundary_mult = 2)

Returns an alignment of two sequences, where s1 is a reference whose reading frame is to be preserved and s2 is a query sequence. boundary_mult adjusts the penalties for gaps that preserve the reading frame of s1. It usually works best in the range 0 to 3; higher values more strongly enforce gaps aligned to the reference frame (divisible-by-3 indices).

# NextGenSeqUtils.local_align — Function.

local_align(ref::String, query::String; mismatch_score = -1, 
            match_score = 1, gap_penalty = -1, 
            rightaligned=true, refend = false)

Aligns a query sequence locally to a reference. If rightaligned is true, the right ends of both sequences are kept in the final alignment; otherwise they are trimmed. refend keeps the beginning (left end) of ref. To keep both ends of both strings, use nw_align. For best alignments, use the default score values.
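For intuition about how the three score parameters interact, here is a minimal local-alignment scoring sketch. It assumes a standard Smith-Waterman recurrence, which the source does not confirm is what local_align uses. The function local_score_sketch is hypothetical and returns only the best local score, not the aligned strings.

```julia
# Best local alignment score under a Smith-Waterman recurrence (an assumption;
# not taken from the package). Defaults mirror the local_align signature.
function local_score_sketch(ref::String, query::String;
                            match_score = 1, mismatch_score = -1, gap_penalty = -1)
    r, q = collect(ref), collect(query)
    H = zeros(Int, length(r) + 1, length(q) + 1)
    best = 0
    for i in 1:length(r), j in 1:length(q)
        s = r[i] == q[j] ? match_score : mismatch_score
        # A cell never drops below 0, so an alignment may start anywhere.
        H[i + 1, j + 1] = max(0,
                              H[i, j] + s,
                              H[i, j + 1] + gap_penalty,
                              H[i + 1, j] + gap_penalty)
        best = max(best, H[i + 1, j + 1])
    end
    return best
end
```

With the defaults, the score equals the number of matched characters minus the number of mismatches and gap positions in the best local region.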

# NextGenSeqUtils.kmer_seeded_align — Function.

kmer_seeded_align(s1::String, s2::String;
                  wordlength = 30,
                  skip = 10,
                  aligncodons = false,
                  banded = 1.0,
                  debug::Bool = false)

Returns aligned strings, where alignment is first done with larger word matches and then (possibly banded) Needleman-Wunsch on intermediate intervals. skip gives a necessary gap between searched-for words in s1. For best results, use the default wordlength and skip values. See nw_align for explanation of banded.
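The seeding step can be pictured with the sketch below. The helper seed_matches is hypothetical, and two things in it are assumptions: that skip is the number of characters left between consecutive words taken from s1, and that only the first exact occurrence of each word in s2 is used. It also assumes ASCII sequences and does not check that the seeds are collinear, which a real aligner would need to do.

```julia
# Find exact word matches ("seeds") between s1 and s2.
# Hypothetical sketch; the interpretation of `skip` is an assumption.
function seed_matches(s1::String, s2::String; wordlength = 30, skip = 10)
    seeds = Tuple{Int,Int}[]               # (start in s1, start in s2)
    i = 1
    while i + wordlength - 1 <= length(s1)
        word = s1[i:i + wordlength - 1]    # ASCII input assumed for indexing
        hit = findfirst(word, s2)          # `nothing` if the word is absent
        if hit !== nothing
            push!(seeds, (i, first(hit)))
        end
        i += wordlength + skip             # leave `skip` characters between words
    end
    return seeds
end
```

The intervals between consecutive seeds are the regions that would then be handed to a (possibly banded) Needleman-Wunsch alignment.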

# NextGenSeqUtils.triplet_kmer_seeded_align — Function.

triplet_kmer_seeded_align(s1::String, s2::String;
                          wordlength = 30,
                          skip = 9,
                          boundary_mult = 2,
                          alignedcodons = true,
                          debug::Bool=false)

Returns aligned strings, where alignment is first done with word matches and then Needleman-Wunsch on intermediate intervals, preferring to preserve the reading frame of the first argument, s1. skip gives a necessary gap between searched-for words in s1. For best results, use the default wordlength and skip values. See triplet_nw_align for an explanation of boundary_mult.

# NextGenSeqUtils.loc_kmer_seeded_align — Function.

function local_kmer_seeded_align(s1::String, s2::String;
                                 wordlength = 30,
                                 skip = 10,
                                 trimpadding = 100,
                                 debug::Bool=false)

Returns locally aligned strings, where alignment is first done with word matches and then Needleman-Wunsch on intermediate intervals.

s1 is a reference to align to, and s2 is a query to extract a local match from. s2 may be trimmed or expanded with gaps. Before locally aligning the ends of the sequences, the ends of s2 are trimmed to length trimpadding for faster alignment. Increasing this will possibly increase alignment accuracy but will affect runtime. skip gives a necessary gap between searched-for words in s1. For best results, use the default wordlength and skip values.

# NextGenSeqUtils.local_kmer_seeded_align — Function.

function local_kmer_seeded_align(s1::String, s2::String;
                                 wordlength = 30,
                                 skip = 10,
                                 trimpadding = 100,
                                 debug::Bool=false)

Returns locally aligned strings, where alignment is first done with word matches and then Needleman-Wunsch on intermediate intervals.

s1 is a reference to align to, and s2 is a query to extract a local match from. s2 may be trimmed or expanded with gaps. Before locally aligning the ends of the sequences, the ends of s2 are trimmed to length trimpadding for faster alignment. Increasing this will possibly increase alignment accuracy but will affect runtime. skip gives a necessary gap between searched-for words in s1. For best results, use the default wordlength and skip values.

# NextGenSeqUtils.kmer_seeded_edit_dist — Function.

kmer_seeded_edit_dist(s1::String , s2::String;
                      wordlength = 30,
                      skip = 5,
                      aa_matches = false)

Computes the Levenshtein edit distance, with speedups from computing the DP scoring matrix only between word matches. If aa_matches = true, will attempt to find amino acid matches in any reference frame, and add the nucleotide Hamming distance of these matches to the Levenshtein distances of mismatches. skip gives a necessary gap between searched-for words in s1. For best results, use the default wordlength and skip values.
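For reference, the full dynamic program that the seeded version avoids running over the whole sequence pair looks roughly like this. levenshtein_sketch is a hypothetical, unoptimized helper written for illustration; it is the textbook recurrence, not code from the package.

```julia
# Textbook Levenshtein distance using two rolling rows of the DP matrix.
function levenshtein_sketch(s1::String, s2::String)
    a, b = collect(s1), collect(s2)
    prev = collect(0:length(b))            # distances from the empty prefix of s1
    for i in 1:length(a)
        cur = similar(prev)
        cur[1] = i                         # delete the first i characters of s1
        for j in 1:length(b)
            cost = a[i] == b[j] ? 0 : 1
            cur[j + 1] = min(prev[j + 1] + 1,   # deletion
                             cur[j] + 1,        # insertion
                             prev[j] + cost)    # match or substitution
        end
        prev = cur
    end
    return prev[end]
end
```

Exact word matches contribute zero to the distance, which is why restricting this computation to the intervals between matches can save most of the work.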

# NextGenSeqUtils.resolve_alignments — Function.

resolve_alignments(ref::String, query::String; mode = 1)

Called on aligned strings. Resolves query with respect to ref. mode = 1 for resolving single indels, mode = 2 for resolving single indels and codon insertions in query.

# NextGenSeqUtils.align_reference_frames — Function.

align_reading_frames(clusters; k = 6, thresh = 0.03, verbose = false)

Takes clusters = [consensus_sequences, cluster_sizes], chooses references out of consensuses that do not have stop codons in the middle, and makes all consensus sequence reading frames agree. Returns resolved consensus seqs (goods) along with filtered out consensus seqs that are >thresh divergent from nearest reference (bads). k = kmer size for computing kmer vectors of sequences.
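As a rough picture of what a kmer vector holds, the sketch below counts every overlapping k-mer in a sequence. kmer_counts is a hypothetical helper, and storing the counts in a Dict is an assumption made for brevity; the package may well use a fixed-length numeric vector instead. ASCII sequences are assumed.

```julia
# Count overlapping k-mers in a sequence (illustrative only).
function kmer_counts(seq::String; k = 6)
    counts = Dict{String,Int}()
    for i in 1:(length(seq) - k + 1)       # empty range if seq is shorter than k
        word = seq[i:i + k - 1]
        counts[word] = get(counts, word, 0) + 1
    end
    return counts
end
```

Sequences with similar k-mer counts are likely to be close to each other, which gives a cheap way to find the nearest reference before any alignment is attempted.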

# NextGenSeqUtils.local_edit_dist — Function.

local_edit_dist(s1::String, s2::String)

Returns the edit distance between two sequences after local alignment.
