API Reference
This section provides detailed documentation for all public types and functions in SeededAlignment.jl
.
Core Types
These types define the fundamental structures used in the alignment algorithms:
Move
MoveSet
ScoreScheme
LongDNA{4}
Alignment Functions
These functions perform the actual sequence alignment operations:
Utilities
Helper functions for working with alignment data:
Standard Parameters
Pre-configured settings for common alignment scenarios:
std_codon_scoring
std_codon_moveset
pairwise_noisy_moveset
Function Index
A complete alphabetical listing of all documented functions.
SeededAlignment.Move
SeededAlignment.MoveSet
SeededAlignment.ScoreScheme
SeededAlignment.clean_alignment_readingframe
SeededAlignment.msa_codon_align
SeededAlignment.nw_align
SeededAlignment.pairwise_noisy_moveset
SeededAlignment.read_fasta
SeededAlignment.seed_chain_align
SeededAlignment.std_codon_moveset
SeededAlignment.std_codon_scoring
SeededAlignment.write_fasta
SeededAlignment.Move
— TypeMove
Represents a single allowed alignment step between two sequences in the alignment matrix.
Fields
step::Int
: How many aligned units (nucleotides) this move advances by.score::Float64
: Cost or score associated with performing this move.
Reference to the top sequence (horizontal axis)
horizontal_stride::Int
: How far this move advances along the top (reference) sequence.horizontal_phase::Int
: Frame offset in the top sequence; helps track codon boundaries.
Reference to the bottom sequence (vertical axis)
vertical_stride::Int
: How far this move advances along the bottom (query) sequence.vertical_phase::Int
: Frame offset in the bottom sequence.extensionAble::Bool
: Whether this move can be extended (e.g., for affine gap penalties).
Description
Moves define basic operations used in dynamic programming alignment: matches, mismatches, and gaps. Moves can preserve codon reading frames by keeping strides in multiples of 3 and matching phase positions.
Two helper constructors are provided:
Move(step, score, stride, phase, extensionAble)
assumes reference (vertical) stride and phaseMove(step, score, extensionAble)
for simple moves with unit stride and zero phase.
SeededAlignment.MoveSet
— TypeMoveSet
Defines the full set of allowed alignment moves used during pairwise or multiple sequence alignment.
Fields
match_moves::Vector{Move}
: Moves that represent matches or substitutions between nucleotides.hor_moves::Vector{Move}
: Moves that introduce gaps in the reference (horizontal) sequence.vert_moves::Vector{Move}
: Moves that introduce gaps in the query (vertical) sequence.
Description
A MoveSet
groups the allowable Move
s into categories used during dynamic programming alignment. Each type controls how the algorithm can transition between states, including nucleotide-level match and gap moves with customizable reading frame behavior.
Used by alignment algorithms such as seed_chain_align
and msa_codon_align
to control the scoring and allowed operations during alignment.
SeededAlignment.ScoreScheme
— TypeScoreScheme(; match_score=0.0, mismatch_score=0.5,extension_score=0.1,edge_ext_begin=true,edge_ext_end=true,kmerlength=21)
A struct for storing scoring parameters used for sequence alignment. Uses a keyword constructor.
Fields
match_score::Float64
: Score (typically ≤ 0) awarded for matching nucleotide. Lower is better.mismatch_score::Float64
: Penalty for nucleotide mismatches. Higher values penalize substitutions more strongly.extension_score::Float64
: Cost to extend a gap (indel). Affects how gaps are penalized during alignment.edge_ext_begin::Bool
: Iftrue
, allows gaps to be extended at the beginning of sequences.edge_ext_end::Bool
: Iftrue
, allows gaps to be extended at the end of sequences.kmerlength::Int64
: Length of k-mers used for seeding alignments (if applicable). Ignored if no seeding is used.
Description
ScoreScheme
defines how matches, mismatches, and gaps are scored during nucleotide-level sequence alignment.
This struct is typically passed to functions like seed_chain_align
or msa_codon_align
.
example
score_params = ScoreScheme(extension_score = 0.3, mismatch_score = 0.7) # (everything else will be keept at default values)
SeededAlignment.clean_alignment_readingframe
— Methodclean_alignment_readingframe(aligned_ref::LongDNA{4},aligned_seq::LongDNA{4})
Takes a pairwise alignment of a reference (with known reading frame) and a sequence, and removes single indels which
don't respect the reference's reading frame. To clean a full codon alignment you need multiple pairwise alignments and can clean up
single indels by broadcasting with "."
# NOTE We assume the readingFrame is 0 mod 3 with sequences 0 indexed
SeededAlignment.msa_codon_align
— Functionmsa_codon_align(ref::LongDNA{4}, seqs::Vector{LongDNA{4}}, moveset::MoveSet, score_params::ScoreScheme)
Produces a fast codon aligment with consistent readingFrame.
Arguments
ref::LongDNA{4}
: The reference DNA sequence, assumed to have an readingFrame which starts at its first nucleotide.seqs::Vector{LongDNA{4}}
: A vector of DNA sequences to align against the reference.moveset::MoveSet
: Defines the allowable alignment operations (e.g. match nucleotide, single indel, readingFrame respecting triple indel). Assumes moveset is choosen so some moves respect the readingFrame.score_params::ScoreScheme
: Scoring parameters for codon-aware alignment, including match/mismatch and gap penalties.
Returns
Vector{LongDNA{4}}
: A vector of aligned sequences (including the reference), with codon-aware gaps that preserve reading frame consistency.
Description
Produces a fast multiple codon alignment based on a reference with known readingFrame, a moveset and a scoreScheme. This is done by computing a pairwise alignment with respect to the reference for each sequence and then cleaning up single indel noise.
Example
ref = LongDNA{4}("ATGACGTGA") # Reference with known reading frame
seqs = [LongDNA{4}("ATGTCGTGA"), LongDNA{4}("ATGACGAGA")]
moveset = std_codon_moveset()
score_params = std_codon_scoring()
alignment = msa_codon_align(ref, seqs, moveset, score_params)
NOTE: The sequences (ungapped) might not be preserved by the alignment by the cleanup step. Some nucleotides might be inserted or removed.
SeededAlignment.nw_align
— Methodnw_align(A::LongDNA{4},B::LongDNA{4},moveset::MoveSet,scoreScheme::ScoreScheme)
Takes two ungapped LongDNA{4} sequences and computes an optimal pairwise alignment
with respect to the moveset and the scoreScheme.
SeededAlignment.pairwise_noisy_moveset
— Methodpairwise_noisy_moveset()
SeededAlignment.read_fasta
— Methodread_fasta(filepath::String)
Reads in a fasta file and returns a tuple of (seqnames, seqs).
SeededAlignment.seed_chain_align
— Methodseed_chain_align(A::LongDNA{4}, B::LongDNA{4}, moveset::MoveSet, scoreScheme::ScoreScheme)
Perform pairwise alignment between two DNA sequences using a seeding strategy.
Arguments
A::LongDNA{4}
: The first DNA sequence to align.B::LongDNA{4}
: The second DNA sequence to align.moveset::MoveSet
: Defines allowed alignment moves (e.g., match, mismatch, gap).scoreScheme::ScoreScheme
: Scoring scheme for e.g. matches, mismatches, and gaps.
Returns
Vector{LongDNA{4}}
: A vector the two aligned sequences.
Description
This function performs a fast (sub-quadratic) pairwise alignment using seeding and chaining. It identifies high-scoring seed matches between A
and B
, chains them to build a candidate alignment skeleton, and fills gaps between seeds using dynamic programming guided by moveset
and scoreScheme
.
Example
A = LongDNA{4}("ACGTACGT")
B = LongDNA{4}("ACGTTGCA")
moveset = std_codon_moveset()
scoreScheme = std_codon_scoring()
alignment = seed_chain_align(A, B, moveset, scoreScheme)
SeededAlignment.std_codon_moveset
— Methodstd_codon_moveset()
Return a standard codon-aware MoveSet
for sequence alignment.
Returns
MoveSet
: Contains codon-aware match and gap moves.
Moves
- Match:
Move(1, 0.0)
- Horizontal (gaps in reference sequence):
Move(1, 2.0, 1, 0, 1, 0, false)
Move(3, 2.0, 1, 0, 3, 0, true)
- Vertical (gaps in query sequence):
Move(1, 2.0, 1, 0, 1, 0, false)
Move(3, 2.0, 1, 0, 3, 0, true)
Use this as a default MoveSet
for codon-preserving alignments.
SeededAlignment.std_codon_scoring
— Methodstd_codon_scoring()
Return a standard codon-aware ScoreScheme
for alignment.
Returns
ScoreScheme
: Default scoring parameters for codon alignment.
Parameters
- Match score:
0.0
- Mismatch score:
0.3
- Gap extension score:
0.1
- Allow extension at sequence ends:
true
(both ends) - K-mer length for seeding:
21
Use this as a default ScoreScheme
for codon-preserving alignments.
SeededAlignment.write_fasta
— Methodwrite_fasta(filepath::String, sequences::Vector{LongDNA{4}}; seq_names = nothing)
Writes a fasta file from a vector of sequences, with optional seq_names.