API Reference

This section provides detailed documentation for all public types and functions in SeededAlignment.jl.

Core Types

These types define the fundamental structures used in the alignment algorithms:

Alignment Functions

These functions perform the actual sequence alignment operations:

Utilities

Helper functions for working with alignment data:

Standard Parameters

Pre-configured settings for common alignment scenarios:

Function Index

A complete alphabetical listing of all documented functions.

SeededAlignment.MoveType
Move

Represents a single allowed alignment step between two sequences in the alignment matrix.

Fields

  • step::Int: How many aligned units (nucleotides) this move advances by.
  • score::Float64: Cost or score associated with performing this move.

Reference to the top sequence (horizontal axis)

  • horizontal_stride::Int: How far this move advances along the top (reference) sequence.
  • horizontal_phase::Int: Frame offset in the top sequence; helps track codon boundaries.

Reference to the bottom sequence (vertical axis)

  • vertical_stride::Int: How far this move advances along the bottom (query) sequence.

  • vertical_phase::Int: Frame offset in the bottom sequence.

  • extensionAble::Bool: Whether this move can be extended (e.g., for affine gap penalties).

Description

Moves define basic operations used in dynamic programming alignment: matches, mismatches, and gaps. Moves can preserve codon reading frames by keeping strides in multiples of 3 and matching phase positions.

Two helper constructors are provided:

  • Move(step, score, stride, phase, extensionAble) assumes reference (vertical) stride and phase
  • Move(step, score, extensionAble) for simple moves with unit stride and zero phase.
source
SeededAlignment.MoveSetType
MoveSet

Defines the full set of allowed alignment moves used during pairwise or multiple sequence alignment.

Fields

  • match_moves::Vector{Move}: Moves that represent matches or substitutions between nucleotides.
  • hor_moves::Vector{Move}: Moves that introduce gaps in the reference (horizontal) sequence.
  • vert_moves::Vector{Move}: Moves that introduce gaps in the query (vertical) sequence.

Description

A MoveSet groups the allowable Moves into categories used during dynamic programming alignment. Each type controls how the algorithm can transition between states, including nucleotide-level match and gap moves with customizable reading frame behavior.

Used by alignment algorithms such as seed_chain_align and msa_codon_align to control the scoring and allowed operations during alignment.

source
SeededAlignment.ScoreSchemeType
ScoreScheme(; match_score=0.0, mismatch_score=0.5,extension_score=0.1,edge_ext_begin=true,edge_ext_end=true,kmerlength=21)

A struct for storing scoring parameters used for sequence alignment. Uses a keyword constructor.

Fields

  • match_score::Float64: Score (typically ≤ 0) awarded for matching nucleotide. Lower is better.

  • mismatch_score::Float64: Penalty for nucleotide mismatches. Higher values penalize substitutions more strongly.

  • extension_score::Float64: Cost to extend a gap (indel). Affects how gaps are penalized during alignment.

  • edge_ext_begin::Bool: If true, allows gaps to be extended at the beginning of sequences.

  • edge_ext_end::Bool: If true, allows gaps to be extended at the end of sequences.

  • kmerlength::Int64: Length of k-mers used for seeding alignments (if applicable). Ignored if no seeding is used.

Description

ScoreScheme defines how matches, mismatches, and gaps are scored during nucleotide-level sequence alignment.

This struct is typically passed to functions like seed_chain_align or msa_codon_align.

example

score_params = ScoreScheme(extension_score = 0.3, mismatch_score = 0.7) # (everything else will be keept at default values)
source
SeededAlignment.clean_alignment_readingframeMethod
clean_alignment_readingframe(aligned_ref::LongDNA{4},aligned_seq::LongDNA{4})

Takes a pairwise alignment of a reference (with known reading frame) and a sequence, and removes single indels which
don't respect the reference's reading frame. To clean a full codon alignment you need multiple pairwise alignments and can clean up 
single indels by broadcasting with "." 

# NOTE We assume the readingFrame is 0 mod 3 with sequences 0 indexed
source
SeededAlignment.msa_codon_alignFunction
msa_codon_align(ref::LongDNA{4}, seqs::Vector{LongDNA{4}}, moveset::MoveSet, score_params::ScoreScheme)

Produces a fast codon aligment with consistent readingFrame.

Arguments

  • ref::LongDNA{4}: The reference DNA sequence, assumed to have an readingFrame which starts at its first nucleotide.
  • seqs::Vector{LongDNA{4}}: A vector of DNA sequences to align against the reference.
  • moveset::MoveSet: Defines the allowable alignment operations (e.g. match nucleotide, single indel, readingFrame respecting triple indel). Assumes moveset is choosen so some moves respect the readingFrame.
  • score_params::ScoreScheme: Scoring parameters for codon-aware alignment, including match/mismatch and gap penalties.

Returns

  • Vector{LongDNA{4}}: A vector of aligned sequences (including the reference), with codon-aware gaps that preserve reading frame consistency.

Description

Produces a fast multiple codon alignment based on a reference with known readingFrame, a moveset and a scoreScheme. This is done by computing a pairwise alignment with respect to the reference for each sequence and then cleaning up single indel noise.

Example

ref = LongDNA{4}("ATGACGTGA")  # Reference with known reading frame
seqs = [LongDNA{4}("ATGTCGTGA"), LongDNA{4}("ATGACGAGA")]
moveset = std_codon_moveset()
score_params = std_codon_scoring()

alignment = msa_codon_align(ref, seqs, moveset, score_params)

NOTE: The sequences (ungapped) might not be preserved by the alignment by the cleanup step. Some nucleotides might be inserted or removed.

source
SeededAlignment.nw_alignMethod
nw_align(A::LongDNA{4},B::LongDNA{4},moveset::MoveSet,scoreScheme::ScoreScheme)

Takes two ungapped LongDNA{4} sequences and computes an optimal pairwise alignment 
with respect to the moveset and the scoreScheme.
source
SeededAlignment.seed_chain_alignMethod
seed_chain_align(A::LongDNA{4}, B::LongDNA{4}, moveset::MoveSet, scoreScheme::ScoreScheme)

Perform pairwise alignment between two DNA sequences using a seeding strategy.

Arguments

  • A::LongDNA{4}: The first DNA sequence to align.
  • B::LongDNA{4}: The second DNA sequence to align.
  • moveset::MoveSet: Defines allowed alignment moves (e.g., match, mismatch, gap).
  • scoreScheme::ScoreScheme: Scoring scheme for e.g. matches, mismatches, and gaps.

Returns

  • Vector{LongDNA{4}}: A vector the two aligned sequences.

Description

This function performs a fast (sub-quadratic) pairwise alignment using seeding and chaining. It identifies high-scoring seed matches between A and B, chains them to build a candidate alignment skeleton, and fills gaps between seeds using dynamic programming guided by moveset and scoreScheme.

Example

A = LongDNA{4}("ACGTACGT")
B = LongDNA{4}("ACGTTGCA")
moveset = std_codon_moveset()
scoreScheme = std_codon_scoring()

alignment = seed_chain_align(A, B, moveset, scoreScheme)
source
SeededAlignment.std_codon_movesetMethod
std_codon_moveset()

Return a standard codon-aware MoveSet for sequence alignment.

Returns

  • MoveSet: Contains codon-aware match and gap moves.

Moves

  • Match: Move(1, 0.0)
  • Horizontal (gaps in reference sequence):
    • Move(1, 2.0, 1, 0, 1, 0, false)
    • Move(3, 2.0, 1, 0, 3, 0, true)
  • Vertical (gaps in query sequence):
    • Move(1, 2.0, 1, 0, 1, 0, false)
    • Move(3, 2.0, 1, 0, 3, 0, true)

Use this as a default MoveSet for codon-preserving alignments.

source
SeededAlignment.std_codon_scoringMethod
std_codon_scoring()

Return a standard codon-aware ScoreScheme for alignment.

Returns

  • ScoreScheme: Default scoring parameters for codon alignment.

Parameters

  • Match score: 0.0
  • Mismatch score: 0.3
  • Gap extension score: 0.1
  • Allow extension at sequence ends: true (both ends)
  • K-mer length for seeding: 21

Use this as a default ScoreScheme for codon-preserving alignments.

source
SeededAlignment.write_fastaMethod
write_fasta(filepath::String, sequences::Vector{LongDNA{4}}; seq_names = nothing)

Writes a fasta file from a vector of sequences, with optional seq_names.

source