Models
We offer a catalogue of frequently used models that are already integrated with the framework and ready to be used. We maintain that if a model you'd like to use is not included in the list, you can swiftly define one yourself and leverage our framework nonetheless.
Discrete state models
Here we account for a typical set-up for a discrete state Partition
:
MolecularEvolution.DiscretePartition Type
abstract type DiscretePartition <: MultiSitePartition end
Represents a states
-by-sites
matrix of probabilities. The following fields are loosely required:
state
: A matrix of probabilities that are site-wise normalized.states
: The number of states.sites
: The number of sites.scaling
: A vector of log-domain probability scaling factors, one per site.
And here's a list of simple concrete subtypes of DiscretePartition
:
And then there are two typical BranchModel
s that will cooperate with this Partition
:
Note
The two above can be regarded as special cases of the more general PModel
, which just represents a P matrix.
A typical way of constructing your Q matrix in our ecosystem is by:
MolecularEvolution.reversibleQ Function
reversibleQ(param_vec,eq_freqs)
Takes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. The parameters are the upper triangle of the rate matrix, with the diagonal elements omitted, and the equilibrium frequencies are multiplied column-wise.
sourceMolecularEvolution.nonreversibleQ Function
nonreversibleQ(param_vec)
Takes a vector of parameters and returns a nonreversible rate matrix.
sourceCodon models
MolecularEvolution.CodonPartition Type
mutable struct CodonPartition <: DiscretePartition
Constructors
CodonPartition(sites; code = universal_code)
CodonPartition(state, states, sites, scaling; code = universal_code)
Description
A DiscretePartition
where every state represents a codon. Can be customized to use different genetic codes.
We offer constructors for the following Q matrix parameterizations:
MolecularEvolution.MG94_F3x4
- for example usage, see Example 3: FUBARMolecularEvolution.MG94_F61
MolecularEvolution.HB98_F61
MolecularEvolution.HB98_AAfit
Use help mode, ?
, in the REPL to find out more about the above.
Miscellaneous models
MolecularEvolution.InterpolatedDiscreteModel Type
Constructors
InterpolatedDiscreteModel(siz::Int64, generator, tvec::Vector{Float64})
InterpolatedDiscreteModel(Pvec::Array{Float64,3}, tvec::Vector{Float64})
generator
is a function that takes a time value t
and returns a P matrix.
Description
Stores a number (siz
) of P matrices, and the time values to which they correspond. For a requested t, the returned P matrix is (element-wise linearly) interpolated between it's two neighbours.
MolecularEvolution.PiQ Type
Constructors
PiQ(r::Float64,pi::Vector{Float64}; normalize=false)
PiQ(pi::Vector{Float64}; normalize=false)
Description
The F81 substitution model, but for general dimensions. https://www.diva-portal.org/smash/get/diva2:1878793/FULLTEXT01.pdf
sourceContinuous models
The partition of interest is:
MolecularEvolution.GaussianPartition Type
Constructors
GaussianPartition(mean::Float64, var::Float64, norm_const::Float64)
GaussianPartition(mean::Float64, var::Float64) # norm_const defaults to 0.0
GaussianPartition() # mean, var, norm_const default to 0.0, 1.0, 0.0 respectively
Description
A partition representing a (not necessarily normalized) Gaussian distribution. norm_const
is the log-domain normalization constant.
And then there are two BranchModel
s which are compatible with the above partition, namely:
MolecularEvolution.BrownianMotion Type
BrownianMotion(mean_drift::Float64, var_drift::Float64)
A 1D continuous Brownian motion model with mean drift and variance drift.
sourceMolecularEvolution.ZeroDriftOrnsteinUhlenbeck Type
function ZeroDriftOrnsteinUhlenbeck(
mean::Float64,
volatility::Float64,
reversion::Float64,
)
A 1D continuous Ornstein-Uhlenbeck process with mean, volatility, and reversion.
sourceCompound models
Branch-wise mixture
MolecularEvolution.BWMModel Type
mutable struct BWMModel{M} <: DiscreteStateModel where M <: DiscreteStateModel
Fields
models::Vector{<:M}
: A vector of models.weights::Vector{Float64}
: A vector of weights.
Description
Branch-wise mixture model.
Note
forward!
and backward!
are currently only defined for M<:PMatrixModel
.
CAT
MolecularEvolution.CATModel Type
CATModel(models::Vector{<:BranchModel})
CAT is something where you split the sites up, and assign each site to a different model (whose "data" gets stored in a contiguous block of memory).
sourceMolecularEvolution.CATPartition Type
Constructors
CATPartition(part_inds::Vector{Vector{Int}})
CATPartition(part_inds::Vector{Vector{Int}}, parts::Vector{PType})
Description
A partition for the CATModel
.
Covarion
MolecularEvolution.CovarionModel Type
Constructors
CovarionModel(models::Vector{<:DiscreteStateModel}, inter_transition_rates::Matrix{Float64})
CovarionModel(models::Vector{<:DiscreteStateModel}, inter_transition_rate::Float64)
Description
The covarion model.
sourceMolecularEvolution.CovarionPartition Type
CovarionPartition(states,sites,models,t)
A partition for the CovarionModel
.
Site-wise mixture
See Example 2: GTR+Gamma for example usage.
MolecularEvolution.SWMModel Type
Constructors
SWMModel(models::Vector{<:BranchModel})
SWMModel(model::M, rs::Vector{Float64}) where {M <: BranchModel}
Description
A site-wise mixture model, for site-to-site "random effects" rate variation.
sourceMolecularEvolution.SWMPartition Type
Constructors
SWMPartition(parts::Vector{PType}) where {PType <: MultiSitePartition}
SWMPartition(part::PType, n_parts::Int) where {PType <: MultiSitePartition}
SWMPartition(parts::Vector{PType}, weights::Vector{Float64}, sites::Int, states::Int, models::Int) where {PType <: MultiSitePartition}
Description
A site-wise mixture partition for the SWMModel
.
Lazy models
LazyPartition
MolecularEvolution.LazyPartition Type
Constructor
LazyPartition{PType}()
Initialize an empty LazyPartition
that is meant for wrapping a partition of type PType
.
Description
With this data structure, you can wrap a partition of choice. The idea is that in some message passing algorithms, there is only a wave of partitions which need to actualize. For instance, a wave following a root-leaf path, or a depth-first traversal. In which case, we can be more economical with our memory consumption. With a worst case memory complexity of O(log(n)), where n is the number of nodes, functionality is provided for:
log_likelihood!
felsenstein!
sample_down!
Note
For successive felsenstein!
calls, we need to extract the information at the root somehow after each call. This can be done with e.g. total_LL
or site_LLs
.
Further requirements
Suppose you want to wrap a partition of PType
with LazyPartition
:
If you're calling
log_likelihood!
andfelsenstein!
:obs2partition!(partition::PType, obs)
that transforms an observation to a partition.
If you're calling
sample_down!
:partition2obs(partition::PType)
that returns the most likely state from a partition, invertsobs2partition!
.
Examples
Example 1: Initializing for an upward pass
Now, we show how to wrap the CodonPartition
s from Example 3: FUBAR with LazyPartition
:
You simply go from initializing messages like this:
initial_partition = CodonPartition(Int64(length(seqs[1])/3))
initial_partition.state .= eq_freqs
populate_tree!(tree,initial_partition,seqnames,seqs)
To this
initial_partition = CodonPartition(Int64(length(seqs[1])/3))
initial_partition.state .= eq_freqs
lazy_initial_partition = LazyPartition{CodonPartition}()
populate_tree!(tree,lazy_initial_partition,seqnames,seqs)
lazyprep!(tree, initial_partition)
By this slight modification, we go from initializing and using 554 partitions to 6 during the subsequent log_likelihood!
and felsenstein!
calls. There is no significant decrease in performance recorded from this switch.
Example 2: Initializing for a downward pass
Now, we show how to wrap the GaussianPartition
s from Quick example: Likelihood calculations under phylogenetic Brownian motion: with LazyPartition
:
You simply go from initializing messages like this:
internal_message_init!(tree, GaussianPartition())
To this (technically we only add 1 LOC)
initial_partition = GaussianPartition()
lazy_initial_partition = LazyPartition{GaussianPartition}()
internal_message_init!(tree, lazy_initial_partition)
lazyprep!(tree, initial_partition, direction=LazyDown(isleafnode))
Note
Now, we provided a direction for lazyprep!
. The direction is an instance of LazyDown
, which was initialized with the isleafnode
function. The function isleafnode
dictates if a node saves its sampled observation after a down pass. If you use direction=LazyDown()
, every node saves its observation.
Surrounding LazyPartition
MolecularEvolution.lazyprep! Function
lazyprep!(tree::FelNode, initial_message::Vector{<:Partition}; partition_list = 1:length(tree.message), direction::LazyDirection = LazyUp())
Extra, intermediate step of tree preparations between initializing messages across the tree and calling message passing algorithms with LazyPartition
.
Perform a
lazysort!
on tree to obtain the optimal tree for a lazyfelsenstein!
prop, or asample_down!
.Fix
tree.parent_message
to an initial message.Preallocate sufficiently many inner partitions needed for a
felsenstein!
prop, or asample_down!
.Specialized preparations based on the direction of the operations (
forward!
,backward!
).LazyDown
orLazyUp
.
See also LazyDown
, LazyUp
.
MolecularEvolution.LazyUp Type
Constructor
LazyUp()
Description
Indicate that we want to do an upward pass, e.g. felsenstein!
.
MolecularEvolution.LazyDown Type
Constructors
LazyDown(stores_obs)
LazyDown() = LazyDown(x::FelNode -> true)
Description
Indicate that we want to do a downward pass, e.g. sample_down!
. The function passed to the constructor takes a node::FelNode
as input and returns a Bool
that decides if node
stores its observations.