MolecularEvolution
Documentation for MolecularEvolution.
A Julia package for the flexible development of phylogenetic models.
MolecularEvolution.jl exploits Julia's multiple dispatch, implementing a fully generic suite of likelihood calculations, branchlength optimization, topology optimization, and ancestral inference. Users can construct trees using already-defined data types and models. But users can define probability distributions over their own data types, and specify the behavior of these under their own model types, and can mix and match different models on the same phylogeny.
If the behavior you need is not already available in MolecularEvolution.jl
:
- If you have a new data type:
- A
Partition
type that represents the uncertainty over your state. combine!()
that merges evidence from twoPartition
s.
- A
- If you have a new model:
- A
BranchModel
type that stores your model parameters. forward!()
that evolves state distributions over branches, in the root-to-tip direction.backward!()
that reverse-evolves state distributions over branches, in the tip-to-root direction.
- A
And then sampling, likelihood calculations, branch-length optimization, ancestral reconstruction, etc should be available for your new data or model.
Design principles
In order of importance, we aim for the following:
- Flexibility and generality
- Where possible, we avoid design decisions that limit the development of new models, or make it harder to develop new models.
- We do not sacrifice flexibility for performance.
- Scalability
- Analyses implemented using
MolecularEvolution.jl
should scale to large, real-world datasets.
- Analyses implemented using
- Performance
- While the above take precedence over speed, it should be possible to optimize your
Partition
,combine!()
,BranchModel
,forward!()
andbackward!()
functions to obtain competative runtimes.
- While the above take precedence over speed, it should be possible to optimize your
Authors:
Venkatesh Kumar and Ben Murrell, with additional contributions by Sanjay Mohan, Alec Pankow, Hassan Sadiq, and Kenta Sato.
Quick example: Likelihood calculations under phylogenetic Brownian motion:
using MolecularEvolution, Plots
#First simulate a tree, using a coalescent process
tree = sim_tree(n=200)
internal_message_init!(tree, GaussianPartition())
#Simulate brownian motion over the tree
bm_model = BrownianMotion(0.0,1.0)
sample_down!(tree, bm_model)
#And plot the log likelihood as a function of the parameter value
ll(x) = log_likelihood!(tree,BrownianMotion(0.0,x))
plot(0.7:0.001:1.6,ll, xlabel = "variance per unit time", ylabel = "log likelihood")
MolecularEvolution.BranchlengthSampler
MolecularEvolution.LazyDown
MolecularEvolution.LazyPartition
MolecularEvolution.LazyUp
Base.:==
MolecularEvolution.SWM_prob_grid
MolecularEvolution._mapreduce
MolecularEvolution.backward!
MolecularEvolution.backward!
MolecularEvolution.bfs_mapreduce
MolecularEvolution.branchlength_optim!
MolecularEvolution.branchlength_optim!
MolecularEvolution.brents_method_minimize
MolecularEvolution.cascading_max_state_dict
MolecularEvolution.cascading_max_state_dict
MolecularEvolution.char_proportions
MolecularEvolution.collect_leaf_dists
MolecularEvolution.colored_seq_draw
MolecularEvolution.combine!
MolecularEvolution.combine!
MolecularEvolution.copy_tree
MolecularEvolution.deepequals
MolecularEvolution.dfs_mapreduce
MolecularEvolution.discrete_name_color_dict
MolecularEvolution.draw_example_tree
MolecularEvolution.endpoint_conditioned_sample_state_dict
MolecularEvolution.endpoint_conditioned_sample_state_dict
MolecularEvolution.expected_subs_per_site
MolecularEvolution.felsenstein!
MolecularEvolution.felsenstein_down!
MolecularEvolution.forward!
MolecularEvolution.forward!
MolecularEvolution.gappy_Q_from_symmetric_rate_matrix
MolecularEvolution.get_highlighter_legend
MolecularEvolution.get_max_depth
MolecularEvolution.get_phylo_tree
MolecularEvolution.get_phylo_tree
MolecularEvolution.golden_section_maximize
MolecularEvolution.highlight_seq_draw
MolecularEvolution.highlighter_tree_draw
MolecularEvolution.internal_message_init!
MolecularEvolution.internal_message_init!
MolecularEvolution.istreeconsistent
MolecularEvolution.lazyprep!
MolecularEvolution.lazysort!
MolecularEvolution.leaf_distmat
MolecularEvolution.linear_scale
MolecularEvolution.log_likelihood
MolecularEvolution.log_likelihood!
MolecularEvolution.longest_path
MolecularEvolution.marginal_state_dict
MolecularEvolution.marginal_state_dict
MolecularEvolution.matrix_for_display
MolecularEvolution.metropolis_sample
MolecularEvolution.midpoint
MolecularEvolution.mix
MolecularEvolution.name2node_dict
MolecularEvolution.newick
MolecularEvolution.newick
MolecularEvolution.nni_optim!
MolecularEvolution.nni_optim!
MolecularEvolution.node_distances
MolecularEvolution.nonreversibleQ
MolecularEvolution.parent_list
MolecularEvolution.partition2obs
MolecularEvolution.partition2obs
MolecularEvolution.plot_multiple_trees
MolecularEvolution.plot_multiple_trees
MolecularEvolution.populate_tree!
MolecularEvolution.populate_tree!
MolecularEvolution.promote_internal
MolecularEvolution.quadratic_CI
MolecularEvolution.quadratic_CI
MolecularEvolution.read_fasta
MolecularEvolution.read_fasta
MolecularEvolution.read_newick_tree
MolecularEvolution.read_newick_tree
MolecularEvolution.reversibleQ
MolecularEvolution.reversibleQ
MolecularEvolution.root2tip_distances
MolecularEvolution.sample_down!
MolecularEvolution.sample_down!
MolecularEvolution.sample_from_message!
MolecularEvolution.savefig_tweakSVG
MolecularEvolution.savefig_tweakSVG
MolecularEvolution.savefig_tweakSVG
MolecularEvolution.shortest_path_between_nodes
MolecularEvolution.sibling_inds
MolecularEvolution.siblings
MolecularEvolution.sim_tree
MolecularEvolution.sim_tree
MolecularEvolution.sim_tree
MolecularEvolution.simple_radial_tree_plot
MolecularEvolution.simple_tree_draw
MolecularEvolution.total_LL
MolecularEvolution.tree2distances
MolecularEvolution.tree2shared_branch_lengths
MolecularEvolution.tree_draw
MolecularEvolution.tree_draw
MolecularEvolution.tree_polish!
MolecularEvolution.tree_polish!
MolecularEvolution.unc2probvec
MolecularEvolution.unc2probvec
MolecularEvolution.univariate_maximize
MolecularEvolution.univariate_maximize
MolecularEvolution.univariate_sampler
MolecularEvolution.values_from_phylo_tree
MolecularEvolution.values_from_phylo_tree
MolecularEvolution.weightEM
MolecularEvolution.write_fasta
MolecularEvolution.write_fasta
MolecularEvolution.write_nexus
MolecularEvolution.write_nexus
MolecularEvolution.BranchlengthSampler
— TypeBranchlengthSampler
A type that allows you to specify a additive proposal function in the log domain and a prior distrubution over the log of the branchlengths. It also holds the acceptance ratio acc_ratio (acc_ratio[1] stores the number of accepts, and acc_ratio[2] stores the number of rejects).
MolecularEvolution.LazyDown
— TypeConstructors
LazyDown(stores_obs)
LazyDown() = LazyDown(x::FelNode -> true)
Description
Indicate that we want to do a downward pass, e.g. sample_down!
. The function passed to the constructor takes a node::FelNode
as input and returns a Bool
that decides if node
stores its observations.
MolecularEvolution.LazyPartition
— TypeConstructor
LazyPartition{PType}()
Initialize an empty LazyPartition
that is meant for wrapping a partition of type PType
.
Description
With this data structure, you can wrap a partition of choice. The idea is that in some message passing algorithms, there is only a wave of partitions which need to actualize. For instance, a wave following a root-leaf path, or a depth-first traversal. In which case, we can be more economical with our memory consumption. With a worst case memory complexity of O(log(n)), where n is the number of nodes, functionality is provided for:
log_likelihood!
felsenstein!
sample_down!
For successive felsenstein!
calls, we need to extract the information at the root somehow after each call. This can be done with e.g. total_LL
or site_LLs
.
Further requirements
Suppose you want to wrap a partition of PType
with LazyPartition
:
- If you're calling
log_likelihood!
andfelsenstein!
:obs2partition!(partition::PType, obs)
that transforms an observation to a partition.
- If you're calling
sample_down!
:partition2obs(partition::PType)
that returns the most likely state from a partition, invertsobs2partition!
.
MolecularEvolution.LazyUp
— TypeConstructor
LazyUp()
Description
Indicate that we want to do an upward pass, e.g. felsenstein!
.
Base.:==
— Method==(t1, t2)
Defaults to pointer equality
MolecularEvolution.SWM_prob_grid
— MethodSWM_prob_grid(part::SWMPartition{PType}) where {PType <: MultiSitePartition}
Returns a matrix of probabilities for each site, for each model (in the probability domain - not logged!) as well as the log probability offsets
MolecularEvolution._mapreduce
— MethodInternal function. Helper for bfsmapreduce and dfsmapreduce
MolecularEvolution.backward!
— Methodbackward!(dest::Partition, source::Partition, model::BranchModel, node::FelNode)
Propagate the source partition backwards along the branch to the destination partition, under the model. Note: You should overload this for your own BranchModel types.
MolecularEvolution.bfs_mapreduce
— MethodPerforms a BFS map-reduce over the tree, starting at a given node For each node, mapreduce is called as: mapreduce(currnode::FelNode, prevnode::FelNode, aggregator) where prev_node is the previous node visited on the path from the start node to the current node It is expected to update the aggregator, and not return anything.
Not exactly conventional map-reduce, as map-reduce calls may rely on state in the aggregator added by map-reduce calls on other nodes visited earlier.
MolecularEvolution.branchlength_optim!
— Methodbranchlength_optim!(tree::FelNode, models; <keyword arguments>)
Uses golden section search, or optionally Brent's method, to optimize all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another.
Keyword Arguments
partition_list=nothing
: (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize branch lengths with all models, the default option).tol=1e-5
: absolute tolerance for thebl_modifier
.bl_modifier=GoldenSectionOpt()
: can either be a optimizer or a sampler (subtype of UnivariateModifier). For optimization, in addition to golden section search, Brent's method can be used by setting bl_modifier=BrentsMethodOpt().sort_tree=false
: determines if alazysort!
will be performed, which can reduce the amount of temporary messages that has to be initialized.traversal=Iterators.reverse
: a function that determines the traversal, permutes an iterable.shuffle=false
: do a randomly shuffled traversal, overridestraversal
.
MolecularEvolution.brents_method_minimize
— Methodbrents_method_minimize(f, a::Real, b::Real, transform, t::Real; ε::Real=sqrt(eps()))
Brent's method for minimization.
Given a function f with a single local minimum in the interval (a,b), Brent's method returns an approximation of the x-value that minimizes f to an accuaracy between 2tol and 3tol, where tol is a combination of a relative and an absolute tolerance, tol := ε|x| + t. ε should be no smaller 2*eps
, and preferably not much less than sqrt(eps)
, which is also the default value. eps is defined here as the machine epsilon in double precision. t should be positive.
The method combines the stability of a Golden Section Search and the superlinear convergence Successive Parabolic Interpolation has under certain conditions. The method never converges much slower than a Fibonacci search and for a sufficiently well-behaved f, convergence can be exptected to be superlinear, with an order that's usually atleast 1.3247...
Examples
julia> f(x) = exp(-x) - cos(x)
f (generic function with 1 method)
julia> m = brents_method_minimize(f, -1, 2, identity, 1e-7)
0.5885327257940255
From: Richard P. Brent, "Algorithms for Minimization without Derivatives" (1973). Chapter 5.
MolecularEvolution.cascading_max_state_dict
— Methodcascading_max_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{<:Partition}}())
Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their inferred ancestors under the following scheme: the state that maximizes the marginal likelihood is selected at the root, and then, for each node, the maximum likelihood state is selected conditioned on the maximized state of the parent node and the observations of all descendents. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.
MolecularEvolution.char_proportions
— Methodchar_proportions(seqs, alphabet::String)
Takes a vector of sequences and returns a vector of the proportion of each character across all sequences. An example alphabet
argument is MolecularEvolution.AAstring
.
MolecularEvolution.collect_leaf_dists
— Methodcollect_leaf_dists(trees::Vector{<:AbstractTreeNode})
Returns a list of distance matrices containing the distance between the leaf nodes, which can be used to assess mixing.
MolecularEvolution.colored_seq_draw
— Methodcolored_seq_draw(x, y, str::AbstractString; color_dict=Dict(), font_size=8pt, posx=hcenter, posy=vcenter)
Draw an arbitrary sequence. color_dict
gives a mapping from characters to colors (default black). Default options for nucleotide colorings and amino acid colorings are given in the constants NUC_COLORS
and AA_COLORS
. This can be used along with compose_dict
for drawing sequences at nodes in a tree (see tree_draw
). Returns a Compose container.
MolecularEvolution.combine!
— Methodcombine!(dest::P, src::P) where P<:Partition
Combines evidence from two partitions of the same type, storing the result in dest. Note: You should overload this for your own Partititon types.
MolecularEvolution.copy_tree
— Functionfunction copy_tree(root::FelNode, shallow_copy=false)
Returns an untangled copy of the tree. Optionally, the flag `shallow_copy` can be used to obtain a copy of the tree with only the names and branchlengths.
MolecularEvolution.deepequals
— Methoddeepequals(t1, t2)
Checks whether two trees are equal by recursively calling this on all fields, except :parent
, in order to prevent cycles. In order to ensure that the :parent
field is not hiding something different on both trees, ensure that each is consistent first (see: istreeconsistent
).
MolecularEvolution.dfs_mapreduce
— MethodPerforms a DFS map-reduce over the tree, starting at a given node See bfs_mapreduce for more details.
MolecularEvolution.discrete_name_color_dict
— Methoddiscrete_name_color_dict(newt::AbstractTreeNode,tag_func; rainbow = false, scramble = false, darken = true, col_seed = nothing)
Takes a tree and a tag_func, which converts the leaf label into a category (ie. there should be <20 of these), and returns a color dictionary that can be used to color the leaves or bubbles.
Example tagfunc: function tagfunc(nam::String) return split(nam,"_")[1] end
For prettier colors, but less discrimination: rainbow = true To randomize the rainbow color assignment: scramble = true col_seed is currently set to white, and excluded from the list of colors, to make them more visible.
Consider making your own version of this function to customize colors as you see fit.
Example use: numleaves = 50 Nefunc(t) = 1*(e^-t).+5.0 newt = simtree(numleaves,Nefunc,1.0,nstart = rand(1:numleaves)); newt = ladderize(newt) tagfunc(nam) = mod(sum(Int.(collect(nam))),7) dic = discretenamecolordict(newt,tagfunc,rainbow = true); treedraw(newt,linewidth = 0.5mm,labelcolor_dict = dic)
MolecularEvolution.draw_example_tree
— Methoddraw_example_tree(num_leaves = 50)
Draws a tree and shows the code that draws it.
MolecularEvolution.endpoint_conditioned_sample_state_dict
— Methodendpoint_conditioned_sample_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{<:Partition}}())
Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and draws samples under the model conditions on the leaf observations. These samples are stored in the nodemessagedict, which is returned. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.
MolecularEvolution.expected_subs_per_site
— Methodexpected_subs_per_site(Q,mu)
Takes a rate matrix Q and an equilibrium frequency vector, and calculates the expected number of substitutions per site.
MolecularEvolution.felsenstein!
— Methodfelsenstein!(node::FelNode, models; partition_list = nothing)
Should usually be called on the root of the tree. Propagates Felsenstein pass up from the tips to the root. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.
MolecularEvolution.felsenstein_down!
— Methodfelsenstein_down!(node::FelNode, models; partition_list = 1:length(tree.message), temp_message = copy_message(tree.message))
Should usually be called on the root of the tree. Propagates Felsenstein pass down from the root to the tips. felsenstein!() should usually be called first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.
MolecularEvolution.forward!
— Methodforward!(dest::Partition, source::Partition, model::BranchModel, node::FelNode)
Propagate the source partition forwards along the branch to the destination partition, under the model. Note: You should overload this for your own BranchModel types.
MolecularEvolution.gappy_Q_from_symmetric_rate_matrix
— Methodgappy_Q_from_symmetric_rate_matrix(sym_mat, gap_rate, eq_freqs)
Takes a symmetric rate matrix and gap rate (governing mutations to and from gaps) and returns a gappy rate matrix. The equilibrium frequencies are multiplied on column-wise.
MolecularEvolution.get_highlighter_legend
— Methodget_highlighter_legend(legend_colors)
Returns a Compose object given an input dictionary or pairs mapping characters to colors.
MolecularEvolution.get_max_depth
— Methodget_max_depth(node,depth::Real)
Return the maximum depth of all children starting from the indicated node.
MolecularEvolution.get_phylo_tree
— Methodget_phylo_tree(molev_root::FelNode; data_function = (x -> Tuple{String,Float64}[]))
Converts a FelNode tree to a Phylo tree. The data_function
should return a list of tuples of the form (key, value) to be added to the Phylo tree data
Dictionary. Any key/value pairs on the FelNode node_data
Dict will also be added to the Phylo tree.
MolecularEvolution.golden_section_maximize
— MethodGolden section search.
Given a function f with a single local minimum in the interval [a,b], gss returns a subset interval [c,d] that contains the minimum with d-c <= tol.
Examples
julia> f(x) = -(x-2)^2
f (generic function with 1 method)
julia> m = golden_section_maximize(f, 1, 5, identity, 1e-10)
2.0000000000051843
From: https://en.wikipedia.org/wiki/Golden-section_search
MolecularEvolution.highlight_seq_draw
— Methodhighlight_seq_draw(x, y, str::AbstractString, region, basecolor, hicolor; fontsize=8pt, posx=hcenter, posy=vcenter)
Draw a sequence, highlighting the sites given in region
. This can be used along with compose_dict
for drawing sequences at nodes in a tree (see tree_draw
). Returns a Compose container.
MolecularEvolution.highlighter_tree_draw
— Methodhighlighter_tree_draw(tree, ali_seqs, seqnames, master;
highlighter_start = 1.1, highlighter_width = 1,
coord_width = highlighter_start + highlighter_width + 0.1,
scale_length = nothing, major_breaks = 1000, minor_breaks = 500,
tree_args = NamedTuple[], legend_padding = 0.5cm, legend_colors = NUC_colors)
Draws a combined tree and highlighter plot. The vector of seqnames must match the node names in tree
.
kwargs:
- treeargs: kwargs to pass to `treedraw()`
- legendcolors: Mapping of characters to highlighter colors (default NTcolors)
- scale_length: Length of the scale bar
- highlighter_start: Canvas start for the highlighter panel
- highlighter_width: Canvas width for the highlighter panel
- coord_width: Total width of the canvas
- major_breaks: Numbered breaks for sequence axis
- minor_breaks: Ticks for sequence axis
MolecularEvolution.internal_message_init!
— Methodinternal_message_init!(tree::FelNode, partition::Partition)
Initializes the message template for each node in the tree, as an array of the partition.
MolecularEvolution.internal_message_init!
— Methodinternal_message_init!(tree::FelNode, empty_message::Vector{<:Partition})
Initializes the message template for each node in the tree, allocating space for each partition.
MolecularEvolution.istreeconsistent
— Methodistreeconsistent(root)
Checks whether the :parent
field is set to be consistent with the :child
field for all nodes in the subtree.
MolecularEvolution.lazyprep!
— Methodlazyprep!(tree::FelNode, initial_message::Vector{<:Partition}; partition_list = 1:length(tree.message), direction::LazyDirection = LazyUp())
Extra, intermediate step of tree preparations between initializing messages across the tree and calling message passing algorithms with LazyPartition
.
- Perform a
lazysort!
on tree to obtain the optimal tree for a lazyfelsenstein!
prop, or asample_down!
. - Fix
tree.parent_message
to an initial message. - Preallocate sufficiently many inner partitions needed for a
felsenstein!
prop, or asample_down!
. - Specialized preparations based on the direction of the operations (
forward!
,backward!
).LazyDown
orLazyUp
.
See also LazyDown
, LazyUp
.
MolecularEvolution.lazysort!
— Method- Should be run on a tree containing LazyPartitions before running
felsenstein!
. Sorts for a minimal count of active partitions during a felsenstein! - Returns the minimum length of memoryblocks (-1) required for a
felsenstein!
prop. We need a temporary memoryblock duringbackward!
, hence the '-1'.
Since felsenstein! uses a stack, we want to avoid having long node.children[1].children[1]... chains
MolecularEvolution.leaf_distmat
— Methodleaf_distmat(tree)
Returns a matrix of the distances between the leaf nodes where the index on the columns and rows are sorted by the leaf names.
MolecularEvolution.linear_scale
— Methodlinear_scale(val,in_min,in_max,out_min,out_max)
Linearly maps val which lives in [inmin,inmax] to a value in [outmin,outmax]
MolecularEvolution.log_likelihood!
— Methodlog_likelihood!(tree::FelNode, models; partition_list = nothing)
First re-computes the upward felsenstein pass, and then computes the log likelihood of this tree. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.
MolecularEvolution.log_likelihood
— Methodlog_likelihood(tree::FelNode, models; partition_list = nothing)
Computed the log likelihood of this tree. Requires felsenstein!() to have been run. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.
MolecularEvolution.longest_path
— MethodReturns the longest path in a tree For convenience, this is returned as two lists of form: [leafnode, parentnode, .... root] Where the leaf_node nodes are selected to be the furthest away
MolecularEvolution.marginal_state_dict
— Methodmarginal_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{<:Partition}}())
Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their marginal reconstructions (ie. P(state|all observations,model)). A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.
MolecularEvolution.matrix_for_display
— Methodmatrix_for_display(Q,labels)
Takes a numerical matrix and a vector of labels, and returns a typically mixed type matrix with the numerical values and the labels. This is to easily visualize rate matrices in eg. the REPL.
MolecularEvolution.metropolis_sample
— Methodfunction metropolis_sample(
initial_tree::FelNode,
models::Vector{<:BranchModel},
num_of_samples;
bl_modifier::UnivariateSampler = BranchlengthSampler(Normal(0,2), Normal(-1,1))
burn_in=1000,
sample_interval=10,
collect_LLs = false,
midpoint_rooting=false,
)
Samples tree topologies from a posterior distribution.
Arguments
initial_tree
: An initial tree topology with the leaves populated with data, for the likelihood calculation.models
: A list of branch models.num_of_samples
: The number of tree samples drawn from the posterior.bl_sampler
: Sampler used to drawn branchlengths from the posterior.burn_in
: The number of samples discarded at the start of the Markov Chain.sample_interval
: The distance between samples in the underlying Markov Chain (to reduce sample correlation).collect_LLs
: Specifies if the function should return the log-likelihoods of the trees.midpoint_rooting
: Specifies whether the drawn samples should be midpoint rerooted (Important! Should only be used for time-reversible branch models starting in equilibrium).
The leaves of the initial tree should be populated with data and felsenstein! should be called on the initial tree before calling this function.
Returns
samples
: The trees drawn from the posterior. Returns shallow tree copies, which needs to be repopulated before running felsenstein! etc.sample_LLs
: The associated log-likelihoods of the tree (optional).
MolecularEvolution.midpoint
— MethodReturns a midpoint as a node and a distance above it where the midpoint is
MolecularEvolution.mix
— Methodmix(swm_part::SWMPartition{PType} ) where {PType <: MultiSitePartition}
mix
collapses a Site-Wise Mixture partition to a single component partition, weighted by the site-wise likelihoods for each component, and the init weights. Specifically, it takes a SWMPartition{Ptype}
and returns a PType
. You'll need to have this implemented for certain helper functionality if you're playing with new kinds of SWMPartitions that aren't mixtures of DiscretePartitions
.
MolecularEvolution.name2node_dict
— Methodname2node_dict(root)
Returns a dictionary of leaf nodes, indexed by node.name. Can be used to associate sequences with leaf nodes.
MolecularEvolution.newick
— Methodnewick(root)
Returns a newick string representation of the tree.
MolecularEvolution.nni_optim!
— Methodnni_optim!(tree::FelNode, models; <keyword arguments>)
Considers local branch swaps for all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another.
Keyword Arguments
partition_list=nothing
: (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize tree topology with all models, the default option).selection_rule = x -> argmax(x)
: a function that takes the current and proposed log likelihoods and selects a nni configuration. Note that the current log likelihood is stored at x[1].sort_tree=false
: determines if alazysort!
will be performed, which can reduce the amount of temporary messages that has to be initialized.traversal=Iterators.reverse
: a function that determines the traversal, permutes an iterable.shuffle=false
: do a randomly shuffled traversal, overridestraversal
.
MolecularEvolution.node_distances
— MethodCompute the distance to all other nodes from a given node
MolecularEvolution.nonreversibleQ
— MethodnonreversibleQ(param_vec)
Takes a vector of parameters and returns a nonreversible rate matrix.
MolecularEvolution.parent_list
— MethodProvides a list of parent nodes nodes from this node up to the root node
MolecularEvolution.partition2obs
— Methodpartition2obs(part::Partition)
Extracts the most likely state from a Partition, transforming it into a convenient type. For example, a NucleotidePartition will be transformed into a nucleotide sequence of type String. Note: You should overload this for your own Partititon types.
MolecularEvolution.plot_multiple_trees
— Methodplot_multiple_trees(trees, inf_tree; <keyword arguments>)
Plots multiple phylogenetic trees against a reference tree, inf_tree
. For each tree in trees
, a linear Weighted Least Squares (WLS) problem (parameterized by the weight_fn
keyword) is solved for the x-positions of the matching nodes between inf_tree
and tree.
Keyword Arguments
node_size=4
: the size of the nodes in the plot.line_width=0.5
: the width of the branches fromtrees
.font_size=10
: the font size for the leaf labels.margin=1.5
: the margin between a leaf node and its label.line_alpha=0.05
: the transparency level of the branches fromtrees
.y_jitter=0.0
: the standard deviation of the noise in the y-coordinate.weight_fn=n::FelNode -> ifelse(isroot(n), 1.0, 0.0))
: a function that assigns a weight to a node for the WLS problem.opt_scale=true
: whether to include a scaling parameter for the WLS problem.
MolecularEvolution.populate_tree!
— Methodpopulate_tree!(tree::FelNode, starting_message, names, data; init_all_messages = true, tolerate_missing = 1, leaf_name_transform = x -> x)
Takes a tree, and a starting_message
(which will serve as the memory template for populating messages all over the tree). starting_message
can be a message (ie. a vector of Partitions), but will also work with a single Partition (although the tree) will still be populated with a length-1 vector of Partitions. Further, as long as obs2partition
is implemented for your Partition type, the leaf nodes will be populated with the data from data
, matching the names on each leaf. When a leaf on the tree has a name that doesn't match anything in names
, then if
tolerate_missing = 0
, an error will be throwntolerate_missing = 1
, a warning will be thrown, and the message will be set to the uninformative message (requires identity!(::Partition) to be defined)tolerate_missing = 2
, the message will be set to the uninformative message, without warnings (requires identity!(::Partition) to be defined)
A renaming function that can eg. strip tags from the tree when matching leaf names with names
can be passed to leaf_name_transform
MolecularEvolution.promote_internal
— Methodpromote_internal(tree::FelNode)
Creates a new tree similar to the given tree, but with 'dummy' leaf nodes (w/ zero branchlength) representing each internal node (for drawing / evenly spacing labels internal nodes).
MolecularEvolution.quadratic_CI
— Methodquadratic_CI(f::Function,opt_params::Vector, param_ind::Int; rate_conf_level = 0.99, nudge_amount = 0.01)
Takes a NEGATIVE log likelihood function (compatible with Optim.jl), a vector of maximizing parameters, an a parameter index. Returns the quadratic confidence interval.
MolecularEvolution.quadratic_CI
— Methodquadratic_CI(xvec,yvec; rate_conf_level = 0.99)
Takes xvec, a vector of parameter values, and yvec, a vector of log likelihood evaluations (note: NOT the negative LLs you) might use with Optim.jl. Returns the confidence intervals computed by a quadratic approximation to the LL.
MolecularEvolution.read_fasta
— Methodread_fasta(filepath::String)
Reads in a fasta file and returns a tuple of (seqnames, seqs).
MolecularEvolution.read_newick_tree
— Methodreadnewicktree(treefile)
Reads in a tree from a file, of type FelNode
MolecularEvolution.reversibleQ
— MethodreversibleQ(param_vec,eq_freqs)
Takes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. The parameters are the upper triangle of the rate matrix, with the diagonal elements omitted, and the equilibrium frequencies are multiplied column-wise.
MolecularEvolution.root2tip_distances
— Methodroot2tips(root::AbstractTreeNode)
Returns a vector of root-to-tip distances, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.
MolecularEvolution.sample_down!
— Methodsampledown!(root::FelNode,models,partitionlist)
Generates samples under the model. The root.parentmessage is taken as the starting distribution, and node.message contains the sampled messages. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partitionlist (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.
MolecularEvolution.sample_from_message!
— Methodsample_from_message!(message::Vector{<:Partition})
#Replaces an uncertain message with a sample from the distribution represented by each partition.
MolecularEvolution.savefig_tweakSVG
— Methodsavefig_tweakSVG(fname, plot::Context; width = 10cm, height = 10cm, linecap_round = true, white_background = true)
Saves a figure created using the Compose
approach, but tweaks the SVG after export.
eg. savefig_tweakSVG("export.svg",pl)
MolecularEvolution.savefig_tweakSVG
— Methodsavefig_tweakSVG(fname, plot::Plots.Plot; hack_bounding_box = true, new_viewbox = nothing, linecap_round = true)
Note: Might only work if you're using the GR backend!! Saves a figure created using the Phylo
Plots
recipe, but tweaks the SVG after export. new_viewbox
needs to be an array of 4 numbers, typically starting at [0 0 plot_width*4 plot_height*4]
but this lets you add shifts, in case the plot is getting cut off.
eg. savefig_tweakSVG("export.svg",pl, new_viewbox = [-100, -100, 3000, 4500])
MolecularEvolution.shortest_path_between_nodes
— MethodShortest path between nodes, returned as two lists, each starting with one of the two nodes, and ending with the common ancestor
MolecularEvolution.sibling_inds
— Methodsibling_inds(node)
Returns logical indices of the siblings in the parent's child's vector.
MolecularEvolution.siblings
— Methodsiblings(node)
Returns a vector of siblings of node.
MolecularEvolution.sim_tree
— Methodsim_tree(add_limit::Int,Ne_func,sample_rate_func; nstart = 1, time = 0.0, mutation_rate = 1.0, T = Float64)
Simulates a tree of type FelNode{T}. Allows an effective population size function (Nefunc), as well as a sample rate function (samplerate_func), which can also just be constants.
Nefunc(t) = (sin(t/10)+1)*100.0 + 10.0 root = simtree(600,Nefunc,1.0) simpletree_draw(ladderize(root))
MolecularEvolution.sim_tree
— Methodsim_tree(;n = 10)
Simulates tree with constant population size.
MolecularEvolution.simple_radial_tree_plot
— Methodsimple_radial_tree_plot(root::FelNode; canvas_width = 10cm, line_color = "black", line_width = 0.1mm)
Draws a radial tree. No frills. No labels. Canvas height is automatically determined to avoid distorting the tree.
newt = betternewickimport("((A:1,B:1,C:1,D:1,E:1,F:1,G:1):1,(H:1,I:1):1);", FelNode{Float64}); simpleradialtreeplot(newt,linewidth = 0.5mm,root_angle = 7/10)
MolecularEvolution.simple_tree_draw
— Methodimg = simpletreedraw(tree::FelNode; canvaswidth = 15cm, canvasheight = 15cm, linecolor = "black", linewidth = 0.1mm)
A line drawing of a tree with very few options.
img = simple_tree_draw(tree)
img |> SVG("imgout.svg",10cm, 10cm)
OR
using Cairo
img |> PDF("imgout.pdf",10cm, 10cm)
MolecularEvolution.total_LL
— Methodtotal_LL(p::Partition)
If called on the root, it returns the log likelihood associated with that partition. Can be overloaded for complex partitions without straightforward site log likelihoods.
MolecularEvolution.tree2distances
— Methodtree2distances(root::AbstractTreeNode)
Returns a distance matrix for all pairs of leaf nodes, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.
MolecularEvolution.tree2shared_branch_lengths
— Methodtree2distances(root::AbstractTreeNode)
Returns a distance matrix for all pairs of leaf nodes, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.
MolecularEvolution.tree_draw
— Methodtree_draw(tree::FelNode;
canvas_width = 15cm, canvas_height = 15cm,
stretch_for_labels = 2.0, draw_labels = true,
line_width = 0.1mm, font_size = 4pt,
min_dot_size = 0.00, max_dot_size = 0.01,
line_opacity = 1.0,
dot_opacity = 1.0,
name_opacity = 1.0,
horizontal = true,
dot_size_dict = Dict(), dot_size_default = 0.0,
dot_color_dict = Dict(), dot_color_default = "black",
line_color_dict = Dict(), line_color_default = "black",
label_color_dict = Dict(), label_color_default = "black",
nodelabel_dict = Dict(),compose_dict = Dict()
)
Draws a tree with a number of self-explanatory options. Dictionaries that map a node to a color/size are used to control per-node plotting options. compose_dict
must be a FelNode->function(x,y)
dictionary that returns a compose()
struct.
Example using compose_dict
str_tree = "(((((tax24:0.09731668728575642,(tax22:0.08792233964843627,tax18:0.9210388482867483):0.3200367900275155):0.6948314526087965,(tax13:1.9977212308725611,(tax15:0.4290074347886068,(tax17:0.32928401808187824,(tax12:0.3860215462534818,tax16:0.2197134841232339):0.1399122681886174):0.05744611946245004):1.4686085778061146):0.20724159879522402):0.4539334554156126,tax28:0.4885576926440158):0.002162260013924424,tax26:0.9451873777301325):3.8695419798779387,((tax29:0.10062813251515536,tax27:0.27653633028085006):0.04262434258357507,(tax25:0.009345653929737636,((tax23:0.015832941547076644,(tax20:0.5550597590956172,((tax8:0.6649025646927402,tax9:0.358506423199849):0.1439516404012261,tax11:0.01995439013213013):1.155181296134081):0.17930021667907567):0.10906638146207207,((((((tax6:0.013708993438720255,tax5:0.061144001556547097):0.1395453591567641,tax3:0.4713722705245479):0.07432598428904214,tax1:0.5993347898257291):1.0588025698844894,(tax10:0.13109032492533992,(tax4:0.8517302241963356,(tax2:0.8481963081549965,tax7:0.23754095940676642):0.2394313086297733):0.43596704123297675):0.08774657269409454):0.9345533723114966,(tax14:0.7089558245245173,tax19:0.444897137240675):0.08657675809803095):0.01632062723968511,tax21:0.029535281963725537):0.49502691718938285):0.25829576024240986):0.7339777396780424):4.148878039524972):0.0"
newt = gettreefromnewick(str_tree, FelNode)
ladderize!(newt)
compose_dict = Dict()
for n in getleaflist(newt)
#Replace the rand(4) with the frequencies you actually want.
compose_dict[n] = (x,y)->pie_chart(x,y,MolecularEvolution.sum2one(rand(4)),size = 0.03)
end
tree_draw(newt,draw_labels = false,line_width = 0.5mm, compose_dict = compose_dict)
img = tree_draw(tree)
img |> SVG("imgout.svg",10cm, 10cm)
OR
using Cairo
img |> PDF("imgout.pdf",10cm, 10cm)
MolecularEvolution.tree_polish!
— Methodtree_polish!(newt, models; tol = 10^-4, verbose = 1, topology = true)
Takes a tree and a model function, and optimizes branch lengths and, optionally, topology. Returns final LL. Set verbose=0
to suppress output. Note: This is not intended for an exhaustive tree search (which requires different heuristics), but rather to polish a tree that is already relatively close to the optimum.
MolecularEvolution.unc2probvec
— Methodunc2probvec(v)
Takes an array of N-1 unbounded values and returns an array of N values that sums to 1. Typically useful for optimizing over categorical probability distributions.
MolecularEvolution.univariate_maximize
— Methodunivariate_maximize(f, a::Real, b::Real, transform, optimizer::BrentsMethodOpt, t::Real; ε::Real=sqrt(eps))
Maximizes f(x)
using Brent's method. See ?brents_method_minimize
.
MolecularEvolution.univariate_maximize
— Methodunivariate_maximize(f, a::Real, b::Real, transform, optimizer::GoldenSectionOpt, tol::Real)
Maximizes f(x)
using a Golden Section Search. See ?golden_section_maximize
.
Examples
julia> f(x) = -(x-2)^2
f (generic function with 1 method)
julia> m = univariate_maximize(f, 1, 5, identity, GoldenSectionOpt(), 1e-10)
2.0000000000051843
MolecularEvolution.univariate_sampler
— Methodunivariate_sampler(LL, modifier::BranchlengthPeturbation, curr_branchlength)
A MCMC algorithm that draws the next sample of a Markov Chain that approximates the Posterior distrubution over the branchlengths.
MolecularEvolution.values_from_phylo_tree
— Methodvalues_from_phylo_tree(phylo_tree, key)
Returns a list of values from the given key in the nodes of the phylo_tree, in an order that is somehow compatible with the order the nodes get plotted in.
MolecularEvolution.weightEM
— MethodweightEM(con_lik_matrix::Array{Float64,2}, θ; conc = 0.0, iters = 500)
Takes a conditional likelihood matrix (#categories-by-sites) and a starting frequency vector θ (length(θ) = #categories) and optimizes θ (using Expectation Maximization. Maybe.). If conc > 0 then this gives something like variational bayes behavior for LDA. Maybe.
MolecularEvolution.write_fasta
— Methodwrite_fasta(filepath::String, sequences::Vector{String}; seq_names = nothing)
Writes a fasta file from a vector of sequences, with optional seq_names.
MolecularEvolution.write_nexus
— Methodwrite_nexus(fname::String,tree::FelNode)
Writes the tree as a nexus file, suitable for opening in eg. FigTree. Data in the node_data
dictionary will be converted into annotations. Only tested for simple node_data
formats and types.