MolecularEvolution

Documentation for MolecularEvolution.

A Julia package for the flexible development of phylogenetic models.

MolecularEvolution.jl exploits Julia's multiple dispatch, implementing a fully generic suite of likelihood calculations, branchlength optimization, topology optimization, and ancestral inference. Users can construct trees using already-defined data types and models. But users can define probability distributions over their own data types, and specify the behavior of these under their own model types, and can mix and match different models on the same phylogeny.

If the behavior you need is not already available in MolecularEvolution.jl:

  • If you have a new data type:
    • A Partition type that represents the uncertainty over your state.
    • combine!() that merges evidence from two Partitions.
  • If you have a new model:
    • A BranchModel type that stores your model parameters.
    • forward!() that evolves state distributions over branches, in the root-to-tip direction.
    • backward!() that reverse-evolves state distributions over branches, in the tip-to-root direction.

And then sampling, likelihood calculations, branch-length optimization, ancestral reconstruction, etc should be available for your new data or model.

Design principles

In order of importance, we aim for the following:

  • Flexibility and generality
    • Where possible, we avoid design decisions that limit the development of new models, or make it harder to develop new models.
    • We do not sacrifice flexibility for performance.
  • Scalability
    • Analyses implemented using MolecularEvolution.jl should scale to large, real-world datasets.
  • Performance
    • While the above take precedence over speed, it should be possible to optimize your Partition, combine!(), BranchModel, forward!() and backward!() functions to obtain competative runtimes.

Authors:

Venkatesh Kumar and Ben Murrell, with additional contributions by Sanjay Mohan, Alec Pankow, Hassan Sadiq, and Kenta Sato.

Quick example: Likelihood calculations under phylogenetic Brownian motion:

using MolecularEvolution, Plots

#First simulate a tree, using a coalescent process
tree = sim_tree(n=200)
internal_message_init!(tree, GaussianPartition())
#Simulate brownian motion over the tree
bm_model = BrownianMotion(0.0,1.0)
sample_down!(tree, bm_model)
#And plot the log likelihood as a function of the parameter value
ll(x) = log_likelihood!(tree,BrownianMotion(0.0,x))
plot(0.7:0.001:1.6,ll, xlabel = "variance per unit time", ylabel = "log likelihood")

MolecularEvolution.BranchlengthSamplerType
BranchlengthSampler

A type that allows you to specify a additive proposal function in the log domain and a prior distrubution over the log of the branchlengths. It also holds the acceptance ratio acc_ratio (acc_ratio[1] stores the number of accepts, and acc_ratio[2] stores the number of rejects).
source
MolecularEvolution.LazyDownType

Constructors

LazyDown(stores_obs)
LazyDown() = LazyDown(x::FelNode -> true)

Description

Indicate that we want to do a downward pass, e.g. sample_down!. The function passed to the constructor takes a node::FelNode as input and returns a Bool that decides if node stores its observations.

source
MolecularEvolution.LazyPartitionType

Constructor

LazyPartition{PType}()

Initialize an empty LazyPartition that is meant for wrapping a partition of type PType.

Description

With this data structure, you can wrap a partition of choice. The idea is that in some message passing algorithms, there is only a wave of partitions which need to actualize. For instance, a wave following a root-leaf path, or a depth-first traversal. In which case, we can be more economical with our memory consumption. With a worst case memory complexity of O(log(n)), where n is the number of nodes, functionality is provided for:

  • log_likelihood!
  • felsenstein!
  • sample_down!
Note

For successive felsenstein! calls, we need to extract the information at the root somehow after each call. This can be done with e.g. total_LL or site_LLs.

Further requirements

Suppose you want to wrap a partition of PType with LazyPartition:

  • If you're calling log_likelihood! and felsenstein!:
    • obs2partition!(partition::PType, obs) that transforms an observation to a partition.
  • If you're calling sample_down!:
    • partition2obs(partition::PType) that returns the most likely state from a partition, inverts obs2partition!.
source
MolecularEvolution.SWM_prob_gridMethod
SWM_prob_grid(part::SWMPartition{PType}) where {PType <: MultiSitePartition}

Returns a matrix of probabilities for each site, for each model (in the probability domain - not logged!) as well as the log probability offsets

source
MolecularEvolution.backward!Method
backward!(dest::Partition, source::Partition, model::BranchModel, node::FelNode)

Propagate the source partition backwards along the branch to the destination partition, under the model. Note: You should overload this for your own BranchModel types.

source
MolecularEvolution.bfs_mapreduceMethod

Performs a BFS map-reduce over the tree, starting at a given node For each node, mapreduce is called as: mapreduce(currnode::FelNode, prevnode::FelNode, aggregator) where prev_node is the previous node visited on the path from the start node to the current node It is expected to update the aggregator, and not return anything.

Not exactly conventional map-reduce, as map-reduce calls may rely on state in the aggregator added by map-reduce calls on other nodes visited earlier.

source
MolecularEvolution.branchlength_optim!Method
branchlength_optim!(tree::FelNode, models;  <keyword arguments>)

Uses golden section search, or optionally Brent's method, to optimize all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another.

Keyword Arguments

  • partition_list=nothing: (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize branch lengths with all models, the default option).
  • tol=1e-5: absolute tolerance for the bl_modifier.
  • bl_modifier=GoldenSectionOpt(): can either be a optimizer or a sampler (subtype of UnivariateModifier). For optimization, in addition to golden section search, Brent's method can be used by setting bl_modifier=BrentsMethodOpt().
  • sort_tree=false: determines if a lazysort! will be performed, which can reduce the amount of temporary messages that has to be initialized.
  • traversal=Iterators.reverse: a function that determines the traversal, permutes an iterable.
  • shuffle=false: do a randomly shuffled traversal, overrides traversal.
source
MolecularEvolution.brents_method_minimizeMethod
brents_method_minimize(f, a::Real, b::Real, transform, t::Real; ε::Real=sqrt(eps()))

Brent's method for minimization.

Given a function f with a single local minimum in the interval (a,b), Brent's method returns an approximation of the x-value that minimizes f to an accuaracy between 2tol and 3tol, where tol is a combination of a relative and an absolute tolerance, tol := ε|x| + t. ε should be no smaller 2*eps, and preferably not much less than sqrt(eps), which is also the default value. eps is defined here as the machine epsilon in double precision. t should be positive.

The method combines the stability of a Golden Section Search and the superlinear convergence Successive Parabolic Interpolation has under certain conditions. The method never converges much slower than a Fibonacci search and for a sufficiently well-behaved f, convergence can be exptected to be superlinear, with an order that's usually atleast 1.3247...

Examples

julia> f(x) = exp(-x) - cos(x)
f (generic function with 1 method)

julia> m = brents_method_minimize(f, -1, 2, identity, 1e-7)
0.5885327257940255

From: Richard P. Brent, "Algorithms for Minimization without Derivatives" (1973). Chapter 5.

source
MolecularEvolution.cascading_max_state_dictMethod
cascading_max_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{<:Partition}}())

Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their inferred ancestors under the following scheme: the state that maximizes the marginal likelihood is selected at the root, and then, for each node, the maximum likelihood state is selected conditioned on the maximized state of the parent node and the observations of all descendents. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.

source
MolecularEvolution.char_proportionsMethod
char_proportions(seqs, alphabet::String)

Takes a vector of sequences and returns a vector of the proportion of each character across all sequences. An example alphabet argument is MolecularEvolution.AAstring.

source
MolecularEvolution.collect_leaf_distsMethod
collect_leaf_dists(trees::Vector{<:AbstractTreeNode})

Returns a list of distance matrices containing the distance between the leaf nodes, which can be used to assess mixing.
source
MolecularEvolution.colored_seq_drawMethod
colored_seq_draw(x, y, str::AbstractString; color_dict=Dict(), font_size=8pt, posx=hcenter, posy=vcenter)

Draw an arbitrary sequence. color_dict gives a mapping from characters to colors (default black). Default options for nucleotide colorings and amino acid colorings are given in the constants NUC_COLORS and AA_COLORS. This can be used along with compose_dict for drawing sequences at nodes in a tree (see tree_draw). Returns a Compose container.

source
MolecularEvolution.combine!Method
combine!(dest::P, src::P) where P<:Partition

Combines evidence from two partitions of the same type, storing the result in dest. Note: You should overload this for your own Partititon types.

source
MolecularEvolution.copy_treeFunction
function copy_tree(root::FelNode, shallow_copy=false)

Returns an untangled copy of the tree. Optionally, the flag `shallow_copy` can be used to obtain a copy of the tree with only the names and branchlengths.
source
MolecularEvolution.deepequalsMethod
deepequals(t1, t2)

Checks whether two trees are equal by recursively calling this on all fields, except :parent, in order to prevent cycles. In order to ensure that the :parent field is not hiding something different on both trees, ensure that each is consistent first (see: istreeconsistent).

source
MolecularEvolution.discrete_name_color_dictMethod
discrete_name_color_dict(newt::AbstractTreeNode,tag_func; rainbow = false, scramble = false, darken = true, col_seed = nothing)

Takes a tree and a tag_func, which converts the leaf label into a category (ie. there should be <20 of these), and returns a color dictionary that can be used to color the leaves or bubbles.

Example tagfunc: function tagfunc(nam::String) return split(nam,"_")[1] end

For prettier colors, but less discrimination: rainbow = true To randomize the rainbow color assignment: scramble = true col_seed is currently set to white, and excluded from the list of colors, to make them more visible.

Consider making your own version of this function to customize colors as you see fit.

Example use: numleaves = 50 Nefunc(t) = 1*(e^-t).+5.0 newt = simtree(numleaves,Nefunc,1.0,nstart = rand(1:numleaves)); newt = ladderize(newt) tagfunc(nam) = mod(sum(Int.(collect(nam))),7) dic = discretenamecolordict(newt,tagfunc,rainbow = true); treedraw(newt,linewidth = 0.5mm,labelcolor_dict = dic)

source
MolecularEvolution.endpoint_conditioned_sample_state_dictMethod
endpoint_conditioned_sample_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{<:Partition}}())

Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and draws samples under the model conditions on the leaf observations. These samples are stored in the nodemessagedict, which is returned. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.

source
MolecularEvolution.felsenstein!Method
felsenstein!(node::FelNode, models; partition_list = nothing)

Should usually be called on the root of the tree. Propagates Felsenstein pass up from the tips to the root. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.

source
MolecularEvolution.felsenstein_down!Method
felsenstein_down!(node::FelNode, models; partition_list = 1:length(tree.message), temp_message = copy_message(tree.message))

Should usually be called on the root of the tree. Propagates Felsenstein pass down from the root to the tips. felsenstein!() should usually be called first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.

source
MolecularEvolution.forward!Method
forward!(dest::Partition, source::Partition, model::BranchModel, node::FelNode)

Propagate the source partition forwards along the branch to the destination partition, under the model. Note: You should overload this for your own BranchModel types.

source
MolecularEvolution.gappy_Q_from_symmetric_rate_matrixMethod
gappy_Q_from_symmetric_rate_matrix(sym_mat, gap_rate, eq_freqs)

Takes a symmetric rate matrix and gap rate (governing mutations to and from gaps) and returns a gappy rate matrix. The equilibrium frequencies are multiplied on column-wise.

source
MolecularEvolution.get_phylo_treeMethod
get_phylo_tree(molev_root::FelNode; data_function = (x -> Tuple{String,Float64}[]))

Converts a FelNode tree to a Phylo tree. The data_function should return a list of tuples of the form (key, value) to be added to the Phylo tree data Dictionary. Any key/value pairs on the FelNode node_data Dict will also be added to the Phylo tree.

source
MolecularEvolution.golden_section_maximizeMethod

Golden section search.

Given a function f with a single local minimum in the interval [a,b], gss returns a subset interval [c,d] that contains the minimum with d-c <= tol.

Examples

julia> f(x) = -(x-2)^2
f (generic function with 1 method)

julia> m = golden_section_maximize(f, 1, 5, identity, 1e-10)
2.0000000000051843

From: https://en.wikipedia.org/wiki/Golden-section_search

source
MolecularEvolution.highlight_seq_drawMethod
highlight_seq_draw(x, y, str::AbstractString, region, basecolor, hicolor; fontsize=8pt, posx=hcenter, posy=vcenter)

Draw a sequence, highlighting the sites given in region. This can be used along with compose_dict for drawing sequences at nodes in a tree (see tree_draw). Returns a Compose container.

source
MolecularEvolution.highlighter_tree_drawMethod
highlighter_tree_draw(tree, ali_seqs, seqnames, master;
    highlighter_start = 1.1, highlighter_width = 1,
    coord_width = highlighter_start + highlighter_width + 0.1,
    scale_length = nothing, major_breaks = 1000, minor_breaks = 500,
    tree_args = NamedTuple[], legend_padding = 0.5cm, legend_colors = NUC_colors)

Draws a combined tree and highlighter plot. The vector of seqnames must match the node names in tree.

kwargs:

  • treeargs: kwargs to pass to `treedraw()`
  • legendcolors: Mapping of characters to highlighter colors (default NTcolors)
  • scale_length: Length of the scale bar
  • highlighter_start: Canvas start for the highlighter panel
  • highlighter_width: Canvas width for the highlighter panel
  • coord_width: Total width of the canvas
  • major_breaks: Numbered breaks for sequence axis
  • minor_breaks: Ticks for sequence axis
source
MolecularEvolution.lazyprep!Method
lazyprep!(tree::FelNode, initial_message::Vector{<:Partition}; partition_list = 1:length(tree.message), direction::LazyDirection = LazyUp())

Extra, intermediate step of tree preparations between initializing messages across the tree and calling message passing algorithms with LazyPartition.

  1. Perform a lazysort! on tree to obtain the optimal tree for a lazy felsenstein! prop, or a sample_down!.
  2. Fix tree.parent_message to an initial message.
  3. Preallocate sufficiently many inner partitions needed for a felsenstein! prop, or a sample_down!.
  4. Specialized preparations based on the direction of the operations (forward!, backward!). LazyDown or LazyUp.

See also LazyDown, LazyUp.

source
MolecularEvolution.lazysort!Method
  • Should be run on a tree containing LazyPartitions before running felsenstein!. Sorts for a minimal count of active partitions during a felsenstein!
  • Returns the minimum length of memoryblocks (-1) required for a felsenstein! prop. We need a temporary memoryblock during backward!, hence the '-1'.
Note

Since felsenstein! uses a stack, we want to avoid having long node.children[1].children[1]... chains

source
MolecularEvolution.leaf_distmatMethod
leaf_distmat(tree)

Returns a matrix of the distances between the leaf nodes where the index on the columns and rows are sorted by the leaf names.

source
MolecularEvolution.log_likelihood!Method
log_likelihood!(tree::FelNode, models; partition_list = nothing)

First re-computes the upward felsenstein pass, and then computes the log likelihood of this tree. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.

source
MolecularEvolution.log_likelihoodMethod
log_likelihood(tree::FelNode, models; partition_list = nothing)

Computed the log likelihood of this tree. Requires felsenstein!() to have been run. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.

source
MolecularEvolution.longest_pathMethod

Returns the longest path in a tree For convenience, this is returned as two lists of form: [leafnode, parentnode, .... root] Where the leaf_node nodes are selected to be the furthest away

source
MolecularEvolution.marginal_state_dictMethod
marginal_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{<:Partition}}())

Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their marginal reconstructions (ie. P(state|all observations,model)). A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.

source
MolecularEvolution.matrix_for_displayMethod
matrix_for_display(Q,labels)

Takes a numerical matrix and a vector of labels, and returns a typically mixed type matrix with the numerical values and the labels. This is to easily visualize rate matrices in eg. the REPL.

source
MolecularEvolution.metropolis_sampleMethod
function metropolis_sample(
    initial_tree::FelNode,
    models::Vector{<:BranchModel},
    num_of_samples;
    bl_modifier::UnivariateSampler = BranchlengthSampler(Normal(0,2), Normal(-1,1))
    burn_in=1000, 
    sample_interval=10,
    collect_LLs = false,
    midpoint_rooting=false,
)

Samples tree topologies from a posterior distribution.

Arguments

  • initial_tree: An initial tree topology with the leaves populated with data, for the likelihood calculation.
  • models: A list of branch models.
  • num_of_samples: The number of tree samples drawn from the posterior.
  • bl_sampler: Sampler used to drawn branchlengths from the posterior.
  • burn_in: The number of samples discarded at the start of the Markov Chain.
  • sample_interval: The distance between samples in the underlying Markov Chain (to reduce sample correlation).
  • collect_LLs: Specifies if the function should return the log-likelihoods of the trees.
  • midpoint_rooting: Specifies whether the drawn samples should be midpoint rerooted (Important! Should only be used for time-reversible branch models starting in equilibrium).
Note

The leaves of the initial tree should be populated with data and felsenstein! should be called on the initial tree before calling this function.

Returns

  • samples: The trees drawn from the posterior. Returns shallow tree copies, which needs to be repopulated before running felsenstein! etc.
  • sample_LLs: The associated log-likelihoods of the tree (optional).
source
MolecularEvolution.mixMethod
mix(swm_part::SWMPartition{PType} ) where {PType <: MultiSitePartition}

mix collapses a Site-Wise Mixture partition to a single component partition, weighted by the site-wise likelihoods for each component, and the init weights. Specifically, it takes a SWMPartition{Ptype} and returns a PType. You'll need to have this implemented for certain helper functionality if you're playing with new kinds of SWMPartitions that aren't mixtures of DiscretePartitions.

source
MolecularEvolution.nni_optim!Method
nni_optim!(tree::FelNode, models; <keyword arguments>)

Considers local branch swaps for all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another.

Keyword Arguments

  • partition_list=nothing: (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize tree topology with all models, the default option).
  • selection_rule = x -> argmax(x): a function that takes the current and proposed log likelihoods and selects a nni configuration. Note that the current log likelihood is stored at x[1].
  • sort_tree=false: determines if a lazysort! will be performed, which can reduce the amount of temporary messages that has to be initialized.
  • traversal=Iterators.reverse: a function that determines the traversal, permutes an iterable.
  • shuffle=false: do a randomly shuffled traversal, overrides traversal.
source
MolecularEvolution.partition2obsMethod
partition2obs(part::Partition)

Extracts the most likely state from a Partition, transforming it into a convenient type. For example, a NucleotidePartition will be transformed into a nucleotide sequence of type String. Note: You should overload this for your own Partititon types.

source
MolecularEvolution.plot_multiple_treesMethod
plot_multiple_trees(trees, inf_tree; <keyword arguments>)

Plots multiple phylogenetic trees against a reference tree, inf_tree. For each tree in trees, a linear Weighted Least Squares (WLS) problem (parameterized by the weight_fn keyword) is solved for the x-positions of the matching nodes between inf_tree and tree.

Keyword Arguments

  • node_size=4: the size of the nodes in the plot.
  • line_width=0.5: the width of the branches from trees.
  • font_size=10: the font size for the leaf labels.
  • margin=1.5: the margin between a leaf node and its label.
  • line_alpha=0.05: the transparency level of the branches from trees.
  • y_jitter=0.0: the standard deviation of the noise in the y-coordinate.
  • weight_fn=n::FelNode -> ifelse(isroot(n), 1.0, 0.0)): a function that assigns a weight to a node for the WLS problem.
  • opt_scale=true: whether to include a scaling parameter for the WLS problem.
source
MolecularEvolution.populate_tree!Method
populate_tree!(tree::FelNode, starting_message, names, data; init_all_messages = true, tolerate_missing = 1, leaf_name_transform = x -> x)

Takes a tree, and a starting_message (which will serve as the memory template for populating messages all over the tree). starting_message can be a message (ie. a vector of Partitions), but will also work with a single Partition (although the tree) will still be populated with a length-1 vector of Partitions. Further, as long as obs2partition is implemented for your Partition type, the leaf nodes will be populated with the data from data, matching the names on each leaf. When a leaf on the tree has a name that doesn't match anything in names, then if

  • tolerate_missing = 0, an error will be thrown
  • tolerate_missing = 1, a warning will be thrown, and the message will be set to the uninformative message (requires identity!(::Partition) to be defined)
  • tolerate_missing = 2, the message will be set to the uninformative message, without warnings (requires identity!(::Partition) to be defined)

A renaming function that can eg. strip tags from the tree when matching leaf names with names can be passed to leaf_name_transform

source
MolecularEvolution.promote_internalMethod
promote_internal(tree::FelNode)

Creates a new tree similar to the given tree, but with 'dummy' leaf nodes (w/ zero branchlength) representing each internal node (for drawing / evenly spacing labels internal nodes).

source
MolecularEvolution.quadratic_CIMethod
quadratic_CI(f::Function,opt_params::Vector, param_ind::Int; rate_conf_level = 0.99, nudge_amount = 0.01)

Takes a NEGATIVE log likelihood function (compatible with Optim.jl), a vector of maximizing parameters, an a parameter index. Returns the quadratic confidence interval.

source
MolecularEvolution.quadratic_CIMethod
quadratic_CI(xvec,yvec; rate_conf_level = 0.99)

Takes xvec, a vector of parameter values, and yvec, a vector of log likelihood evaluations (note: NOT the negative LLs you) might use with Optim.jl. Returns the confidence intervals computed by a quadratic approximation to the LL.

source
MolecularEvolution.reversibleQMethod
reversibleQ(param_vec,eq_freqs)

Takes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. The parameters are the upper triangle of the rate matrix, with the diagonal elements omitted, and the equilibrium frequencies are multiplied column-wise.

source
MolecularEvolution.root2tip_distancesMethod
root2tips(root::AbstractTreeNode)

Returns a vector of root-to-tip distances, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.

source
MolecularEvolution.sample_down!Method

sampledown!(root::FelNode,models,partitionlist)

Generates samples under the model. The root.parentmessage is taken as the starting distribution, and node.message contains the sampled messages. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partitionlist (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.

source
MolecularEvolution.savefig_tweakSVGMethod
savefig_tweakSVG(fname, plot::Context; width = 10cm, height = 10cm, linecap_round = true, white_background = true)

Saves a figure created using the Compose approach, but tweaks the SVG after export.

eg. savefig_tweakSVG("export.svg",pl)

source
MolecularEvolution.savefig_tweakSVGMethod
savefig_tweakSVG(fname, plot::Plots.Plot; hack_bounding_box = true, new_viewbox = nothing, linecap_round = true)

Note: Might only work if you're using the GR backend!! Saves a figure created using the Phylo Plots recipe, but tweaks the SVG after export. new_viewbox needs to be an array of 4 numbers, typically starting at [0 0 plot_width*4 plot_height*4] but this lets you add shifts, in case the plot is getting cut off.

eg. savefig_tweakSVG("export.svg",pl, new_viewbox = [-100, -100, 3000, 4500])

source
MolecularEvolution.sim_treeMethod
sim_tree(add_limit::Int,Ne_func,sample_rate_func; nstart = 1, time = 0.0, mutation_rate = 1.0, T = Float64)

Simulates a tree of type FelNode{T}. Allows an effective population size function (Nefunc), as well as a sample rate function (samplerate_func), which can also just be constants.

Nefunc(t) = (sin(t/10)+1)*100.0 + 10.0 root = simtree(600,Nefunc,1.0) simpletree_draw(ladderize(root))

source
MolecularEvolution.simple_radial_tree_plotMethod
simple_radial_tree_plot(root::FelNode; canvas_width = 10cm, line_color = "black", line_width = 0.1mm)

Draws a radial tree. No frills. No labels. Canvas height is automatically determined to avoid distorting the tree.

newt = betternewickimport("((A:1,B:1,C:1,D:1,E:1,F:1,G:1):1,(H:1,I:1):1);", FelNode{Float64}); simpleradialtreeplot(newt,linewidth = 0.5mm,root_angle = 7/10)

source
MolecularEvolution.simple_tree_drawMethod

img = simpletreedraw(tree::FelNode; canvaswidth = 15cm, canvasheight = 15cm, linecolor = "black", linewidth = 0.1mm)

A line drawing of a tree with very few options.

img = simple_tree_draw(tree)
img |> SVG("imgout.svg",10cm, 10cm)
OR
using Cairo
img |> PDF("imgout.pdf",10cm, 10cm)
source
MolecularEvolution.total_LLMethod

total_LL(p::Partition)

If called on the root, it returns the log likelihood associated with that partition. Can be overloaded for complex partitions without straightforward site log likelihoods.

source
MolecularEvolution.tree2distancesMethod
tree2distances(root::AbstractTreeNode)

Returns a distance matrix for all pairs of leaf nodes, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.

source
MolecularEvolution.tree2shared_branch_lengthsMethod
tree2distances(root::AbstractTreeNode)

Returns a distance matrix for all pairs of leaf nodes, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.

source
MolecularEvolution.tree_drawMethod
tree_draw(tree::FelNode;
    canvas_width = 15cm, canvas_height = 15cm,
    stretch_for_labels = 2.0, draw_labels = true,
    line_width = 0.1mm, font_size = 4pt,
    min_dot_size = 0.00, max_dot_size = 0.01,
    line_opacity = 1.0,
    dot_opacity = 1.0,
    name_opacity = 1.0,
    horizontal = true,
    dot_size_dict = Dict(), dot_size_default = 0.0,
    dot_color_dict = Dict(), dot_color_default = "black",
    line_color_dict = Dict(), line_color_default = "black",
    label_color_dict = Dict(), label_color_default = "black",
    nodelabel_dict = Dict(),compose_dict = Dict()
    )

Draws a tree with a number of self-explanatory options. Dictionaries that map a node to a color/size are used to control per-node plotting options. compose_dict must be a FelNode->function(x,y) dictionary that returns a compose() struct.

Example using compose_dict

str_tree = "(((((tax24:0.09731668728575642,(tax22:0.08792233964843627,tax18:0.9210388482867483):0.3200367900275155):0.6948314526087965,(tax13:1.9977212308725611,(tax15:0.4290074347886068,(tax17:0.32928401808187824,(tax12:0.3860215462534818,tax16:0.2197134841232339):0.1399122681886174):0.05744611946245004):1.4686085778061146):0.20724159879522402):0.4539334554156126,tax28:0.4885576926440158):0.002162260013924424,tax26:0.9451873777301325):3.8695419798779387,((tax29:0.10062813251515536,tax27:0.27653633028085006):0.04262434258357507,(tax25:0.009345653929737636,((tax23:0.015832941547076644,(tax20:0.5550597590956172,((tax8:0.6649025646927402,tax9:0.358506423199849):0.1439516404012261,tax11:0.01995439013213013):1.155181296134081):0.17930021667907567):0.10906638146207207,((((((tax6:0.013708993438720255,tax5:0.061144001556547097):0.1395453591567641,tax3:0.4713722705245479):0.07432598428904214,tax1:0.5993347898257291):1.0588025698844894,(tax10:0.13109032492533992,(tax4:0.8517302241963356,(tax2:0.8481963081549965,tax7:0.23754095940676642):0.2394313086297733):0.43596704123297675):0.08774657269409454):0.9345533723114966,(tax14:0.7089558245245173,tax19:0.444897137240675):0.08657675809803095):0.01632062723968511,tax21:0.029535281963725537):0.49502691718938285):0.25829576024240986):0.7339777396780424):4.148878039524972):0.0"
newt = gettreefromnewick(str_tree, FelNode)
ladderize!(newt)
compose_dict = Dict()
for n in getleaflist(newt)
    #Replace the rand(4) with the frequencies you actually want.
    compose_dict[n] = (x,y)->pie_chart(x,y,MolecularEvolution.sum2one(rand(4)),size = 0.03)
end
tree_draw(newt,draw_labels = false,line_width = 0.5mm, compose_dict = compose_dict)


img = tree_draw(tree)
img |> SVG("imgout.svg",10cm, 10cm)
OR
using Cairo
img |> PDF("imgout.pdf",10cm, 10cm)
source
MolecularEvolution.tree_polish!Method

tree_polish!(newt, models; tol = 10^-4, verbose = 1, topology = true)

Takes a tree and a model function, and optimizes branch lengths and, optionally, topology. Returns final LL. Set verbose=0 to suppress output. Note: This is not intended for an exhaustive tree search (which requires different heuristics), but rather to polish a tree that is already relatively close to the optimum.

source
MolecularEvolution.unc2probvecMethod
unc2probvec(v)

Takes an array of N-1 unbounded values and returns an array of N values that sums to 1. Typically useful for optimizing over categorical probability distributions.

source
MolecularEvolution.univariate_maximizeMethod
univariate_maximize(f, a::Real, b::Real, transform, optimizer::BrentsMethodOpt, t::Real; ε::Real=sqrt(eps))

Maximizes f(x) using Brent's method. See ?brents_method_minimize.

source
MolecularEvolution.univariate_maximizeMethod
univariate_maximize(f, a::Real, b::Real, transform, optimizer::GoldenSectionOpt, tol::Real)

Maximizes f(x) using a Golden Section Search. See ?golden_section_maximize.

Examples

julia> f(x) = -(x-2)^2
f (generic function with 1 method)

julia> m = univariate_maximize(f, 1, 5, identity, GoldenSectionOpt(), 1e-10)
2.0000000000051843
source
MolecularEvolution.univariate_samplerMethod
univariate_sampler(LL, modifier::BranchlengthPeturbation, curr_branchlength)

A MCMC algorithm that draws the next sample of a Markov Chain that approximates the Posterior distrubution over the branchlengths.
source
MolecularEvolution.values_from_phylo_treeMethod
values_from_phylo_tree(phylo_tree, key)

Returns a list of values from the given key in the nodes of the phylo_tree, in an order that is somehow compatible with the order the nodes get plotted in.
source
MolecularEvolution.weightEMMethod
weightEM(con_lik_matrix::Array{Float64,2}, θ; conc = 0.0, iters = 500)

Takes a conditional likelihood matrix (#categories-by-sites) and a starting frequency vector θ (length(θ) = #categories) and optimizes θ (using Expectation Maximization. Maybe.). If conc > 0 then this gives something like variational bayes behavior for LDA. Maybe.

source
MolecularEvolution.write_fastaMethod
write_fasta(filepath::String, sequences::Vector{String}; seq_names = nothing)

Writes a fasta file from a vector of sequences, with optional seq_names.

source
MolecularEvolution.write_nexusMethod
write_nexus(fname::String,tree::FelNode)

Writes the tree as a nexus file, suitable for opening in eg. FigTree. Data in the node_data dictionary will be converted into annotations. Only tested for simple node_data formats and types.

source