CodonMolecularEvolution
A Julia package for popular and new codon models of molecular evolution.
Documentation for CodonMolecularEvolution.
Descendant of MolecularEvolution.jl, specializing in codon models.
Collection of codon model methods
- difFUBAR: Scalable Bayesian comparison of selection pressure
- Perform a site-wise comparison of evolutionary pressure between two selected sets of branches.
- Authors: Hassan Sadiq, Venkatesh Kumar, and Ben Murrell (original model development), Patrick Truong (benchmarking), Maximilian Danielsson (performance optimization).
Design principles
- User-facing
- Users with no Julia experience should be able to run these models.
- Scalability
- Models should scale to large, real-world datasets. To keep the memory footprint down, we use
MolecularEvolution.jl
sLazyPartition
when possible.
- Models should scale to large, real-world datasets. To keep the memory footprint down, we use
- Performance
- We try to maintain competitive runtimes by using e.g. computational shortcuts and parallelization whenever we can.
Package Authors and Maintainers
Maximilian Danielsson and Ben Murrell. Authors for specific models listed above.
CodonMolecularEvolution.PiecewiseOUModel
CodonMolecularEvolution.ShiftingHBSimModel
CodonMolecularEvolution.ShiftingHBSimPartition
CodonMolecularEvolution.ShiftingNeHBSimModel
CodonMolecularEvolution.ShiftingNeHBSimPartition
CodonMolecularEvolution.difFUBARBaseline
CodonMolecularEvolution.difFUBARParallel
CodonMolecularEvolution.difFUBARTreesurgery
CodonMolecularEvolution.difFUBARTreesurgeryAndParallel
CodonMolecularEvolution.FUBAR_analysis
CodonMolecularEvolution.FUBAR_analysis
CodonMolecularEvolution.FUBAR_analysis
CodonMolecularEvolution.HB98AA_matrix
CodonMolecularEvolution.HB98_row
CodonMolecularEvolution.HB_fixation_rate
CodonMolecularEvolution.HBdNdS
CodonMolecularEvolution.HBviz
CodonMolecularEvolution.benchmark_global_fit
CodonMolecularEvolution.benchmark_grid
CodonMolecularEvolution.dNdS
CodonMolecularEvolution.difFUBAR
CodonMolecularEvolution.difFUBAR_init
CodonMolecularEvolution.difFUBAR_tabulate_and_plot
CodonMolecularEvolution.generate_ambient_indices
CodonMolecularEvolution.generate_fubar_indices
CodonMolecularEvolution.getpuresubclades
CodonMolecularEvolution.import_colored_figtree_nexus_as_tagged_tree
CodonMolecularEvolution.import_colored_figtree_nexus_as_tagged_tree
CodonMolecularEvolution.import_grouped_label_tree
CodonMolecularEvolution.import_grouped_label_tree
CodonMolecularEvolution.import_labeled_phylotree_newick
CodonMolecularEvolution.import_labeled_phylotree_newick
CodonMolecularEvolution.jump
CodonMolecularEvolution.jumpy_HB_codon_evolve
CodonMolecularEvolution.jumpy_NeHB_codon_evolve
CodonMolecularEvolution.maxdNdS2std
CodonMolecularEvolution.optimize_MG94_F3x4
CodonMolecularEvolution.optimize_MG94_F3x4
CodonMolecularEvolution.shiftingHBviz
CodonMolecularEvolution.shiftingNeHBviz
CodonMolecularEvolution.sim_alphabeta_seqs
CodonMolecularEvolution.std2maxdNdS
CodonMolecularEvolution.time_varying_HB_freqs
MolecularEvolution.forward!
CodonMolecularEvolution.PiecewiseOUModel
— TypePiecewiseOUModel(event_rate::Float64, eq_std::Float64, mixing::Float64; delta_t = 1.0)
PiecewiseOUModel(offsets::Vector{Float64})
PiecewiseOUModel(event_rate::Float64, eq_std::Float64, mixing::Float64, mu::Union{Float64,Vector{Float64}}, offsets::Union{Float64,Vector{Float64}}, codon_offsets::Union{Float64,Vector{Float64}})
A piecewise constant approximation to an OU process, intended to simulate fitnesses evolving over phylogenies. The equilibrium standard deviation is directly parameterized (eq_std
), as is the rate at which the process mixes to equilibrium (mixing
). event_rate
controls how often the fitness changes occur, where the mixing rate is scaled to compensate for the increased rate of change to achieve approximately the same amount of change per unit time even as the event_rate
changes. A very high event_rate
will behave more like continuous diffusion, but will be more computationally expensive to sample from. mu
can also be set to control the mean fitnesses. The model also permits offsets
, which are added to the fitnesses as they are passed into the model. For a single process, these are confounded with the mean mu
but if the offsets change (eg. from one branch to another) the effective fitnesses will immidiately change, whereas if mu
changes the fitnesses will drift towards mu
.
CodonMolecularEvolution.ShiftingHBSimModel
— TypeShiftingHBSimModel(sites, alphas, ou_params, nuc_matrix; rescale = true)
A model for simulating fitnesses evolving over phylogenies using the HB98 model. sites
is the number of sites, alphas
is a vector of synonymous rates (one per site), ou_params
is a vector of PiecewiseOUModel
s (one per site), and nuc_matrix
is the symmetric nucleotide substitution matrix (shared across sites). If 'rescale' is true, then the nuc matrix is scaled so that, when alpha=1
and the fitnesses=0
, the HB model expects one substitution per site per unit time.
CodonMolecularEvolution.ShiftingHBSimPartition
— TypeShiftingHBSimPartition(model::ShiftingHBSimModel; burnin_time = 100.0, code = MolecularEvolution.universal_code)
Constructs a partition that tracks evolving fitnesses and codons. Only useable for sampling (not likelihood calculations).
CodonMolecularEvolution.ShiftingNeHBSimModel
— MethodShiftingNeHBSimModel(nuc_matrix, unscaled_ou_params, logNe_model; alpha, rescale, code)
Create a model with one OU process per site (unscaledouparams) and one OU process for log(Ne). alpha
can be a scalar or a vector of the same length as unscaledouparams. If rescale
is true, the nucmatrix is scaled (HB98 neutral scaling).
CodonMolecularEvolution.ShiftingNeHBSimPartition
— MethodShiftingNeHBSimPartition(model; burnin_time)
Constructs a partition for the ShiftingNeHBSimModel
. Initializes by:
- Drawing random unscaled fitnesses from each site's OU equilibrium.
- Drawing random logNe from its OU equilibrium.
- "Burning in" each site along the branch of length
burnin_time
.
CodonMolecularEvolution.difFUBARBaseline
— TypeConstructor
difFUBARBaseline()
Description
Use the trivial implementation of the grid likelihood computations, i.e. 1 thread without sub-tree likelihood caching.
See also: difFUBARParallel
, difFUBARTreesurgery
, difFUBARTreesurgeryAndParallel
.
CodonMolecularEvolution.difFUBARParallel
— TypeConstructor
difFUBARParallel()
Description
Extend the baseline version by parallelizing the grid calculations. Requires julia to be launched with the t
switch. Using t
computational threads, where t
is sufficiently small, memory complexity is usually O(t) and time complexity O(1/t). Empirical tests suggests that t
should not be higher than the machine's total CPU threads and usually not higher than half of it's total threads.
See also: difFUBARBaseline
, difFUBARTreesurgery
, difFUBARTreesurgeryAndParallel
.
CodonMolecularEvolution.difFUBARTreesurgery
— TypeConstructor
difFUBARTreesurgery()
Description
Use sub-tree likelihood caching described in the "Methods" section of the difFUBAR paper. Use more memory than the baseline version but be significantly faster, if purity is high.
See also: difFUBARBaseline
, difFUBARParallel
, difFUBARTreesurgeryAndParallel
.
CodonMolecularEvolution.difFUBARTreesurgeryAndParallel
— TypeConstructor
difFUBARTreesurgeryAndParallel()
Description
Use parallelization and sub-tree likelihood caching. The most performant version in most cases. Use more memory than other versions.
See also: difFUBARBaseline
, difFUBARTreesurgery
, difFUBARParallel
.
CodonMolecularEvolution.FUBAR_analysis
— MethodFUBAR_analysis(method::DirichletFUBAR, grid::FUBARGrid{T};
analysis_name = "dirichlet_fubar_analysis",
exports = true,
posterior_threshold = 0.95,
verbosity = 1,
code = MolecularEvolution.universal_code,
optimize_branch_lengths = false,
concentration = 0.5,
iterations = 2500,
volume_scaling = 1.0) where {T}
Perform a Fast Unconstrained Bayesian AppRoximation (FUBAR) analysis using a Dirichlet process.
Arguments
method::DirichletFUBAR
: Empty struct to dispatch the original FUBAR methodgrid::FUBARGrid{T}
: the FUBARGrid to perform inference on
Keywords
analysis_name::String="dirichlet_fubar_analysis"
: File namesexports::Bool=true
: Whether to export results to files. Will plot if MolecularEvolutionViz is presentposterior_threshold::Float64=0.95
: Posterior probability classification threshold forverbosity::Int=1
: Control level of output messages (0=none, higher values=more details)code=MolecularEvolution.universal_code
: Molecular code to useoptimize_branch_lengths::Bool=false
: ?concentration::Float64=0.5
: Concentration parameter for the Dirichlet processiterations::Int=2500
: Number of EM algorithm iterationsvolume_scaling::Float64=1.0
: Controls the scaling of the marginal parameter violin plots
Returns
- A tuple containing:
df_results
: DataFrame with FUBAR analysis resultsparams
: A named tuple with the fields -θ
: Parameter estimates from the EM algorithm
Description
Takes in a FUBARGrid object and outputs results for sites obtained from the FUBAR method
CodonMolecularEvolution.FUBAR_analysis
— MethodFUBAR_analysis(method::FIFEFUBAR, grid::FUBARGrid{T};
analysis_name = "fife_analysis",
verbosity = 1,
exports = true,
positive_tail_only = false) where {T}
Perform a FUBAR type analysis using the FIFE (Frequentist Inference For Evolution) approach.
Arguments
method::FIFEFUBAR
: Empty struct to dispatch ongrid::FUBARGrid{T}
: Grid containing data to perform inference on
Keywords
analysis_name::String="fife_analysis"
: Name for the analysis output files and directoryverbosity::Int=1
: Control level of output messages (0=none, higher values=more details)exports::Bool=true
: Whether to export results to filespositive_tail_only::Bool=false
: If true, uses a one-tailed test for positive selection only
Returns
df_results::DataFrame
: A DataFrame containing the frequentist analysis results
Description
Frequentist method that gives p-values for site-wise alpha/beta tests
This function performs likelihood ratio tests at each site using interpolated likelihood surfaces. When positive_tail_only=true
, the p-values are adjusted to reflect a one-tailed test that only considers positive selection (β > α) by using a dirac delta/Chi-square mixture
CodonMolecularEvolution.FUBAR_analysis
— MethodFUBAR_analysis(method::SKBDIFUBAR, grid::FUBARGrid{T};
analysis_name = "skbdi_fubar_analysis",
volume_scaling = 1.0,
exports = true,
verbosity = 1,
posterior_threshold = 0.95,
distance_function = standard_fubar_distance_function,
kernel_function = (d, c) -> exp(-d / c^2),
kernel_parameter_dimension = 1,
supression_type = nothing,
m = 15,
ϵ = 1e-6,
n_samples = 1000,
burnin = 200,
thinning = 50) where {T}
Perform a Fast Unconstrained Bayesian AppRoximation (FUBAR) analysis using the SKBDI (Smooth Kernel Bayesian Density Inference) approach.
Arguments
method::SKBDIFUBAR
: Empty struct used for dispatchgrid::FUBARGrid{T}
: Grid to perform inference on
Keywords
analysis_name::String="skbdi_fubar_analysis"
: Name for the analysis output files and directoryvolume_scaling::Float64=1.0
: Controls the scaling of the marginal parameter violin plotsexports::Bool=true
: Whether to export results to filesverbosity::Int=1
: Control level of output messages (0=none, higher values=more details)posterior_threshold::Float64=0.95
: Posterior probability threshold for classificationdistance_function=standard_fubar_distance_function
: Function used to calculate distances between grid pointskernel_function=(d, c) -> exp(-d / c^2)
: Kernel function used for the covariance matrix.kernel_parameter_dimension::Int=1
: How many kernel parameters are taken in by the kernel bandwidth function.supression_type=nothing
: Supression type object; if nothing, a default supression type is constructedm::Int=10
: Krylov subspace dimension, 15 seems to work well for standard grids. Larger m gives slower sampling but better numerical precision.ϵ::Float64=1e-6
: Tykhonoff regularisationn_samples::Int=1000
: Number of MCMC samples to generateburnin::Int=200
: Number of initial samples to discard as burninthinning::Int=50
: Interval for thinning samples to reduce autocorrelation
Returns
- A tuple containing:
analysis
: DataFrame with FUBAR analysis resultsparams
: A named tuple with the fields -θ
: Thinned transformed grid samples from the chain,kernel_samples
: Samples from the kernel
Description
This function implements a Smooth Kernel Bayesian Density Inference approach to FUBAR analysis. It defines a Gaussian model based on the grid, samples from this model using MCMC, and processes the samples to generate posterior probabilities of selection.
If no supression type is provided, a default one is constructed based on the grid dimensions with a fifth degree polynomial is used
CodonMolecularEvolution.HB98AA_matrix
— MethodHB98AA_matrix(alpha, nuc_matrix, AA_fitness; genetic_code = MolecularEvolution.universal_code)
Returns the rate matrix for a codon model using the HB98 model where each AA has a different fitness. alpha
is the synonymous rate, nuc_matrix
is the symmetric nucleotide substitution matrix, and AA_fitness
is the fitness of each amino acid.
CodonMolecularEvolution.HB98_row
— MethodHB98AA_row(current_codon, alpha, nuc_matrix, AA_fitness; genetic_code=MolecularEvolution.universal_code)
Returns the rate row for a codon model using the HB98 model where each AA has a different fitness. current_codon
is the current codon, alpha
is the synonymous rate, nuc_matrix
is the symmetric nucleotide substitution matrix, and AA_fitness
is the fitness of each amino acid.
CodonMolecularEvolution.HB_fixation_rate
— MethodHB_fixation_rate(from_codon, to_codon)
Returns the fixation rate of a mutation from `from_codon` to `to_codon` under the HB98 model.
CodonMolecularEvolution.HBdNdS
— MethodHBdNdS(fs::Vector{Float64}; code = MolecularEvolution.universal_code, nucm = CodonMolecularEvolution.demo_nucmat)
HBdNdS(fs_pre::Vector{Float64}, fs_post::Vector{Float64}; code = MolecularEvolution.universal_code, nucm = CodonMolecularEvolution.demo_nucmat)
Returns the expected dN/dS ratio for a Halpern and Bruno model with a vector of fitnesses. If two vectors are provided, then the dN/dS ratio is computed for the shift from fs_pre
to fs_post
.
CodonMolecularEvolution.HBviz
— MethodHBviz(ts, fst, T, alp, nucm)
Visualize over time the fitness trajectory, the codon frequencies, and the expected dN/dS. ts
is a vector of times, fst
is a vector of fitnesses, T
is the total time, alp
is the alpha parameter, and nucm
is the nucleotide substitution matrix.
σ = 2.0
alpha = 1.0
nucm = CodonMolecularEvolution.demo_nucmat
fst = [randn(20) .* σ, randn(20) .* σ]
ts = [-100.0, 1.0]
T = 2.0
HBviz(ts, fst, T, alpha, nucm)
CodonMolecularEvolution.benchmark_global_fit
— MethodCodonMolecularEvolution.benchmark_global_fit(benchmark_name; exports=true, data=1:5, optimize_branch_lengths=false)
Benchmarks different implementations of the difFUBARglobalfit algorithm. Results of the benchmark are printed out as a DataFrame and saved to a CSV file. Uses the heuristic top pick to generate con lik matrices. Compares difference in con lik matrices. If exports is true, it also runs MCMCs on the con lik matrices and plots the means and posteriors of the different versions.
benchmark_name
is the filepath to where the benchmark will be saved, if exports.data
is the range/vector of datasets to run the benchmark on. By default, this is 1:5. These are the enumerated datasets:- Ace2nobackground
- Ace2reallytiny
- Ace2tiny
- ParvoVP
- ParvoVPregrouped
optimize_branch_lengths
is an option that can be eithertrue
,false
or"detect"
CodonMolecularEvolution.benchmark_grid
— MethodCodonMolecularEvolution.benchmark_grid(benchmark_name; exports=true, versions_option=1, t::Integer=0, data=1:5)
Benchmarks different implementations of the difFUBAR_grid algorithm. Results of the benchmark are printed out as a DataFrame and saved to a CSV file.
benchmark_name
is the filepath to where the benchmark will be saved, if exportsversions_option
have 4 different options:- default option, only run heuristic top pick
- only run baseline version
- run heuristic top pick and baseline version
- run all versions
t
is the number of threads you want to use in the parallel versions. If specified and non-zero, this will override the number of threads chosen by the heuristic.data
is the range/vector of datasets to run the benchmark on. By default, this is 1:5. These are the enumerated datasets:- Ace2nobackground
- Ace2reallytiny
- Ace2tiny
- ParvoVP
- ParvoVPregrouped
CodonMolecularEvolution.dNdS
— MethoddNdS(q1, q0, p; code = MolecularEvolution.universal_code)
dNdS(q1, q0; code = MolecularEvolution.universal_code)
Returns an analytic expectation of the dN/dS ratio for a codon model. q1
is the rate matrix where selection is active (eg. a Halpern and Bruno model with a set of fitnesses), and q0
is the corresponding rate matrix where selection is inactive (eg. a Halpern and Bruno model with all fitnesses equal). p
is the frequency distribution over codons that the dN/dS ratio is computed against. If not provided, this is computed as the equilibrium from q1
. If only a vector of fitnesses are provided, then the q1
, q0
, and p
are computed assuming a Halpern and Bruno model.
fs = randn(20)
nucm = CodonMolecularEvolution.demo_nucmat
q1 = HB98AA_matrix(1.0, nucm, fs)
q0 = HB98AA_matrix(1.0, nucm, zeros(20))
dNdS(q1, q0)
CodonMolecularEvolution.difFUBAR
— MethoddifFUBAR(seqnames, seqs, treestring, tags, outpath; <keyword arguments>)
Takes a tagged phylogeny and an alignment as input and performs difFUBAR analysis. Returns df, results_tuple, plots_named_tuple
where df
is a DataFrame of the detected sites, results_tuple
is a tuple of the partial calculations needed to re-run difFUBAR_tabulate_and_plot
, and plots_named_tuple
is a named tuple of plots. Consistent with the docs of difFUBAR_tabulate_and_plot
, results_tuple
stores (alloc_grid, codon_param_vec, alphagrid, omegagrid, tag_colors)
.
Arguments
seqnames
: vector of untagged sequence names.seqs
: vector of aligned sequences, corresponding toseqnames
.treestring
: a tagged newick tree string.tags
: vector of tag signatures.outpath
: export directory.tag_colors=DIFFUBAR_TAG_COLORS[sortperm(tags)]
: vector of tag colors (hex format). The default option is consistent with the difFUBAR paper (Foreground 1: red, Foreground 2: blue).pos_thresh=0.95
: threshold of significance for the posteriors.iters=2500
: iterations used in the Gibbs sampler.burnin=div(iters, 5)
: burnin used in the Gibbs sampler.concentration=0.1
: concentration parameter used for the Dirichlet prior.binarize=false
: if true, the tree is binarized before the analysis.verbosity=1
: as verbosity increases, prints are added accumulatively.- 0 - no prints
- 1 - show current step and where output files are exported
- 2 - show the chosen
difFUBAR_grid
version and amount of parallel threads.
exports=true
: if true, output files are exported.exports2json=false
: if true, the results are exported to a JSON file (HyPhy format).code=MolecularEvolution.universal_code
: genetic code used for the analysis.optimize_branch_lengths=false
: if true, the branch lengths of the phylogenetic tree are optimized.version::Union{difFUBARGrid, Nothing}=nothing
: explicitly choose the version ofdifFUBAR_grid
to use. Ifnothing
, the version is heuristically chosen based on the available RAM and Julia threads.t=0
: explicitly choose the amount of Julia threads to use. If0
, the degree of parallelization is heuristically chosen based on the available RAM and Julia threads.
Julia starts up with a single thread of execution, by default. See Starting Julia with multiple threads.
CodonMolecularEvolution.difFUBAR_init
— MethodCodonMolecularEvolution.difFUBAR_tabulate_and_plot
— MethoddifFUBAR_tabulate_and_plot(analysis_name, pos_thresh, alloc_grid, codon_param_vec, alphagrid, omegagrid, tag_colors, verbosity=1, exports=true)
Takes the output of difFUBAR
, tabulates and plots the results. Returns a DataFrame of tabulated results, a tuple of partial calculations needed to re-run tabulate, and a vector of plots. This function enables you to use the results of difFUBAR
to tabulate the results with a different threshold.
Arguments
analysis_name
: where to export the results.pos_thresh
: threshold of significance for the posteriors.alloc_grid
: contains the result of the Gibbs sampler.codon_param_vec
: vector of codon parameters from difFUBAR.alphagrid
: grid of alpha values.omegagrid
: grid of omega values.tag_colors
: colors of the tags.verbosity=1
: will print to stdout if 1, will not print to stdout if 0.exports=true
: if true, output files are exported.
If:
df, results, plots = difFUBAR(seqnames, seqs, treestring, tags, "my_analysis")
Then you can retabulate and replot by propagating the results
tuple:
difFUBAR_tabulate_and_plot("my_analysis_0.85", 0.85, results...)
CodonMolecularEvolution.generate_ambient_indices
— Methodgenerate_ambient_indices(N::Int)
Generate ambient format indices for an NxN grid. Indices are generated along diagonals from bottom-right to top-left, with each diagonal numbered from bottom to top.
CodonMolecularEvolution.generate_fubar_indices
— Methodgenerate_fubar_indices(N::Int)
Generate FUBAR format indices for an NxN grid. Indices are generated column-wise from bottom to top, starting from the leftmost column.
CodonMolecularEvolution.getpuresubclades
— Methodgetpuresubclades(tree::FelNode, tags::Vector{String})
- Should usually be called on the root of the tree. Traverses the tree iteratively with a depth-first search to find roots of pure subclades, presuming that nodenames have been trailed with tags. Returns a Vector{FelNode} with root-nodes of the pure subclades.
CodonMolecularEvolution.import_colored_figtree_nexus_as_tagged_tree
— Methodimport_colored_figtree_nexus_as_tagged_tree(fname; custom_labels=String[])
Takes a nexus file from FigTree, where branches have been colored. Replaces all color tags with group tags that can be used in the models. Can add custom labels too. Should consider an entire custom dictionary as well in future.
Examples
julia> treestring, tags, tag_colors = import_colored_figtree_nexus_as_tagged_tree("data/Ace2_no_background.nex")
("(((XM_027533928_Bos_indicus_x_Bos_taurus{G1}:0.097072,(XM_042974087_Panthera_tigris{G1}:0.038016,... more ...;", ["{G2}", "{G1}"], ["#ff0015", "#0011ff"])
treestring
is truncated. NEXUS tree file
CodonMolecularEvolution.import_grouped_label_tree
— Methodimport_grouped_label_tree(tree_file)
Takes a Newick tree file and return Newick tree, Newick tree with replaced tags, group tags, original tags, and randomly generated colours for each tag
CodonMolecularEvolution.import_labeled_phylotree_newick
— Methodimport_labeled_phylotree_newick(fname)
Import a tagged phylogeny from phylotree and return the treestring and tags.
CodonMolecularEvolution.jump
— Methodjump(x_source, m::PiecewiseOUModel)
Evolves values over time using a piecewise constant approximation to an OU process, where this function computes the new distribution for a single discrete jump. x_source
is the vector of fitnesses, and m is the PiecewiseOUModel.
CodonMolecularEvolution.jumpy_HB_codon_evolve
— Methodjumpy_HB_codon_evolve(fitnesses, codon, ou_model, nuc_matrix, alpha, time;
genetic_code = MolecularEvolution.universal_code, push_into = nothing)
Evolves fitnesses and codons over time using the HB98 model. fitnesses
is the vector of fitnesses, codon
is the current codon, ou_model
is the OU model, nuc_matrix
is the symmetric nucleotide substitution matrix, alpha
is the synonymous rate, and time
is the total time to evolve over.
CodonMolecularEvolution.jumpy_NeHB_codon_evolve
— Methodjumpy_NeHB_codon_evolve(fitnesses, logNe_trajectory, codon, fitness_model, nuc_matrix, alpha;
genetic_code = MolecularEvolution.universal_code, push_into = nothing)
Evolves codons and unscaled site-fitness, along with a given trajectory of log-pop-size.
CodonMolecularEvolution.maxdNdS2std
— MethodmaxdNdS2std(ω)
Inverse of std2maxdNdS(σ). Estimates the standard deviation of the fitnesses that will produce, in expectation, a dN/dS ratio of ω
, assuming Gaussian fitnesses and a Halpern and Bruno model, where the fitnesses have just shifted from one Gaussian sample to another. Note: this is not an analytical solution, but a serindipitously good approximation.
CodonMolecularEvolution.optimize_MG94_F3x4
— Methodoptimize_MG94_F3x4(seqnames, seqs, tree; leaf_name_transform=x -> x, genetic_code=MolecularEvolution.universal_code)
Optimizes the MG94+F3x4 model on a tree, given a set of sequences and a tree. Returns the optimized tree, LL, alpha, beta, nucmatrix, F3x4, and eqfreqs. The leafnametransform kwarg can be used to transform the leaf names in the tree to match the seqnames.
CodonMolecularEvolution.shiftingHBviz
— MethodshiftingHBviz(T, event_rate, σ, mixing_rate, alpha, nucm; T0 = -20)
Visualize the fitness trajectory, codon frequencies, and expected dN/dS over time for a shifting HB process. T
is the total time, event_rate
is the rate of fitness shifts, σ
is the standard deviation of the fitnesses, mixing_rate
is the rate of mixing between fitnesses, alpha
is the alpha parameter, and nucm
is the nucleotide substitution matrix. T0
controls the burnin time, to ensure the process is at equilibrium at t=0
.
T = 2.0
mix = 1.0
σ = 5.0
event_rate = 100.0
alpha = 1.0
nucm = CodonMolecularEvolution.demo_nucmat
shiftingHBviz(T, event_rate, σ, mix, alpha, nucm)
CodonMolecularEvolution.shiftingNeHBviz
— MethodshiftingNeHBviz(T, f_event_rate, f_σ, f_mixing_rate, logNe_event_rate, logNe_σ, logNe_mean, logNe_mixing_rate, alpha, nucm; T0 = -20)
Visualize the Ne trajectory, fitness trajectory, codon frequencies, and expected dN/dS over time for a shifting Ne HB process.
CodonMolecularEvolution.sim_alphabeta_seqs
— Methodsim_alphabeta_seqs(alphavec::Vector{Float64}, betavec::Vector{Float64}, singletree, nucmat::Array{Float64,2}, f3x4::Array{Float64,2};
scale_total_tree_neutral_expected_subs = -1.0, outpath = "")
Simulate a set of sequences under a given tree, with a set of alpha and beta values. f3x4 is a 3-by-4 matrix of position-specific nucleotide frequencies. nucmat is a 4-by-4 matrix of nucleotide substitution rates. If scaletotaltreeneutralexpectedsubs > 0, then the tree is scaled so that if alpha=beta=1 for all sites, the expected number of neutral substitutions is equal to scaletotaltreeneutralexpectedsubs. The sequences are written to a fasta file, and the tree is written to a newick file.
CodonMolecularEvolution.std2maxdNdS
— Methodstd2maxdNdS(σ)
Approximation for the maximum dN/dS ratio as a function of the standard deviation of the fitnesses, assuming Gaussian fitnesses and a Halpern and Bruno model, where the fitnesses have just shifted from one Gaussian sample to another. Note: this is not an analytical solution, but a serindipitously good approximation.
function monte_carlo_maxdNdS(σ; N=100_000)
sum_val = 0.0
for _ in 1:N
f_i = σ * randn()
f_j = σ * randn()
sum_val += HB_fixation_rate(f_i, f_j)
end
return sum_val / N
end
vs = 0:0.01:10
plot(vs, monte_carlo_maxdNdS.(vs), label = "Monte Carlo", alpha = 0.8)
plot!(vs, std2maxdNdS.(vs), label = "Approx", linestyle = :dash, alpha = 0.8)
CodonMolecularEvolution.time_varying_HB_freqs
— Methodtime_varying_HB_freqs(ts, T, fst, init_freqs; nucm = CodonMolecularEvolution.demo_nucmat, alpha = 1.0, delta_t = 0.002, prezero_delta_t = 0.5)
Compute the time-varying codon frequencies and expected dN/dS over time for a sequence of fitnesses, under the Halpern-Bruno model. ts
is a vector of times, T
is the total time, fst
is a vector of vector of fitnesses, init_freqs
is the initial codon frequencies, nucm
is the nucleotide substitution matrix, alpha
is the alpha parameter, delta_t
is the discretization time step for the simulation, and prezero_delta_t
is the time step used before t=0
. fst[i]
is assumed to be the fitness between t = ts[i]
and t = ts[i+1]
.
MolecularEvolution.forward!
— Methodforward!(dest, source, model, node)
Evolves source
partition along node.branchlength
under model
, storing the result in dest
.