Hidden Markov Model Functions
NextGenSeqUtils.backward_logs
NextGenSeqUtils.forward_backward_logs
NextGenSeqUtils.forward_logs
NextGenSeqUtils.gen_seq_with_model
NextGenSeqUtils.get_obs_given_state
NextGenSeqUtils.homopolymer_filter
NextGenSeqUtils.initial_dist
NextGenSeqUtils.logsum
NextGenSeqUtils.markov_filter
NextGenSeqUtils.obs_mat
NextGenSeqUtils.trans_mat
NextGenSeqUtils.viterbi_logs
NextGenSeqUtils.viterbiprint
#
NextGenSeqUtils.viterbi_logs
— Function.
viterbi_logs(observations_given_states::Array{Float64, 2}, transitions::Array{Float64, 2}, initials::Array{Float64})
Return viterbi path and log probability for that path. Takes logs of matrices. observations_given_states
has # rows = # states, # columns = # steps of markov process.
#
NextGenSeqUtils.trans_mat
— Function.
trans_mat(; uniform_cycle_prob = 0.9999999999, homopoly_cycle_prob = 0.98)
Return 5x5 transition matrix with given transition probabilities State 1 – uniform observation distribution. State 2 – High "A" observation likelihood. State 3 – High "C" observation likelihood. State 4 – High "G" observation likelihood. State 5 – High "T" observation likelihood. Each state has high likelihood to return to state 1. States 2 through 5 represent high likelihood of homopolymer regions of respective nuc.
#
NextGenSeqUtils.obs_mat
— Function.
obs_mat(; homopoly_prob = 0.99)
Return 5x4 observation matrix with given probabilities. State 1 – uniform observation distribution. State 2 – High "A" observation likelihood. State 3 – High "C" observation likelihood. State 4 – High "G" observation likelihood. State 5 – High "T" observation likelihood. Columns are in same order as columns 2 through 5 (the last four rows give a symmetric matrix).
#
NextGenSeqUtils.initial_dist
— Function.
initial_dist(; uniform_state = 0.99)
Initial state distributions: see trans_mat
for state descriptions. 5x1 vector.
#
NextGenSeqUtils.get_obs_given_state
— Function.
get_obs_given_state(observation_matrix::Array{Float64,2}, observation_seq::String)
Populate 5xT matrix with likelihood of observation at each of T time steps given each nucleotide in observation_seq
.
#
NextGenSeqUtils.homopolymer_filter
— Function.
homopolymer_filter(seqs::Array{String,1}, phreds, names;
transmat = nothing, obsmat = nothing,
initialdist = nothing)
Filter sequences with "bad" sections in the middle – abnormally long runs of a single base, using viterbi alg inference. If this homopolymer occurs on one end of a sequence, keeps sequence and trims homopolymer region off end. phreds
and/or names
may be nothing
, in which case nothing
is returned for the respective field. If transition, observation, initial distribution matrices not provided (default) then the default values from the respective constructors are used.
homopolymer_filter(sourcepath::String, destpath::String;
transmat = nothing, obsmat = nothing,
initialdist = nothing, format="fastq")
Filter sequences with "bad" sections in the middle – abnormally long runs of a single base, using viterbi alg inference. If this homopolymer occurs on one end of a sequence, keeps sequence and trims homopolymer region off end. Takes a file path for each of a source file of type format
(which may be "fasta" or "fastq") and a destination file is written which is the same file type.
#
NextGenSeqUtils.markov_filter
— Function.
homopolymer_filter(seqs::Array{String,1}, phreds, names;
transmat = nothing, obsmat = nothing,
initialdist = nothing)
Filter sequences with "bad" sections in the middle – abnormally long runs of a single base, using viterbi alg inference. If this homopolymer occurs on one end of a sequence, keeps sequence and trims homopolymer region off end. phreds
and/or names
may be nothing
, in which case nothing
is returned for the respective field. If transition, observation, initial distribution matrices not provided (default) then the default values from the respective constructors are used.
homopolymer_filter(sourcepath::String, destpath::String;
transmat = nothing, obsmat = nothing,
initialdist = nothing, format="fastq")
Filter sequences with "bad" sections in the middle – abnormally long runs of a single base, using viterbi alg inference. If this homopolymer occurs on one end of a sequence, keeps sequence and trims homopolymer region off end. Takes a file path for each of a source file of type format
(which may be "fasta" or "fastq") and a destination file is written which is the same file type.
#
NextGenSeqUtils.forward_logs
— Function.
forward_logs(observations_given_states::Array{Float64, 2}, transitions::Array{Float64, 2}, initials::Array{Float64})
Compute logs of forward scores. Takes logs of matrices as inputs. observations_given_states
has # rows = # states, # columns = # steps of markov process.
#
NextGenSeqUtils.backward_logs
— Function.
backward_logs(observations_given_states::Array{Float64, 2}, transitions::Array{Float64, 2}, initials::Array{Float64})
Compute logs of backward scores and individual posterior probabilities. Takes logs of matrices as inputs. observations_given_states
has # rows = # states, # columns = # steps of markov process.
#
NextGenSeqUtils.forward_backward_logs
— Function.
forward_backward_logs(observations_given_states::Array{Float64, 2}, transitions::Array{Float64, 2}, initials::Array{Float64})
Compute logs of forward-backward scores and individual probabilities. Takes logs of matrices as inputs. observations_given_states
has # rows = # states, # columns = # steps of markov process.
#
NextGenSeqUtils.logsum
— Function.
logsum(lga, lgb)
Compute numerically stable logsum. Returns -Inf
if either input is -Inf
.
#
NextGenSeqUtils.gen_seq_with_model
— Function.
gen_seq_with_model(n::Int, trans_mat, obs_mat, initial_dists)
Generate a sequence given a markov model. May create bad reads (long runs of a single base).
#
NextGenSeqUtils.viterbiprint
— Function.
viterbiprint(s::String)
Draw flagged sites with capital letters, safe sites with lowercase.