Consensus and Fine Cluster Splitting Functions

# RAD.alignment_consensus (Function)

alignment_consensus(seqs)

Computes a consensus sequence for an array of sequences that have already been aligned. The consensus is the site-wise mode: at each alignment column, the most frequent character among the given sequences.
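
The site-wise mode can be sketched in a few lines of Julia. This is an illustrative sketch, not RAD's implementation: the function name sitewise_mode_consensus is hypothetical, the sequences are assumed to be equal-length aligned strings with '-' as the gap character, and ties are broken by first occurrence.

```julia
# Hypothetical sketch: site-wise mode consensus of pre-aligned sequences.
# Assumes equal-length strings, '-' as the gap character, and ties broken
# by whichever character appears first at that column.
function sitewise_mode_consensus(seqs::Vector{String})
    isempty(seqs) && throw(ArgumentError("need at least one sequence"))
    cols = [collect(s) for s in seqs]
    len = length(cols[1])
    all(c -> length(c) == len, cols) ||
        throw(ArgumentError("sequences must be aligned (equal length)"))
    consensus = Vector{Char}(undef, len)
    for i in 1:len
        counts = Dict{Char,Int}()
        order = Char[]                      # first-seen order, used for ties
        for c in cols
            ch = c[i]
            haskey(counts, ch) || push!(order, ch)
            counts[ch] = get(counts, ch, 0) + 1
        end
        best = order[1]
        for ch in order
            counts[ch] > counts[best] && (best = ch)
        end
        consensus[i] = best
    end
    return String(consensus)
end

sitewise_mode_consensus(["ACGT", "ACGA", "TCGA"])  # "ACGA"
```

Whether RAD's alignment_consensus strips gap characters from its output is not stated here; this sketch keeps them.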

# RAD.get_centroid (Function)

get_centroid(reads, k, distfunc = corrected_kmer_dist)

Computes a rough cluster centroid by returning the sequence in reads whose k-mer vector is nearest to the mean k-mer vector of all reads, with distances measured by distfunc.
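
A minimal sketch of the rough-centroid idea, assuming a plain A/C/G/T alphabet and using Euclidean distance on raw k-mer counts as a stand-in for corrected_kmer_dist, whose definition is not given on this page. The helpers kmer_vector, euclidean and rough_centroid are hypothetical names.

```julia
# Hypothetical sketch of a rough centroid in k-mer space.
# Euclidean distance stands in for RAD's corrected_kmer_dist (not reproduced here).
const NUC_INDEX = Dict('A' => 0, 'C' => 1, 'G' => 2, 'T' => 3)

# Counts of all overlapping k-mers, indexed by base-4 encoding (length 4^k).
# k-mers containing non-ACGT characters (e.g. gaps or N) are skipped.
function kmer_vector(seq::String, k::Int)
    v = zeros(Float64, 4^k)
    chars = collect(seq)
    for start in 1:(length(chars) - k + 1)
        idx, valid = 0, true
        for c in chars[start:start+k-1]
            d = get(NUC_INDEX, c, -1)
            if d < 0
                valid = false
                break
            end
            idx = 4 * idx + d
        end
        valid && (v[idx + 1] += 1)
    end
    return v
end

euclidean(a, b) = sqrt(sum(abs2, a .- b))

function rough_centroid(reads::Vector{String}, k::Int; distfunc = euclidean)
    vecs = [kmer_vector(r, k) for r in reads]
    meanvec = sum(vecs) ./ length(vecs)           # mean k-mer vector
    dists = [distfunc(v, meanvec) for v in vecs]
    return reads[argmin(dists)]                   # the actual read nearest the mean
end

rough_centroid(["ACGTACGT", "ACGTACGT", "TTTTTTTT"], 2)  # "ACGTACGT"
```

Returning an actual read rather than the mean vector keeps the centroid a concrete sequence, since a mean k-mer vector does not itself correspond to any sequence.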

# RAD.consensus_seq (Function)

consensus_seq(reads; thresh = 0.7, shift = 1, k = 6,
                   distfunc = corrected_kmer_dist)

Computes a cluster centroid by taking the read nearest to the mean k-mer vector (see get_centroid(reads, k, distfunc)) and refining it (see refine_ref(ref, reads)). shift sets the size of the local comparison window used when refining the rough centroid; the actual window size is shift+1.
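
Since shift and the window size differ by one, a small illustration may help. The helper local_windows below is hypothetical and assumes windows start at every position (stride 1); this page does not say how RAD steps its windows.

```julia
# Illustration only: the windows of length shift + 1 that a local comparison
# would look at, assuming windows start at every position (stride 1).
function local_windows(seq::String, shift::Int)
    chars = collect(seq)
    w = shift + 1
    return [String(chars[i:i+w-1]) for i in 1:(length(chars) - w + 1)]
end

local_windows("ACGTA", 1)  # ["AC", "CG", "GT", "TA"]
local_windows("ACGTA", 3)  # ["ACGT", "CGTA"]
```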

# RAD.refine_ref (Function)

refine_ref(candidate_ref, reads; thresh = 0.7, shift = 1)

Takes a candidate consensus sequence, candidate_ref, and the corresponding array of reads, and refines the candidate by majority vote of the reads over local windows of size shift+1. If, after alignment, a local region of the candidate occurs among the reads with a frequency below thresh, that region is refined.
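
The voting step can be sketched as follows, under a strong simplifying assumption: the reads are already aligned to the candidate as equal-length strings, so column i of each read lines up with column i of the candidate. refine_ref itself aligns the reads first, which this hypothetical refine_windows skips; the tie-breaking rule is also an assumption.

```julia
# Simplified, hypothetical sketch of window-wise majority-vote refinement.
# Assumption: reads are already aligned to the candidate as equal-length
# strings. RAD's refine_ref performs its own alignment, omitted here.
function refine_windows(candidate::String, reads::Vector{String};
                        thresh = 0.7, shift = 1)
    isempty(reads) && return candidate
    cand = collect(candidate)
    L = length(cand)
    all(r -> length(r) == L, reads) ||
        throw(ArgumentError("reads must be pre-aligned to the candidate"))
    rows = [collect(r) for r in reads]
    w = shift + 1
    for i in 1:(L - w + 1)
        span = i:(i + w - 1)
        counts = Dict{String,Int}()                  # read window => count
        for r in rows
            key = String(r[span])
            counts[key] = get(counts, key, 0) + 1
        end
        # frequency of the candidate's own window among the reads
        freq = get(counts, String(cand[span]), 0) / length(reads)
        if freq < thresh
            # majority vote; ties broken alphabetically for determinism
            winner = first(sort(collect(counts); by = p -> (-p.second, p.first)))
            cand[span] = collect(winner.first)
        end
    end
    return String(cand)
end

refine_windows("ACGTT", ["ACCTT", "ACCTT", "ACCTT", "ACGTT"])  # "ACCTT"
```

Windows overlap, so a replacement at window i is visible when window i+1 is scored; this is one reasonable choice, not necessarily the one RAD makes.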

# RAD.consensus_viz (Function)

consensus_viz(candidate_ref, reads; thresh = 0.7, shift = 3, 
              intitle = "Consensus agreement.")

Creates a plot to visualize the agreement of a consensus sequence, candidate_ref, with its cluster of reads. The size of the local window (site) used for comparison is shift+1. The thresh keyword appears to have no effect here.
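
A plot like this presumably shows how well each window of the candidate is supported by the reads. The hypothetical window_agreement below computes one such per-window agreement profile (the fraction of reads matching the candidate exactly in each window), assuming pre-aligned, equal-length reads; the plotting itself is left out to keep the sketch dependency-free.

```julia
# Hypothetical sketch of the per-window agreement that such a plot could show.
# Assumption: reads are already aligned to the candidate as equal-length strings.
function window_agreement(candidate::String, reads::Vector{String}; shift = 3)
    cand = collect(candidate)
    L = length(cand)
    all(r -> length(r) == L, reads) ||
        throw(ArgumentError("reads must be pre-aligned to the candidate"))
    rows = [collect(r) for r in reads]
    w = shift + 1
    # fraction of reads whose window i:(i + shift) matches the candidate exactly
    return [count(r -> r[i:i+w-1] == cand[i:i+w-1], rows) / length(rows)
            for i in 1:(L - w + 1)]
end

window_agreement("ACGT", ["ACGT", "ACGA"]; shift = 1)  # [1.0, 1.0, 0.5]
```

The resulting vector could be passed to a plotting package such as Plots.jl, with window start positions on the x-axis.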

# RAD.disagreements (Function)

disagreements(candidate_ref, reads; thresh = 0.7, shift = 3)

Prints local disagreements between candidate_ref and each sequence in reads after alignment. The size of the local window (site) used for comparison is shift+1. A disagreement is a local region of the candidate whose frequency among the reads is below thresh.
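
A hypothetical list_disagreements along these lines first scores each window, keeps those with frequency below thresh, and reports every read that differs from the candidate there. It assumes pre-aligned, equal-length reads and returns the findings as tuples in addition to printing them; RAD's actual output format is not reproduced here.

```julia
# Hypothetical sketch: report, for each read, the low-support windows where it
# differs from the candidate. Assumes reads are pre-aligned, equal-length strings.
function list_disagreements(candidate::String, reads::Vector{String};
                            thresh = 0.7, shift = 3)
    cand = collect(candidate)
    L = length(cand)
    all(r -> length(r) == L, reads) ||
        throw(ArgumentError("reads must be pre-aligned to the candidate"))
    rows = [collect(r) for r in reads]
    w = shift + 1
    found = Tuple{Int,Int,String,String}[]   # (read index, window start, read, candidate)
    for i in 1:(L - w + 1)
        span = i:(i + w - 1)
        freq = count(r -> r[span] == cand[span], rows) / length(rows)
        freq < thresh || continue            # only low-frequency windows are disagreements
        for (j, r) in enumerate(rows)
            r[span] == cand[span] && continue
            push!(found, (j, i, String(r[span]), String(cand[span])))
            println("read $j, window $i: read=", String(r[span]),
                    " candidate=", String(cand[span]))
        end
    end
    return found
end

list_disagreements("ACGT", ["ACGT", "ACGA"]; thresh = 0.7, shift = 1)
# prints: read 2, window 3: read=GA candidate=GT
```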

# RAD.diff_in_homopolymer_region (Function)

diff_in_homopolymer_region(alignment::Array{String, 1}; polylen=3)

Returns true if two aligned sequences differ only by single gaps in homopolymer regions (i.e. one gap per region). alignment is an array of two strings that have already been aligned. A homopolymer region is a run of a single repeated nucleotide of length at least polylen.
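
One way to implement this check is sketched below. The name homopolymer_only_diff is hypothetical, and several details are assumptions: '-' is the gap character, the run length is measured in the sequence that carries the nucleotide opposite the gap, and two identical sequences return true.

```julia
# Hypothetical sketch of the homopolymer-gap check described above.
# Assumptions: '-' is the gap character; a run is measured in the sequence
# that carries the nucleotide opposite the gap; identical inputs return true.
function homopolymer_only_diff(alignment::Vector{String}; polylen = 3)
    length(alignment) == 2 ||
        throw(ArgumentError("expected exactly two aligned sequences"))
    a, b = collect(alignment[1]), collect(alignment[2])
    length(a) == length(b) ||
        throw(ArgumentError("aligned sequences must have equal length"))
    used = Set{Tuple{Int,Int,Int}}()   # runs already holding a gap: (seq, start, stop)
    for i in eachindex(a)
        a[i] == b[i] && continue
        # a mismatch with no gap is a substitution, which is never allowed
        (a[i] == '-' || b[i] == '-') || return false
        s, which = a[i] == '-' ? (b, 2) : (a, 1)   # s carries the nucleotide here
        x = s[i]
        lo, hi = i, i
        while lo > 1 && s[lo - 1] == x
            lo -= 1
        end
        while hi < length(s) && s[hi + 1] == x
            hi += 1
        end
        hi - lo + 1 >= polylen || return false    # gap not inside a long enough run
        (which, lo, hi) in used && return false   # a second gap in the same run
        push!(used, (which, lo, hi))
    end
    return true
end

homopolymer_only_diff(["AA-CC-", "AAACCC"])  # true
homopolymer_only_diff(["AC-GT", "ACGGT"])    # false: the G run has length 2
```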

# RAD.get_coarse_centroid (Function)

get_coarse_centroid(seqs::Array{String, 1}; subsample = 1000, k = 4)

Returns a 'master consensus' representing the largest coarse cluster of sequences, computed from a subsample of the reads. The consensus is the read in the largest computed cluster that is closest to that cluster's mean k-mer vector.
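
A rough sketch of this pipeline follows. The page does not say which clustering method get_coarse_centroid uses, so a simple greedy radius clustering is swapped in for illustration; the radius and rng keywords, and all helper names, are hypothetical. Euclidean distance on raw k-mer counts stands in for RAD's own k-mer distance.

```julia
using Random

# Hypothetical sketch of a coarse "master consensus". The greedy radius
# clustering is an illustrative stand-in; `radius` and `rng` are assumptions.
const NUC = Dict('A' => 0, 'C' => 1, 'G' => 2, 'T' => 3)

function kmer_counts(seq::String, k::Int)       # length-4^k k-mer count vector
    v = zeros(Float64, 4^k)
    c = collect(seq)
    for s in 1:(length(c) - k + 1)
        idx, ok = 0, true
        for ch in c[s:s+k-1]
            d = get(NUC, ch, -1)
            if d < 0                             # skip k-mers with non-ACGT characters
                ok = false
                break
            end
            idx = 4 * idx + d
        end
        ok && (v[idx + 1] += 1)
    end
    return v
end

euclid(a, b) = sqrt(sum(abs2, a .- b))

function coarse_centroid(seqs::Vector{String}; subsample = 1000, k = 4,
                         radius = 10.0, rng = Random.default_rng())
    isempty(seqs) && throw(ArgumentError("no sequences given"))
    # work on a random subsample when there are more reads than `subsample`
    pool = if length(seqs) <= subsample
        seqs
    else
        seqs[randperm(rng, length(seqs))[1:subsample]]
    end
    vecs = [kmer_counts(s, k) for s in pool]
    seeds, members = Vector{Float64}[], Vector{Int}[]
    for (i, v) in enumerate(vecs)
        # join the first cluster whose seed lies within `radius`, else start one
        j = findfirst(sd -> euclid(sd, v) <= radius, seeds)
        if j === nothing
            push!(seeds, v)
            push!(members, [i])
        else
            push!(members[j], i)
        end
    end
    biggest = members[argmax(length.(members))]
    meanvec = sum(vecs[biggest]) ./ length(biggest)
    # the read in the largest cluster nearest that cluster's mean k-mer vector
    best = biggest[argmin([euclid(vecs[i], meanvec) for i in biggest])]
    return pool[best]
end
```

For example, coarse_centroid(["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT"]; k = 2, radius = 3.0) groups the first three reads and returns "ACGTACGTAC". A greedy radius clustering depends on read order and on the chosen radius, which is one reason a real implementation might prefer a different method.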
