Utility and Misc. Functions
BioSequences.reverse_complement
NextGenSeqUtils.concat_fastas
NextGenSeqUtils.dash_count
NextGenSeqUtils.degap
NextGenSeqUtils.filter_by_length
NextGenSeqUtils.freq
NextGenSeqUtils.freq_dict_print
NextGenSeqUtils.generate_aa_seqs
NextGenSeqUtils.length_filter
NextGenSeqUtils.maxfreq
NextGenSeqUtils.print_diffs
NextGenSeqUtils.print_fasta
NextGenSeqUtils.print_rgb
NextGenSeqUtils.seq_details
NextGenSeqUtils.single_gap
NextGenSeqUtils.single_mod_three_gap
NextGenSeqUtils.sorted_freqs
NextGenSeqUtils.translate_to_aa
NextGenSeqUtils.trim_ends_indices
#
NextGenSeqUtils.print_fasta
— Function.
print_fasta(seqs, names)
Prints sequences in FASTA format to the terminal, for copying and pasting into alignment tools, BLAST, etc.
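The output layout can be illustrated with a minimal standalone sketch for Julia 1.x. The name print_fasta_sketch and the optional io argument are hypothetical and not part of NextGenSeqUtils; the one-line-per-sequence layout is an assumption about what the real function prints.

```julia
# Hypothetical stand-in for print_fasta: one ">name" header line followed by
# one sequence line per record. The real function may wrap lines differently.
function print_fasta_sketch(io::IO, seqs, names)
    for (name, seq) in zip(names, seqs)
        println(io, ">", name)
        println(io, seq)
    end
end

# Convenience method that writes to the terminal.
print_fasta_sketch(seqs, names) = print_fasta_sketch(stdout, seqs, names)

print_fasta_sketch(["ACGT", "AC-T"], ["seq1", "seq2"])
```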
#
NextGenSeqUtils.degap
— Function.
degap(s::String)
Returns given string without '-' gap symbols.
degap(s::DNASequence)
Returns given sequence without '-' gap symbols.
#
NextGenSeqUtils.dash_count
— Function.
dash_count(inStr::String)
Counts number of gap symbols '-' in given string.
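For plain strings, the behavior described for degap and dash_count can be approximated in a few lines of Julia 1.x. The names degap_sketch and dash_count_sketch are hypothetical; this is a sketch of the documented behavior, not the package's implementation.

```julia
# Remove every '-' gap symbol from a string.
degap_sketch(s::AbstractString) = replace(s, "-" => "")

# Count the '-' gap symbols in a string.
dash_count_sketch(s::AbstractString) = count(c -> c == '-', s)

aligned = "AC--GT-A"
println(degap_sketch(aligned))       # ACGTA
println(dash_count_sketch(aligned))  # 3
```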
#
NextGenSeqUtils.single_gap
— Function.
single_gap(str::String, ind::Int)
True if str has a single gap '-' at index ind, else false.
#
NextGenSeqUtils.single_mod_three_gap
— Function.
single_mod_three_gap(str::String, ind::Int)
True if str has a gap whose length is 1 mod 3 at the given index, else false.
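One possible reading of the two gap predicates is sketched below for Julia 1.x. It assumes that ind may point anywhere inside a run of '-' characters and that the gap length is the length of that whole run; the package may instead require ind to be the first position of the gap. All function names here are hypothetical.

```julia
# Length of the contiguous run of '-' covering position ind (0 if there is none).
function gap_run_length(str::AbstractString, ind::Int)
    chars = collect(str)
    (1 <= ind <= length(chars) && chars[ind] == '-') || return 0
    lo = ind
    while lo > 1 && chars[lo - 1] == '-'
        lo -= 1
    end
    hi = ind
    while hi < length(chars) && chars[hi + 1] == '-'
        hi += 1
    end
    return hi - lo + 1
end

# Assumed meaning: the gap at ind is exactly one character long.
single_gap_sketch(str, ind) = gap_run_length(str, ind) == 1

# Assumed meaning: the gap at ind has length 1, 4, 7, ... (1 mod 3).
single_mod_three_gap_sketch(str, ind) = gap_run_length(str, ind) % 3 == 1

println(single_gap_sketch("AC-GT", 3))               # true
println(single_mod_three_gap_sketch("AC----GT", 4))  # true (run of 4)
println(single_mod_three_gap_sketch("AC---GT", 3))   # false (run of 3)
```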
#
NextGenSeqUtils.seq_details
— Function.
seq_details(fasta_path)
Gives names, sequences, error rates, and lengths from given filepath, which may end in '.fasta' or '.fastq'.
#
NextGenSeqUtils.print_rgb
— Function.
print_rgb(r, g, b, t)
Prints in colors r, g, b to terminal.
#
BioSequences.reverse_complement
— Function.
reverse_complement(seq)
Make a reversed complement sequence of seq.
Ambiguous nucleotides are left as-is.
reverse_complement(kmer::Kmer)
Return the reverse complement of kmer.
reverse_complement(dna_string::String)
Returns the complement of the reverse of given nucleotide sequence.
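The String method can be illustrated with a dependency-free sketch for Julia 1.x. The name revcomp_sketch and the lookup table are hypothetical; passing any character other than upper-case A, C, G, T through unchanged is an assumption made here for simplicity, not a statement about how the package handles such characters.

```julia
# Complement lookup for the four unambiguous upper-case nucleotides.
const COMPLEMENT = Dict('A' => 'T', 'C' => 'G', 'G' => 'C', 'T' => 'A')

# Reverse the string, then complement each character; characters without an
# entry in the table (gaps, ambiguity codes, lower case) are kept unchanged.
revcomp_sketch(s::AbstractString) = join(get(COMPLEMENT, c, c) for c in reverse(s))

println(revcomp_sketch("AACG"))  # CGTT
```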
#
NextGenSeqUtils.print_diffs
— Function.
print_diffs(s1, s2; width=5, prefix="")
Prints two already aligned sequences with differences in color to terminal.
#
NextGenSeqUtils.trim_ends_indices
— Function.
trim_ends_indices(seq, ref; edge_reduction=0.1)
Align seq to ref with default low penalties for gaps on ends, and trim insertions on the ends of seq. Returns (start, stop) indices.
#
NextGenSeqUtils.translate_to_aa
— Function.
translate_to_aa(s::String)
Return amino acid string translation of nucleotide sequence using BioSequences conversion.
#
NextGenSeqUtils.generate_aa_seqs
— Function.
generate_aa_seqs(str::String)
Return sequence translated to amino acids in each reading frame (returns three amino acid sequences).
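To make the translation step concrete, here is a standalone sketch for Julia 1.x that builds the standard genetic code by hand instead of going through BioSequences. The names translate_sketch and frames_sketch are hypothetical, and the sketch assumes upper-case DNA input, three forward reading frames, 'X' for unrecognized codons, and that trailing bases not filling a codon are dropped.

```julia
# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
const BASES = "TCAG"
const AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
const CODON_TABLE = Dict{String,Char}()
let i = 1
    for a in BASES, b in BASES, c in BASES
        CODON_TABLE[string(a, b, c)] = AMINO[i]
        i += 1
    end
end

# Translate complete codons from the start of s; unknown codons become 'X'.
translate_sketch(s::AbstractString) =
    join(get(CODON_TABLE, s[i:i+2], 'X') for i in 1:3:(length(s) - 2))

# Translate in each of the three forward reading frames.
frames_sketch(s::AbstractString) = [translate_sketch(s[k:end]) for k in 1:3]

println(translate_sketch("ATGGCCTAA"))  # MA*
println(frames_sketch("ATGGCCTAA"))     # ["MA*", "WP", "GL"]
```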
#
NextGenSeqUtils.filter_by_length
— Function.
filter_by_length(args...)
Deprecated. See length_filter.
#
NextGenSeqUtils.length_filter
— Function.
length_filter(seqs::Array{String, 1}, phreds::Union{Array{Vector{Phred},1},Void}, names::Union{Array{String,1},Void},
minlength::Int, maxlength::Int)
Filter sequences and corresponding names and phreds (which may be nothing) by length.
length_filter(seqs::Array{String, 1}, minlength::Int64, maxlength::Int64)
Filter sequences by length.
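The filtering logic can be sketched as follows for Julia 1.x. The name length_filter_sketch is hypothetical; treating both bounds as inclusive and returning a (seqs, phreds, names) tuple are assumptions, since the documentation above does not state either.

```julia
# Keep only entries whose sequence length lies in [minlength, maxlength].
# phreds and names may be `nothing`, in which case `nothing` is returned for them.
function length_filter_sketch(seqs, phreds, names, minlength, maxlength)
    keep = [minlength <= length(s) <= maxlength for s in seqs]
    kept_phreds = phreds === nothing ? nothing : phreds[keep]
    kept_names = names === nothing ? nothing : names[keep]
    return seqs[keep], kept_phreds, kept_names
end

seqs = ["ACGT", "AC", "ACGTACGT"]
names = ["a", "b", "c"]
kept_seqs, kept_phreds, kept_names = length_filter_sketch(seqs, nothing, names, 3, 6)
println(kept_seqs)   # ["ACGT"]
println(kept_names)  # ["a"]
```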
#
NextGenSeqUtils.concat_fastas
— Function.
concat_fastas(filepaths::Array{String, 1}, outfile::String)
Write contents of all given files to a single .fasta file.
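A minimal sketch of the concatenation for Julia 1.x is shown below. The name concat_fastas_sketch is hypothetical, and the sketch assumes the files are copied byte for byte in the order given, with no separator added between them.

```julia
# Append the contents of each input file, in order, to a single output file.
function concat_fastas_sketch(filepaths, outfile)
    open(outfile, "w") do out
        for path in filepaths
            write(out, read(path, String))
        end
    end
end

# Usage on two temporary files.
mktempdir() do dir
    a = joinpath(dir, "a.fasta")
    b = joinpath(dir, "b.fasta")
    write(a, ">s1\nACGT\n")
    write(b, ">s2\nTTGA\n")
    merged = joinpath(dir, "all.fasta")
    concat_fastas_sketch([a, b], merged)
    print(read(merged, String))
end
```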
#
NextGenSeqUtils.maxfreq
— Function.
maxfreq(vec)
Return the frequency of the most common element in vec.
#
NextGenSeqUtils.freq
— Function.
freq(vec, elem)
Return the frequency of given element in given array; if the element is not present, return 0.0.
#
NextGenSeqUtils.sorted_freqs
— Function.
sorted_freqs(vec)
Return tuples of (freq, elem) of unique elements of vec in order of decreasing frequency.
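The three frequency helpers (maxfreq, freq, sorted_freqs) can be approximated with a small standalone sketch for Julia 1.x. All names ending in _sketch are hypothetical. The sketch assumes "frequency" means the proportion of entries equal to the element, which is consistent with freq returning 0.0 for a missing element but is not confirmed by the text above.

```julia
# Map each unique element to the proportion of entries equal to it.
function proportions_sketch(vec)
    counts = Dict{eltype(vec),Int}()
    for x in vec
        counts[x] = get(counts, x, 0) + 1
    end
    return Dict(k => v / length(vec) for (k, v) in counts)
end

# Proportion of a given element, 0.0 if it does not occur.
freq_sketch(vec, elem) = get(proportions_sketch(vec), elem, 0.0)

# Proportion of the most common element (vec must be non-empty).
maxfreq_sketch(vec) = maximum(values(proportions_sketch(vec)))

# (freq, elem) tuples in order of decreasing frequency.
sorted_freqs_sketch(vec) =
    sort([(f, e) for (e, f) in proportions_sketch(vec)], rev = true)

v = ["a", "b", "a", "a"]
println(freq_sketch(v, "a"))     # 0.75
println(freq_sketch(v, "z"))     # 0.0
println(maxfreq_sketch(v))       # 0.75
println(sorted_freqs_sketch(v))  # [(0.75, "a"), (0.25, "b")]
```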
#
NextGenSeqUtils.freq_dict_print
— Function.
freq_dict_print(dictin; thresh=0)
Prints frequency:element for the elements of dictin above the given threshold, where dictin is a proportionmap of elements (see proportionmap in StatsBase).
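The printing step might look like the sketch below for Julia 1.x, using a plain Dict in place of a StatsBase proportionmap. The name freq_dict_print_sketch and the io argument are hypothetical; printing in order of decreasing frequency and treating "above" as strictly greater than thresh are assumptions.

```julia
# Print "frequency:element" for every entry whose frequency exceeds thresh,
# most frequent first (the ordering is an assumption of this sketch).
function freq_dict_print_sketch(io::IO, dictin; thresh = 0)
    for (elem, f) in sort(collect(dictin), by = last, rev = true)
        f > thresh && println(io, f, ":", elem)
    end
end

props = Dict("a" => 0.75, "b" => 0.25)
freq_dict_print_sketch(stdout, props; thresh = 0.5)  # prints 0.75:a
```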