API
The main functions and types in Rifraf.jl.
Functions
Rifraf.rifraf
— Function.
rifraf(dnaseqs, phreds; kwargs...)
Find a consensus sequence for a set of DNA sequences.
Returns an instance of RifrafResult.
Arguments
dnaseqs::Vector{DNASeq}: reads for which to find a consensus
phreds::Vector{Vector{Phred}}: Phred scores for dnaseqs
consensus::DNASeq=DNASeq(): initial consensus; if not given, defaults to the sequence in dnaseqs with the lowest mean error rate
reference::DNASeq=DNASeq(): reference for frame correction
params::RifrafParams=RifrafParams()
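The default initial consensus is described above as the read with the lowest mean error rate. The following is a minimal sketch of that selection, assuming the standard Phred definition p = 10^(-Q/10). The helpers `phred_to_prob` and `lowest_error_index` are hypothetical names used for illustration and are not part of Rifraf.jl; the package's own selection logic may differ.

```julia
using Statistics: mean

# Standard Phred definition: error probability p = 10^(-Q / 10).
phred_to_prob(q::Real) = 10.0^(-q / 10)

# Hypothetical helper (not a Rifraf.jl function): index of the read
# whose per-base error probabilities have the lowest mean.
function lowest_error_index(phreds::AbstractVector{<:AbstractVector{<:Integer}})
    mean_errors = [mean(phred_to_prob.(qs)) for qs in phreds]
    return argmin(mean_errors)
end

# Three toy reads, given as plain integer Phred scores.
phreds = [[10, 20, 30], [30, 30, 30], [20, 20, 20]]
best = lowest_error_index(phreds)
```

Here the second read has the lowest mean error probability, so it would be the candidate starting consensus under this assumption.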
Sequence simulations
Rifraf.sample_sequences
— Function.
sample_sequences(nseqs, len; kwargs...)
Generate a template and sample simulated reads and Phred scores.
This function is meant for simple testing and benchmarking, and is not meant to represent a realistic error model.
Arguments:
nseqs::Int=3: number of reads to generate
len::Int=90: length of template
ref_error_rate::Prob=0.1: reference error rate
ref_errors::ErrorModel=ErrorModel(10, 0, 0, 1, 1): reference error model
error_rate::Prob=0.01: read error rate
alpha::Float64=0.1: α parameter for beta distribution of per-base template error rates
phred_scale::Float64=1.5: λ parameter for exponential distribution of Phred error
actual_std::Float64=3.0: σ^2 for true Gaussian errors in the Phred domain
reported_std::Float64=1.0: σ^2 for Gaussian errors in the Phred domain
seq_errors::ErrorModel=ErrorModel(1, 5, 5): sequencing error model
Returns:
reference::DNASeq: reference sequence for template
template::DNASeq: template sequence
t_p::Vector{Prob}: template error probabilities
seqs::Vector{DNASeq}: simulated reads
actual::Vector{Vector{Prob}}: error probabilities
phreds::Vector{Vector{Phred}}: Phred values
seqbools::Vector{Vector{Bool}}: seqbools[i][j] is true if seqs[i][j] was correctly sequenced from the template
tbools::Vector{Vector{Bool}}: tbools[i][j] is true if template[j] was correctly sequenced in seqs[i]
Rifraf.write_samples
— Function.
Write template into FASTA and sequences into FASTQ.
Rifraf.read_samples
— Function.
Read template from FASTA and sequences from FASTQ.
Utility IO functions
Rifraf.jl provides some utility functions for reading and writing FASTQ and FASTA files. This functionality uses BioSequences.jl.
Rifraf.read_fastq_records
— Function.
read_fastq_records(filename)
Read a FASTQ file and return records.
Returns:
records::Vector{FASTQ.Record}
Rifraf.read_fastq
— Function.
read_fastq(filename)
Read a FASTQ file and convert to a given sequence type.
Returns:
seqs::Vector{T}
phreds::Vector{Vector{Phred}}: Phred values
names::Vector{String}: sequence names
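FASTQ files store each Phred value as a single ASCII character. The following sketch shows that conversion, assuming the common Phred+33 (Sanger) offset. Whether `read_fastq` and `write_fastq` use this offset is not stated here, and `decode_quality` and `encode_quality` are hypothetical helpers, not Rifraf.jl functions.

```julia
# Assumption: Phred+33 encoding, where character code = Phred score + 33.
# Both helpers are illustrative only and are not part of Rifraf.jl.
decode_quality(qual::AbstractString; offset::Int=33) =
    [Int(c) - offset for c in qual]

encode_quality(phreds::AbstractVector{<:Integer}; offset::Int=33) =
    String([Char(q + offset) for q in phreds])

# Decode a four-character quality string, then encode it again.
scores = decode_quality("I5+!")
roundtrip = encode_quality(scores)
```

Under this assumption the quality string "I5+!" corresponds to the Phred values 40, 20, 10, and 0.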
Rifraf.write_fastq
— Function.
write_fastq(filename, seqs, phreds; names)
Write sequences and their Phred scores to a FASTQ file.
Arguments:
filename: file into which to write
seqs: sequences to write
phreds: corresponding Phred scores
names::Vector{String}: optional list of corresponding names
Rifraf.read_fasta_records
— Function.
read_fasta_records(filename)
Read a FASTA file and return records.
Returns:
records::Vector{FASTA.Record}
Rifraf.read_fasta
— Function.
read_fasta(filename)
Read a FASTA file and convert to a given sequence type.
Returns:
seqs::Vector{T}
Rifraf.write_fasta
— Function.
write_fasta(filename, seqs; names)
Write sequences to a FASTA file.
Arguments:
filename: file into which to write
seqs: sequences to write
names::Vector{String}: optional list of corresponding names
Types
Rifraf.RifrafParams
— Type.
The parameters for a RIFRAF run.
Fields
scores::Scores = Scores(ErrorModel(1.0, 2.0, 2.0, 0.0, 0.0))
ref_scores::Scores = Scores(ErrorModel(10.0, 1e-1, 1e-1, 1.0, 1.0))
ref_indel_mult::Score = 3.0: multiplier for single indel penalties in alignment with the reference
max_ref_indel_mults::Int = 5: maximum multiplier increases for single indel penalty
ref_error_mult::Float64 = 1.0: multiplier for estimated reference error rate
do_init::Bool = true: enable initialization stage
do_frame::Bool = true: enable frame correction stage
do_refine::Bool = true: enable refinement stage
do_score::Bool = false: enable scoring stage
do_alignment_proposals::Bool = true: only propose changes that occur in pairwise alignments
seed_indels::Bool = true: seed indel locations from the alignment to reference
indel_correction_only::Bool = true: only propose indels during frame correction stage
use_ref_for_qvs::Bool = false: use reference alignment when estimating quality scores
bandwidth::Int = (3 * CODON_LENGTH): alignment bandwidth
bandwidth_pvalue::Float64 = 0.1: p-value for increasing bandwidth
min_dist::Int = (5 * CODON_LENGTH): distance between accepted candidate proposals
batch_fixed::Bool = true: use top sequences for initial stage and frame correction
batch_fixed_size::Int = 5: size of fixed batch
batch_size::Int = 20: batch size; if <= 1, no batching is used
batch_randomness::Float64 = 0.9: batch randomness
  0: top n get picked
  0.5: weight according to estimated errors
  1: completely random
batch_mult::Float64 = 0.7: multiplier to reduce batch randomness
batch_threshold::Float64 = 0.1: score threshold for increasing batch size
max_iters::Int = 100: maximum total iterations across all stages before giving up
verbose::Int = 0: verbosity level
  0: nothing
  1: print iteration and score
  2: also print step within each iteration
  3: also print full consensus sequence
Rifraf.RifrafResult
— Type.
RifrafResult()
The result of a RIFRAF run.
Fields
consensus::DNASeq: the consensus found by RIFRAF.
params::RifrafParams: the parameters used for this run.
state::RifrafState: the final state of the run.
consensus_stages::Vector{Vector{DNASeq}}
error_probs::EstimatedProbs: estimated per-base probabilities for each position. Only available if params.do_score is true.
aln_error_probs::Vector{Float64}: combined per-base error probabilities. Only available if params.do_score is true.
Rifraf.ErrorModel
— Type.
ErrorModel(mismatch, insertion, deletion, codon_insertion, codon_deletion)
Error model for sequencing.
Each field contains the relative rate of that kind of error. For instance, this model breaks the error rate into 80% mismatches, 10% codon insertions, and 10% codon deletions: ErrorModel(8, 0, 0, 1, 1).
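The arithmetic behind that example can be checked directly. This sketch uses a plain vector rather than the ErrorModel type, so it runs without Rifraf.jl; the variable names are illustrative only.

```julia
# Relative rates in the order (mismatch, insertion, deletion,
# codon_insertion, codon_deletion), matching ErrorModel(8, 0, 0, 1, 1).
rates = [8, 0, 0, 1, 1]

# Dividing each rate by the total gives that error type's share
# of the overall error rate.
fractions = rates ./ sum(rates)
```

The resulting shares are 0.8 for mismatches, 0.1 each for codon insertions and codon deletions, and 0 for single-base insertions and deletions.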
Fields:
mismatch::Real
insertion::Real
deletion::Real
codon_insertion::Real
codon_deletion::Real
Rifraf.Scores
— Type.
Scores(errors; mismatch, insertion, deletion)
Derive alignment scores from an error model.
Takes extra penalties to add to the mismatch, insertion, and deletion scores.
Arguments:
errors::ErrorModel
mismatch::Real: substitution
insertion::Real: insertion
deletion::Real: deletion