API
The main functions and types in Rifraf.jl.
Functions
#
Rifraf.rifraf — Function.
rifraf(dnaseqs, phreds; kwargs...)
Find a consensus sequence for a set of DNA sequences.
Returns an instance of RifrafResult.
Arguments
- dnaseqs::Vector{DNASeq}: reads for which to find a consensus
- phreds::Vector{Vector{Phred}}: Phred scores for dnaseqs
- consensus::DNASeq=DNASeq(): initial consensus; if not given, defaults to the sequence in dnaseqs with the lowest mean error rate
- reference::DNASeq=DNASeq(): reference for frame correction
- params::RifrafParams=RifrafParams()
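The default initial consensus is described above as the read with the lowest mean error rate. The following is a minimal sketch of that selection in plain Julia (1.x), assuming the standard Phred definition p = 10^(-Q/10). It does not call Rifraf.jl: the helper names `phred_to_prob`, `mean_error`, and `lowest_error_index` are hypothetical, and plain integer vectors stand in for `Vector{Phred}`.

```julia
# Convert a Phred score to an error probability (standard definition).
phred_to_prob(q::Real) = 10.0^(-q / 10)

# Mean error probability of one read, given its Phred scores.
mean_error(phreds::AbstractVector{<:Real}) =
    sum(phred_to_prob, phreds) / length(phreds)

# Index of the read with the lowest mean error rate.
lowest_error_index(all_phreds) = argmin(map(mean_error, all_phreds))

# Toy data: three reads of four bases each (hypothetical values).
toy_phreds = [[10, 10, 20, 20], [30, 30, 30, 30], [20, 20, 20, 10]]
best = lowest_error_index(toy_phreds)
println("read with lowest mean error: ", best)
```

Averaging is done in the probability domain rather than on the raw Phred scores, because Phred is a logarithmic scale and a few low-quality bases dominate the true error rate.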
Sequence simulations
#
Rifraf.sample_sequences — Function.
sample_sequences(nseqs, len; kwargs...)
Generate a template and sample simulated reads and Phred scores.
This function is meant for simple testing and benchmarking, and is not meant to represent a realistic error model.
Arguments:
- nseqs::Int=3: number of reads to generate
- len::Int=90: length of template
- ref_error_rate::Prob=0.1: reference error rate
- ref_errors::ErrorModel=ErrorModel(10, 0, 0, 1, 1): reference error model
- error_rate::Prob=0.01: read error rate
- alpha::Float64=0.1: α parameter for beta distribution of per-base template error rates
- phred_scale::Float64=1.5: λ parameter for exponential distribution of Phred error
- actual_std::Float64=3.0: σ^2 for true Gaussian errors in the Phred domain
- reported_std::Float64=1.0: σ^2 for Gaussian errors in the Phred domain
- seq_errors::ErrorModel=ErrorModel(1, 5, 5): sequencing error model
Returns:
- reference::DNASeq: reference sequence for template
- template::DNASeq: template sequence
- t_p::Vector{Prob}: template error probabilities
- seqs::Vector{DNASeq}: simulated reads
- actual::Vector{Vector{Prob}}: error probabilities
- phreds::Vector{Vector{Phred}}: Phred values
- seqbools::Vector{Vector{Bool}}: seqbools[i][j] is true if seqs[i][j] was correctly sequenced from the template
- tbools::Vector{Vector{Bool}}: tbools[i][j] is true if template[j] was correctly sequenced in seqs[i]
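To illustrate how the `seqbools` and `actual` return values relate, here is a small sketch in plain Julia (1.x) that compares the observed fraction of incorrectly sequenced positions in each read with the mean of its per-base error probabilities. It does not call Rifraf.jl: `observed_error_rate` is a hypothetical helper, and the toy vectors are made-up stand-ins for real simulation output.

```julia
# Fraction of positions in one read that were sequenced incorrectly.
observed_error_rate(bools::AbstractVector{Bool}) = count(!, bools) / length(bools)

# Toy stand-ins for `seqbools` and `actual` (hypothetical values).
toy_seqbools = [[true, true, false, true], [true, true, true, true]]
toy_actual = [[0.01, 0.02, 0.5, 0.01], [0.01, 0.01, 0.01, 0.01]]

for (bools, probs) in zip(toy_seqbools, toy_actual)
    # Expected error rate is the mean of the per-base error probabilities.
    expected = sum(probs) / length(probs)
    println("observed: ", observed_error_rate(bools), "  expected: ", expected)
end
```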
#
Rifraf.write_samples — Function.
Write template into FASTA and sequences into FASTQ.
#
Rifraf.read_samples — Function.
Read template from FASTA and sequences from FASTQ.
Utility IO functions
Rifraf.jl provides some utility functions for reading and writing FASTQ and FASTA files. This functionality uses BioSequences.jl.
#
Rifraf.read_fastq_records — Function.
read_fastq_records(filename)
Read a FASTQ file and return records.
Returns:
records::Vector{FASTQ.Record}
#
Rifraf.read_fastq — Function.
read_fastq(filename)
Read a FASTQ file and convert to a given sequence type.
Returns:
- seqs::Vector{T}: sequences
- phreds::Vector{Vector{Phred}}: Phred values
- names::Vector{String}: sequence names
#
Rifraf.write_fastq — Function.
write_fastq(filename, seqs, phreds; names)
Write sequences and their Phred scores to a FASTQ file.
Arguments:
- filename: file into which to write
- seqs: sequences to write
- phreds: corresponding Phred scores
- names::Vector{String}: optional list of corresponding names
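For readers unfamiliar with the on-disk format, the sketch below shows how a name, a sequence, and its Phred scores map onto one four-line FASTQ record. It is written in plain Julia (1.x) and does not use Rifraf.jl or BioSequences.jl. It assumes the common Sanger encoding (Phred + 33), which this page does not confirm for `write_fastq`; `encode_quality` and `fastq_record` are hypothetical helper names.

```julia
# Encode Phred scores as a FASTQ quality string (assumed Sanger offset of 33).
encode_quality(phreds::AbstractVector{<:Integer}) =
    String([Char(q + 33) for q in phreds])

# Format one four-line FASTQ record: header, sequence, separator, qualities.
function fastq_record(name::AbstractString, seq::AbstractString,
                      phreds::AbstractVector{<:Integer})
    length(seq) == length(phreds) ||
        throw(ArgumentError("sequence and Phred lengths differ"))
    return string("@", name, "\n", seq, "\n+\n", encode_quality(phreds), "\n")
end

print(fastq_record("read_1", "ACGT", [30, 30, 20, 10]))
```

The length check reflects the one-to-one pairing of bases and Phred scores that the `seqs` and `phreds` arguments imply.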
#
Rifraf.read_fasta_records — Function.
read_fasta_records(filename)
Read a FASTA file and return records.
Returns:
records::Vector{FASTA.Record}
#
Rifraf.read_fasta — Function.
read_fasta(filename)
Read a FASTA file and convert to a given sequence type.
Returns:
seqs::Vector{T}
#
Rifraf.write_fasta — Function.
write_fasta(filename, seqs; names)
Write sequences to a FASTA file.
Arguments:
- filename: file into which to write
- seqs: sequences to write
- names::Vector{String}: optional list of corresponding names
Types
#
Rifraf.RifrafParams — Type.
The parameters for a RIFRAF run.
Fields
- scores::Scores = Scores(ErrorModel(1.0, 2.0, 2.0, 0.0, 0.0))
- ref_scores::Scores = Scores(ErrorModel(10.0, 1e-1, 1e-1, 1.0, 1.0))
- ref_indel_mult::Score = 3.0: multiplier for single indel penalties in alignment with the reference
- max_ref_indel_mults::Int = 5: maximum multiplier increases for single indel penalty
- ref_error_mult::Float64 = 1.0: multiplier for estimated reference error rate
- do_init::Bool = true: enable initialization stage
- do_frame::Bool = true: enable frame correction stage
- do_refine::Bool = true: enable refinement stage
- do_score::Bool = false: enable scoring stage
- do_alignment_proposals::Bool = true: only propose changes that occur in pairwise alignments
- seed_indels::Bool = true: seed indel locations from the alignment to reference
- indel_correction_only::Bool = true: only propose indels during frame correction stage
- use_ref_for_qvs::Bool = false: use reference alignment when estimating quality scores
- bandwidth::Int = (3 * CODON_LENGTH): alignment bandwidth
- bandwidth_pvalue::Float64 = 0.1: p-value for increasing bandwidth
- min_dist::Int = (5 * CODON_LENGTH): distance between accepted candidate proposals
- batch_fixed::Bool = true: use top sequences for initial stage and frame correction
- batch_fixed_size::Int = 5: size of fixed batch
- batch_size::Int = 20: batch size; if <= 1, no batching is used
- batch_randomness::Float64 = 0.9: batch randomness
  - 0: top n get picked
  - 0.5: weight according to estimated errors
  - 1: completely random
- batch_mult::Float64 = 0.7: multiplier to reduce batch randomness
- batch_threshold::Float64 = 0.1: score threshold for increasing batch size
- max_iters::Int = 100: maximum total iterations across all stages before giving up
- verbose::Int = 0: verbosity level
  - 0: nothing
  - 1: print iteration and score
  - 2: also print step within each iteration
  - 3: also print full consensus sequence
#
Rifraf.RifrafResult — Type.
RifrafResult()
The result of a RIFRAF run.
Fields
- consensus::DNASeq: the consensus found by RIFRAF.
- params::RifrafParams: the parameters used for this run.
- state::RifrafState: the final state of the run.
- consensus_stages::Vector{Vector{DNASeq}}
- error_probs::EstimatedProbs: estimated per-base probabilities for each position. Only available if params.do_score is true.
- aln_error_probs::Vector{Float64}: combined per-base error probabilities. Only available if params.do_score is true.
#
Rifraf.ErrorModel — Type.
ErrorModel(mismatch, insertion, deletion, codon_insertion, codon_deletion)
Error model for sequencing.
Each field contains the relative rate of that kind of error. For instance, this model breaks the error rate into 80% mismatches, 10% codon insertions, and 10% codon deletions: ErrorModel(8, 0, 0, 1, 1).
Fields:
- mismatch::Real
- insertion::Real
- deletion::Real
- codon_insertion::Real
- codon_deletion::Real
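Because the fields are relative rates, only their ratios matter. The following sketch in plain Julia (1.x) reproduces the 80% / 10% / 10% breakdown from the example above by dividing each rate by the total. It does not construct a real `ErrorModel`: `rate_fractions` is a hypothetical helper that takes the five rates in the same order as the constructor, and it is not claimed to be how Rifraf.jl normalizes internally.

```julia
# Hypothetical helper: turn five relative rates into fractions that sum to 1.
# Argument order mirrors ErrorModel(mismatch, insertion, deletion,
# codon_insertion, codon_deletion).
function rate_fractions(mismatch::Real, insertion::Real, deletion::Real,
                        codon_insertion::Real, codon_deletion::Real)
    rates = (mismatch, insertion, deletion, codon_insertion, codon_deletion)
    total = sum(rates)
    total > 0 || throw(ArgumentError("at least one rate must be positive"))
    return map(r -> r / total, rates)
end

# Same relative rates as ErrorModel(8, 0, 0, 1, 1).
fractions = rate_fractions(8, 0, 0, 1, 1)
println(fractions)
```

Under this reading, `(8, 0, 0, 1, 1)` and `(80, 0, 0, 10, 10)` would describe the same split of the error rate.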
#
Rifraf.Scores — Type.
Scores(errors; mismatch, insertion, deletion)
Derive alignment scores from an error model.
Takes extra penalties to add to the mismatch, insertion, and deletion scores.
Arguments:
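- errors::ErrorModel: error model from which to derive the scores
- mismatch::Real: extra penalty added to the mismatch (substitution) score
- insertion::Real: extra penalty added to the insertion score
- deletion::Real: extra penalty added to the deletion score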