

API

The main functions and types in Rifraf.jl.

Functions

# Rifraf.rifraf — Function.

rifraf(dnaseqs, phreds; kwargs...)

Find a consensus sequence for a set of DNA sequences.

Returns an instance of RifrafResult.

Arguments

dnaseqs::Vector{DNASeq}: reads for which to find a consensus
phreds::Vector{Vector{Phred}}: Phred scores for dnaseqs
consensus::DNASeq=DNASeq(): initial consensus; if not given, defaults to the sequence in dnaseqs with the lowest mean error rate
reference::DNASeq=DNASeq(): reference for frame correction
params::RifrafParams=RifrafParams()
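
The default initial consensus is described above as the read with the lowest mean error rate. The following is a minimal sketch of that selection, assuming the standard Phred definition p = 10^(-Q/10) and using plain integer vectors in place of Rifraf's own Phred type. The helper names `phred_to_prob`, `mean_error`, and `best_read_index` are hypothetical and not part of Rifraf.jl.

```julia
# Hypothetical helpers for illustration; not part of Rifraf.jl.

# Standard Phred definition: Q = -10 * log10(p), so p = 10^(-Q / 10).
phred_to_prob(q) = 10.0^(-q / 10.0)

# Mean error probability of one read, given its Phred scores.
mean_error(phreds) = sum(phred_to_prob, phreds) / length(phreds)

# Index of the read with the lowest mean error rate.
best_read_index(all_phreds) = argmin(map(mean_error, all_phreds))

# Three toy reads of length 3 (plain integers stand in for Phred values).
phreds = [[10, 20, 30], [30, 30, 30], [20, 20, 20]]
println(best_read_index(phreds))  # prints 2
```

The second read wins because every base has Q30 (p = 0.001), giving the lowest mean error probability of the three.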


Sequence simulations

# Rifraf.sample_sequences — Function.

sample_sequences(nseqs, len; kwargs...)

Generate a template and sample simulated reads and Phred scores.

This function is meant for simple testing and benchmarking, and is not meant to represent a realistic error model.

Arguments:

nseqs::Int=3: number of reads to generate
len::Int=90: length of template
ref_error_rate::Prob=0.1: reference error rate
ref_errors::ErrorModel=ErrorModel(10, 0, 0, 1, 1): reference error model
error_rate::Prob=0.01: read error rate
alpha::Float64=0.1: α parameter for beta distribution of per-base template error rates.
phred_scale::Float64=1.5: λ parameter for exponential distribution of Phred error
actual_std::Float64=3.0: standard deviation (σ) of the true Gaussian errors in the Phred domain
reported_std::Float64=1.0: standard deviation (σ) of the reported Gaussian errors in the Phred domain
seq_errors::ErrorModel=ErrorModel(1, 5, 5): sequencing error model

Returns:

reference::DNASeq: reference sequence for template
template::DNASeq: template sequence
t_p::Vector{Prob}: template error probabilities
seqs::Vector{DNASeq}: simulated reads
actual::Vector{Vector{Prob}}: error probabilities
phreds::Vector{Vector{Phred}}: Phred values
seqbools::Vector{Vector{Bool}}: seqbools[i][j] is true if seqs[i][j] was correctly sequenced from the template
tbools::Vector{Vector{Bool}}: tbools[i][j] is true if template[j] was correctly sequenced in seqs[i]


# Rifraf.write_samples — Function.

Write template into FASTA and sequences into FASTQ.


# Rifraf.read_samples — Function.

Read template from FASTA and sequences from FASTQ.


Utility IO functions

Rifraf.jl provides some utility functions for reading and writing FASTQ and FASTA files. This functionality uses BioSequences.jl.

# Rifraf.read_fastq_records — Function.

read_fastq_records(filename)

Read a FASTQ file and return records.

Returns:

records::Vector{FASTQ.Record}


# Rifraf.read_fastq — Function.

read_fastq(filename)

Read a FASTQ file and convert to a given sequence type.

Returns:

seqs::Vector{T}: the sequences read from the file
phreds::Vector{Vector{Phred}}: Phred values
names::Vector{String}: sequence names


# Rifraf.write_fastq — Function.

write_fastq(filename, seqs, phreds; names)

Write sequences and their Phred scores to a FASTQ file.

Arguments:

filename: file into which to write
seqs: sequences to write
phreds: corresponding Phred scores
names::Vector{String}: optional list of corresponding names
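
FASTQ files store each Phred score as a single ASCII character. The sketch below shows the common Sanger (Phred+33) encoding. Whether `write_fastq` uses exactly this offset is an assumption here, and `encode_phreds` and `decode_phreds` are hypothetical names, not part of Rifraf.jl.

```julia
# Hypothetical helpers for illustration; not part of Rifraf.jl.

# Sanger / Phred+33 convention (assumed): quality character = Char(Q + 33).
const PHRED_OFFSET = 33

# Encode integer Phred scores as a FASTQ quality string.
encode_phreds(phreds) = join(Char(q + PHRED_OFFSET) for q in phreds)

# Decode a FASTQ quality string back into integer Phred scores.
decode_phreds(quality) = [Int(c) - PHRED_OFFSET for c in quality]

quality = encode_phreds([0, 10, 40])
println(quality)                 # prints !+I
println(decode_phreds(quality))  # prints [0, 10, 40]
```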


# Rifraf.read_fasta_records — Function.

read_fasta_records(filename)

Read a FASTA file and return records.

Returns:

records::Vector{FASTA.Record}


# Rifraf.read_fasta — Function.

read_fasta(filename)

Read a FASTA file and convert to a given sequence type.

Returns:

seqs::Vector{T}


# Rifraf.write_fasta — Function.

write_fasta(filename, seqs; names)

Write sequences to a FASTA file.

Arguments:

filename: file into which to write
seqs: sequences to write
names::Vector{String}: optional list of corresponding names


Types

# Rifraf.RifrafParams — Type.

The parameters for a RIFRAF run.

Fields

scores::Scores = Scores(ErrorModel(1.0, 2.0, 2.0, 0.0, 0.0))
ref_scores::Scores = Scores(ErrorModel(10.0, 1e-1, 1e-1, 1.0, 1.0))
ref_indel_mult::Score = 3.0: multiplier for single indel penalties in alignment with the reference
max_ref_indel_mults::Int = 5: maximum multiplier increases for single indel penalty
ref_error_mult::Float64 = 1.0: multiplier for estimated reference error rate.
do_init::Bool = true: enable initialization stage
do_frame::Bool = true: enable frame correction stage
do_refine::Bool = true: enable refinement stage
do_score::Bool = false: enable scoring stage
do_alignment_proposals::Bool = true: only propose changes that occur in pairwise alignments
seed_indels::Bool = true: seed indel locations from the alignment to reference
indel_correction_only::Bool = true: only propose indels during frame correction stage
use_ref_for_qvs::Bool = false: use reference alignment when estimating quality scores
bandwidth::Int = (3 * CODON_LENGTH): alignment bandwidth
bandwidth_pvalue::Float64 = 0.1: p-value for increasing bandwidth
min_dist::Int = (5 * CODON_LENGTH): distance between accepted candidate proposals
batch_fixed::Bool = true: use top sequences for initial stage and frame correction
batch_fixed_size::Int = 5: size of fixed batch
batch_size::Int = 20: batch size; if <= 1, no batching is used
batch_randomness::Float64 = 0.9: batch randomness
- 0: top n get picked
- 0.5: weight according to estimated errors
- 1: completely random
batch_mult::Float64 = 0.7: multiplier to reduce batch randomness
batch_threshold::Float64 = 0.1: score threshold for increasing batch size
max_iters::Int = 100: maximum total iterations across all stages before giving up
verbose::Int = 0: verbosity level
- 0: nothing
- 1: print iteration and score
- 2: also print step within each iteration
- 3: also print full consensus sequence


# Rifraf.RifrafResult — Type.

RifrafResult()

The result of a RIFRAF run.

Fields

consensus::DNASeq: the consensus found by RIFRAF.
params::RifrafParams: the parameters used for this run.
state::RifrafState: the final state of the run.
consensus_stages::Vector{Vector{DNASeq}}:
error_probs::EstimatedProbs: estimated per-base probabilities for each position. Only available if params.do_score is true.
aln_error_probs::Vector{Float64}: combined per-base error probabilities. Only available if params.do_score is true.


# Rifraf.ErrorModel — Type.

ErrorModel(mismatch, insertion, deletion, codon_insertion, codon_deletion)

Error model for sequencing.

Each field contains the relative rate of that kind of error. For instance, this model breaks the error rate into 80% mismatches, 10% codon insertions, and 10% codon deletions: ErrorModel(8, 0, 0, 1, 1).

Fields:

mismatch::Real
insertion::Real
deletion::Real
codon_insertion::Real
codon_deletion::Real
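
Because the fields are relative rates, only their ratios matter. The sketch below shows how such rates translate into fractions of the overall error rate, reproducing the 80% / 10% / 10% split from the example above. `rate_fractions` is a hypothetical helper for illustration and does not reflect how Rifraf.jl normalizes the model internally.

```julia
# Hypothetical helper for illustration; not part of Rifraf.jl.
# Converts relative rates into fractions of the total error rate.
function rate_fractions(mismatch, insertion, deletion, codon_insertion, codon_deletion)
    rates = Float64[mismatch, insertion, deletion, codon_insertion, codon_deletion]
    total = sum(rates)
    total > 0 || throw(ArgumentError("at least one rate must be positive"))
    return rates ./ total
end

# Same relative rates as ErrorModel(8, 0, 0, 1, 1).
fractions = rate_fractions(8, 0, 0, 1, 1)
println(fractions)  # prints [0.8, 0.0, 0.0, 0.1, 0.1]
```

Under this reading, ErrorModel(8, 0, 0, 1, 1) and ErrorModel(80, 0, 0, 10, 10) would describe the same split.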


# Rifraf.Scores — Type.

Scores(errors; mismatch, insertion, deletion)

Derive alignment scores from an error model.

Takes extra penalties to add to the mismatch, insertion, and deletion scores.

Arguments:

errors::ErrorModel:
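mismatch::Real: extra penalty added to the mismatch (substitution) score
insertion::Real: extra penalty added to the insertion score
deletion::Real: extra penalty added to the deletion score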
