Examples
Some examples showing how to use RIFRAF.
On simulated data
First, we generate a random 1,200 bp template, along with a reference and twenty simulated reads. We run RIFRAF both without and with the reference and compare the result to the expected template.
julia> using Rifraf
julia> sampled = sample_sequences(20, 1200);
julia> (reference, template, _, sequences, _, phreds, _, _) = sampled;
julia> result = rifraf(sequences, phreds;
params=RifrafParams(batch_fixed_size=3, batch_size=5,
verbose=1, max_iters=20));
iteration 1 : STAGE_INIT : -Inf
iteration 2 : STAGE_INIT : -66.93546689716547
done. converged: true
julia> result.consensus == template
false
julia> result = rifraf(sequences, phreds; reference=reference,
params=RifrafParams(verbose=1, max_iters=20));
iteration 1 : STAGE_INIT : -Inf
iteration 2 : STAGE_INIT : -112.72329868969373
iteration 3 : STAGE_FRAME : -112.72329868969373
iteration 4 : STAGE_FRAME : -287.4380940401863
iteration 5 : STAGE_REFINE : -287.4380940401863
done. converged: true
julia> result.consensus == template
true
Reading data from FASTQ files
Rifraf.jl also provides a set of convenience functions for reading and writing FASTA and FASTQ files. For instance, here is one way to run RIFRAF on sequences from a file:
sequences, phreds, names = Rifraf.read_fastq("/path/to/sequences.fastq")
reference = Rifraf.read_fasta("/path/to/reference.fasta")[1]
result = rifraf(sequences, phreds; reference=reference)
There are also convenience functions for reading and writing the output of sample_sequences.
reference, template, t_error, sequences, _, phreds, _, _ = sample_sequences()
Rifraf.write_samples("/path/to/basename", reference, template, t_error, sequences, phreds)
reference, template, t_error, sequences, phreds = Rifraf.read_samples("/path/to/basename")
Command-line script
scripts/rifraf.jl is a command-line script for processing many sets of reads at once. Julia takes some time to start up, so this script is only recommended for long sequences or large numbers of reads.
The data directory includes some example data for testing this script. The following command finds a consensus for each FASTQ file that matches the glob input-reads-*.fastq and writes them to results.fasta.
julia ./scripts/rifraf.jl \
--reference ./data/references.fasta \
--reference-map ./data/ref-map.tsv \
--phred-cap 30 \
--ref-errors 8,0.1,0.1,1,1 \
1,2,2 \
"./data/input-reads-*.fastq" \
./data/consensus-results.fasta
Allowing frameshifts during frame correction
RIFRAF's default parameters penalize frameshift-causing indels extremely heavily. If the template really does contain a frame shift mutation, it will likely be removed from the result. To detect real frameshifts, with a small risk of allowing some spurious ones, the penalties for single insertions or deletions must be tuned. For example:
julia> using Rifraf
julia> sampled = Rifraf.sample_sequences(5, 3001; error_rate=.005);
julia> (reference, template, _, sequences, _, phreds, _, _) = sampled;
julia> result = rifraf(sequences, phreds; reference=reference);
julia> length(result.consensus) % 3 == 0
true
julia> result = rifraf(sequences, phreds; reference=reference,
params=RifrafParams(ref_scores=Scores(ErrorModel(10, 1, 1, 1, 1)),
ref_indel_mult=1.2, max_ref_indel_mults=3));
julia> length(result.consensus) % 3 == 0
false