Examples

Some examples showing how to use RIFRAF.

On simulated data

First, we generate a random 1,200 bp template, along with a reference and twenty simulated reads. We run RIFRAF both without and with the reference and compare the result to the expected template.

julia> using Rifraf

julia> sampled = sample_sequences(20, 1200);

julia> (reference, template, _, sequences, _, phreds, _, _) = sampled;

julia> result = rifraf(sequences, phreds;
                       params=RifrafParams(batch_fixed_size=3, batch_size=5,
                                           verbose=1, max_iters=20));
iteration 1 : STAGE_INIT : -Inf
iteration 2 : STAGE_INIT : -66.93546689716547
done. converged: true

julia> result.consensus == template
false

julia> result = rifraf(sequences, phreds; reference=reference,
                       params=RifrafParams(verbose=1, max_iters=20));
iteration 1 : STAGE_INIT : -Inf
iteration 2 : STAGE_INIT : -112.72329868969373
iteration 3 : STAGE_FRAME : -112.72329868969373
iteration 4 : STAGE_FRAME : -287.4380940401863
iteration 5 : STAGE_REFINE : -287.4380940401863
done. converged: true

julia> result.consensus == template
true

Reading data from FASTQ files

Rifraf.jl also provides a set of convenience functions for reading and writing FASTA and FASTQ files. For instance, here is one way to run RIFRAF on sequences from a file:

sequences, phreds, names = Rifraf.read_fastq("/path/to/sequences.fastq")
reference = Rifraf.read_fasta("/path/to/reference.fasta")[1]
result = rifraf(sequences, phreds; reference=reference)

There are also convenience functions for reading and writing the output of sample_sequences.

reference, template, t_error, sequences, _, phreds, _, _ = sample_sequences()
Rifraf.write_samples("/path/to/basename", reference, template, t_error, sequences, phreds)
reference, template, t_error, sequences, phreds = Rifraf.read_samples("/path/to/basename")

Command-line script

scripts/rifraf.jl is a command-line script for processing many sets of reads at once. Julia takes some time to start up, so this script is only recommended for long sequences or large numbers of reads.

The data directory includes some example data for testing this script. The following command finds a consensus for each FASTQ file that matches the glob input-reads-*.fastq and writes them to results.fasta.

julia ./scripts/rifraf.jl \
    --reference ./data/references.fasta \
    --reference-map ./data/ref-map.tsv \
    --phred-cap 30 \
    --ref-errors 8,0.1,0.1,1,1 \
    1,2,2 \
    "./data/input-reads-*.fastq" \
    ./data/consensus-results.fasta

Allowing frameshifts during frame correction

RIFRAF's default parameters penalize frameshift-causing indels extremely heavily. If the template really does contain a frame shift mutation, it will likely be removed from the result. To detect real frameshifts, with a small risk of allowing some spurious ones, the penalties for single insertions or deletions must be tuned. For example:

julia> using Rifraf

julia> sampled = Rifraf.sample_sequences(5, 3001; error_rate=.005);

julia> (reference, template, _, sequences, _, phreds, _, _) = sampled;

julia> result = rifraf(sequences, phreds; reference=reference);

julia> length(result.consensus) % 3 == 0
true

julia> result = rifraf(sequences, phreds; reference=reference,
                       params=RifrafParams(ref_scores=Scores(ErrorModel(10, 1, 1, 1, 1)),
                                           ref_indel_mult=1.2, max_ref_indel_mults=3));

julia> length(result.consensus) % 3 == 0
false