API Reference

Onion.AdaAffineType
AdaAffine(f, dim, cond_dim; zero_init, bias = false)

Adaptive Affine layer, using a secondary conditioning input to scale and shift the output of the wrapped layer.

Equivalent to AdaLN when wrapping a LayerNorm layer, and AdaLN-Zero when initialized with zero_init=true.
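
A minimal usage sketch (assuming the layer is called like AdaLN, on a hidden state plus a conditioning input):

ada = AdaAffine(LayerNorm(5), 5, 3; zero_init=true)  # AdaLN-Zero
h = randn(Float32, 5, 10, 1)
cond = randn(Float32, 3, 1)
h = ada(h, cond)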

source
Onion.AdaLNType
AdaLN(dim::Int, cond_dim::Int)

Adaptive Layer Normalization.

See also AdaAffine.

aln = AdaLN(5, 3)
h = randn(Float32, 5, 10, 1)
cond = randn(Float32, 3, 1)
h = aln(h, cond)
source
Onion.AngleResnetType
AngleResnet(c_in, c_hidden, no_blocks, no_angles, epsilon)

Predicts (unnormalized, normalized) torsion angles from two sequence representations.

source
Onion.AttentionType
Attention(
    in_dim::Int, n_heads::Int, n_kv_heads=n_heads;
    head_dim=in_dim÷n_heads, qkv_bias=false,
    q_norm=identity, k_norm=identity,
    out_init_scale=1,
)

Attention layer that supports both self-attention and cross-attention (as in Llama3).

Examples

Self-attention

in_dim = 256
n_heads = 8
n_kv_heads = 4
head_dim = 64
attn = Attention(in_dim, n_heads, n_kv_heads; head_dim)

seq_len = 10
batch = 2
x = randn(Float32, in_dim, seq_len, batch)
output = attn(x)
source
Onion.AttentionPairBiasType
AttentionPairBias(c_s, c_z, num_heads; compute_pair_bias=true, use_qk_norm=false)

Wraps Attention with pair-bias projection from pairwise state z.

(apb)(s, z, mask, k_in=s)
  • s: sequence state (C_s, L, B)
  • z: pairwise state (C_z, L, L, B) (or pre-computed bias if compute_pair_bias=false)
  • mask: key padding mask (L, B), 1 = valid
  • k_in: cross-attention key source (defaults to s)
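
A minimal usage sketch (shapes follow the documentation above; sizes are illustrative):

apb = AttentionPairBias(64, 32, 8)
s = randn(Float32, 64, 10, 1)        # (C_s, L, B)
z = randn(Float32, 32, 10, 10, 1)    # (C_z, L, L, B)
mask = ones(Float32, 10, 1)          # (L, B), 1 = valid
s = apb(s, z, mask)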
source
Onion.BackboneUpdateType
BackboneUpdate(c_s)

Linear projection to a 6D vector (3 components for the quaternion update + 3 for the translation update).

source
Onion.BlockLinearType
BlockLinear(
    d1 => d2, k;
    bias::Bool=true,
    init=WeightInitializers.glorot_uniform)

A block-diagonal version of a linear layer, comprising k blocks, where the blocks are of size (d2 ÷ k, d1 ÷ k).

Equivalent to Linear when k=1.
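
A minimal usage sketch (assuming it is applied like Linear):

bl = BlockLinear(8 => 8, 2)   # two diagonal blocks of size (4, 4)
x = randn(Float32, 8, 5)
y = bl(x)                     # size (8, 5)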

source
Onion.CrossFrameIPAType
CrossFrameIPA(dim::Int, ipa; ln = LayerNorm(dim))

Constructs a layer that takes one embedding and two sets of frames. It applies layer normalization to the embedding, then makes a cross-attention IPA call using the single embedding with both sets of frames. Useful for self-conditioning, where two sets of frames need to communicate with each other.

Accepts Rigid frames.

source
Onion.DARTType
DART(transformer; mask=:causal)

"Doubly Auto-Regressive Transformer" (DART) is a convenience layer wrapping a transformer block that can be used to model auto-regressive data represented along two dimensions.

Note

The mask acts on the flattened token sequence.

Examples

julia> dart = DART(TransformerBlock(64, 8));

julia> x = randn(Float32, 64, 4, 20);

julia> dart(x) |> size
(64, 4, 20)
source
Onion.ESMFoldAttentionType
ESMFoldAttention(embed_dim, num_heads, head_width; gated=false)

Fused QKV projection + attention primitive call. Returns (output, nothing) for API compatibility with the ESMFold trunk.

source
Onion.ESMFoldIPAType
ESMFoldIPA(c_s, c_z, c_hidden, no_heads, no_qk_points, no_v_points; inf=1e5, eps=1e-8)

Invariant Point Attention from ESMFold / OpenFold. Combines scalar QK attention, 3D point distance attention, and pair bias.

source
Onion.FoldingTrunkType
FoldingTrunk(; cfg=FoldingTrunkConfig())

ESMFold folding trunk: recycle loop with TriangularSelfAttentionBlock stack + StructureModule.

source
Onion.FramemoverType
Framemover(dim::Int; init_gain = 0.1f0)

Differentiable rigid body updates (AF2-style). Accepts and returns Rigid frames.

source
Onion.GeneralizedHyperConnectionType
GeneralizedHyperConnection(n, m)
(ghc::GeneralizedHyperConnection)(layer, h::AbstractArray)

Wrap a sublayer (e.g. attention or FFN) with the static form of Generalized Hyper-Connections (GHC).

Given a backbone hidden size $D$, the over-width representation is partitioned into n segments, while the backbone operates on only m segments. This layer:

  • compresses an over-width state of size $\frac{n}{m}D$ down to backbone width $D$ by projecting the n segments into m segments,
  • applies layer at backbone width,
  • expands the backbone output back to n segments,
  • carries forward the previous over-width state with a projection from n segments to n segments, adding it to the expanded backbone output.

See also VirtualWidthNetwork and With.

Examples

julia> ghc = GeneralizedHyperConnection(3, 2); # hidden width is 1.5x the backbone width

julia> h = randn(Float32, 12, 5); # hidden state is kept at 12

julia> layer = Linear(8 => 8); # backbone width is 8

julia> ghc(layer, h) |> size
(12, 5)

julia> ghc(layer, h) == ghc(h) do h
           layer(h)
       end
true

See: Virtual Width Networks

source
Onion.IPAblockType
IPAblock(dim::Int, ipa; ln1 = LayerNorm(dim), ln2 = LayerNorm(dim), ff = StarGLU(dim, 3dim))

For use with Invariant Point Attention, either from InvariantPointAttention.jl or MessagePassingIPA.jl.

If ipablock.ipa is from InvariantPointAttention.jl, call ipablock(frames, x; pair_feats = nothing, cond = nothing, mask = 0, kwargs...). If ipablock.ipa is from MessagePassingIPA.jl, call ipablock(g, frames, x, pair_feats; cond = nothing). Pass in cond if you're using e.g. AdaLN, which takes a second argument.

Accepts Rigid frames.

source
Onion.LayerNormType
LayerNorm(dim::Int; eps::T=1f-6)

Layer Normalization.

ln = LayerNorm(64)
x = randn(Float32, 64, 10, 1)
y = ln(x)
source
Onion.LpNormType
LpNorm(p; dims=1, eps=1f-6)
LpNorm{p}(; dims=1, eps=1f-6)

A p-norm layer. This layer has no trainable parameters.

See also the L2Norm alias for p=2.
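
A minimal usage sketch (with the default dims=1, each column is normalized):

l2 = LpNorm(2)            # equivalent to L2Norm()
x = randn(Float32, 4, 10)
y = l2(x)                 # each column scaled to unit L2 norm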

source
Onion.MiniTriangularUpdateType
MiniTriangularUpdate(dim)

Compact triangle update: splits into 4 chunks, does both outgoing and incoming combine_projections, then concatenates.

source
Onion.MiniformerLayerType
MiniformerLayer(token_s, token_z; num_heads=16, dropout=0.25, ...)

Single Miniformer block: uses MiniTriangularUpdate (compact, does both outgoing+incoming) instead of 4 separate triangle ops.

source
Onion.ModulatorType
Modulator(in_dim => out_dim; σ=sigmoid, op=*)

Takes an input Y and a conditioning input X and applies a gate to Y based on X.

See Gated Attention for Large Language Models

Examples

julia> gate = Modulator(32 => 64);

julia> Y = randn(Float32, 64);

julia> X = randn(Float32, 32);

julia> gate(Y, X) |> size
(64,)
source
Onion.MultidimRoPEMethod
MultidimRoPE(; theta=10000f0)

Multi-dimensional Rotary Position Embedding (RoPE) for 2D, 3D, or higher-dimensional coordinate inputs. This is a fixed (non-learnable) generalization of the original RoPE from Su et al. (2021), where each rotary pair of channels is assigned to a specific coordinate dimension and rotated accordingly.

Example

dim, n_heads, n_kv_heads, seqlen = 64, 8, 4, 16
t = TransformerBlock(dim, n_heads, n_kv_heads)
h = randn(Float32, dim, seqlen, 1)
mask = 0

positions = randn(Float32, 3, seqlen, 1)
rope = MultidimRoPE(theta=10000f0)

h_out = t(h, positions, rope, mask)  # self-attention with multi-dim RoPE
source
Onion.MultimerInvariantPointAttentionType
MultimerInvariantPointAttention(c_s, c_z, c_hidden, no_heads, no_qk_points, no_v_points; ...)

AF2 multimer IPA, with separate q/k/v projections and PointProjectionMultimer.

source
Onion.OuterProductMeanType
OuterProductMean(c_in, c_hidden, c_out)

Outer product mean over the MSA sequence dimension. Input: m (C_in, S, N, B), mask (S, N, B). Output: (C_out, N, N, B).

source
Onion.PairToSequenceType
PairToSequence(c_z, num_heads)

Project pairwise state to per-position attention bias. Shape: (C_z, L, L, B) → (H, L, L, B).

source
Onion.PairWeightedAveragingType
PairWeightedAveraging(c_m, c_z, c_h, num_heads; inf=1f6)

Pair-weighted averaging of sequence features using pairwise weights. Input: m (C_m, S, N, B), z (C_z, N, N, B), mask (N, N, B). Output: (C_m, S, N, B).

source
Onion.PairformerLayerType
PairformerLayer(token_s, token_z; num_heads=16, dropout=0.25, ...)

Single BoltzGen-style Pairformer block: pair track (4 triangle ops + transition) followed by sequence track (pair-biased attention + transition).

source
Onion.PointProjectionType
PointProjection(c_hidden, num_points, no_heads)

Projects activations to 3D point clouds, then transforms to global frame via apply_rigid.

source
Onion.PointProjectionMultimerType
PointProjectionMultimer(c_hidden, num_points, no_heads)

Like PointProjection but with (3P, H) weight layout (split x/y/z in first dim). Used by MultimerInvariantPointAttention.

source
Onion.RMSNormType
RMSNorm(dim::Int; T=Float32, eps=1f-5, zero_centered=false)

Root Mean Square Layer Normalization. As used in Llama3.
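
Usage mirrors LayerNorm (a minimal sketch):

rn = RMSNorm(64)
x = randn(Float32, 64, 10, 1)
y = rn(x)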

source
Onion.RelativePositionType
RelativePosition(bins, pairwise_state_dim)

Computes pairwise relative position embeddings from residue indices.

source
Onion.ResidueMLPType
ResidueMLP(dim, inner_dim; dropout=0)

Two-layer MLP with ReLU and residual connection.

source
Onion.RoPEType
RoPE(dim::Int, max_length; theta::T=10000f0)

Rotary Position Embeddings (as in Llama3).

dim = 64
n_heads = 8
n_kv_heads = 4
seqlen = 10

t = TransformerBlock(dim, n_heads, n_kv_heads)
h = randn(Float32, dim, seqlen, 1)

rope = RoPE(dim ÷ n_heads, 1000)
h = t(h, 1, rope[1:seqlen]) # Note the subsetting to match seqlen
source
Onion.STRINGRoPEType
STRINGRoPE(head_dim::Int, n_heads::Int, d_coords::Int; init_scale=0.001f0, theta=10000f0)

Multidimensional, learnable Rotary Position Embedding (RoPE) from Schneck et al. (2025), "Learning the RoPEs: Better 2D and 3D Position Encodings with STRING".

Example

head_dim = 64
n_heads = 8
d_coords = 3
rope = STRINGRoPE(head_dim, n_heads, d_coords)

x = rand(Float32, head_dim, 16, n_heads, 2)      # (head_dim, seq_len, n_heads, batch)
positions = rand(Float32, d_coords, 16, 2)       # (d_coords, seq_len, batch)
x_rot = rope(x, positions)
Note

As this layer is learnable, it should preferably be used with the STRINGBlock.

source
Onion.SequenceToPairType
SequenceToPair(c_s, c_inner, c_z)

Project sequence state to pairwise via outer product + difference. Shape: (C_s, L, B) → (C_z, L, L, B).

source
Onion.StarGLUType
StarGLU(dim::Int, ff_hidden_dim::Int; act=swish)

Gated Linear Unit with flexible activation function (default: swish, making it a SwiGLU layer as used in Llama3).
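
A minimal usage sketch:

ff = StarGLU(64, 256)
x = randn(Float32, 64, 10, 1)
y = ff(x)   # size (64, 10, 1)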

source
Onion.StructureModuleType
StructureModule(; cfg=StructureModuleConfig())

OpenFold/ESMFold structure module. Iteratively refines backbone frames and predicts sidechain torsion angles + atom14 positions.

source
Onion.TransformerBlockType
TransformerBlock(dim::Int, n_heads::Int, n_kv_heads::Int = n_heads, ff_hidden_dim = 4 * dim; norm_eps=1f-5, qkv_bias=false)

Transformer block for GQAttention (as in Llama3).

dim = 64
n_heads = 8
n_kv_heads = 4
seqlen = 10

rope = RoPE(dim ÷ n_heads, 1000)
t = TransformerBlock(dim, n_heads, n_kv_heads)

h = randn(Float32, dim, seqlen, 1)

# Use without a mask:
h = t(h, 1, rope[1:seqlen])

# Use with a causal mask:
mask = Onion.causal_mask(h)
h = t(h, 1, rope[1:seqlen], mask)
source
Onion.TransitionType
Transition(dim, hidden=4*dim; out_dim=dim)

BoltzGen-style SwiGLU feed-forward: fc3(swish(fc1(norm(x))) .* fc2(norm(x))).
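
A minimal usage sketch (assuming it is applied position-wise, like StarGLU; sizes are illustrative):

ff = Transition(64)   # hidden defaults to 4 * dim
x = randn(Float32, 64, 10, 1)
y = ff(x)             # size (64, 10, 1)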

source
Onion.TriangleAttentionType
TriangleAttention(c_in, c_hidden, no_heads; starting=true)

Attention along one axis of a 4D pair tensor (C, L₁, L₂, B). When starting=true, attends along L₁ (rows); otherwise L₂ (columns).

Uses the existing Attention layer with g1_gate=Modulator(sigmoid) for gating.

source
Onion.TriangleMultiplicativeUpdateType
TriangleMultiplicativeUpdate(c_z, c_hidden; outgoing=true)

Triangle multiplicative update. Projects a/b with sigmoid gates, calls combine_projections(a, b, outgoing), then output gate.

source
Onion.TriangularSelfAttentionBlockType
TriangularSelfAttentionBlock(seq_dim, pair_dim, seq_head_width, pair_head_width; dropout=0)

The main ESMFold evoformer block. Alternates sequence attention (with pair bias) and pairwise triangle operations.

Forward: (sequence_state, pairwise_state; mask=nothing) → (sequence_state, pairwise_state)

source
Onion.WithType
With(wrapper, layer)

Wrap layer with wrapper, calling wrapper(layer, args...; kws...).

Equivalent to Base.Fix1(wrapper, layer).

Examples

julia> model = With(GHC(3, 2), Linear(8 => 8));

julia> x = randn(Float32, 12, 5);

julia> model(x) |> size
(12, 5)
source
Onion.combine_projectionsFunction
combine_projections(a, b, outgoing::Bool)

Triangle multiplication contraction. a and b are (C, L, L, B) tensors. When outgoing is true, contracts as a @ bᵀ per channel × batch slice; otherwise aᵀ @ b.
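
A shape-only sketch (dimensions are illustrative):

a = randn(Float32, 8, 16, 16, 2)    # (C, L, L, B)
b = randn(Float32, 8, 16, 16, 2)
combine_projections(a, b, true) |> size   # (8, 16, 16, 2)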

source
Onion.cross_att_padding_maskMethod
cross_att_padding_mask(padmask, other_dim; T=Float32)

Takes a sequence-level padmask and a dimension other_dim and returns a cross-attention mask that is length-by-other_dim-by-batch. This prevents information flow from padded key positions into any query position, while ignoring padding in the query positions (nothing should flow out of those positions downstream anyway).

Examples

julia> cross_att_padding_mask([1 1; 1 1; 1 0], 4)
3×4×2 Array{Float32, 3}:
[:, :, 1] =
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0

[:, :, 2] =
   0.0    0.0    0.0    0.0
   0.0    0.0    0.0    0.0
 -Inf   -Inf   -Inf   -Inf
source
Onion.distogramMethod
distogram(coords, min_bin, max_bin, num_bins)

Computes Cβ positions from the N/Cα/C atoms via a cross-product construction, then bins pairwise Cβ-Cβ distances.

source
Onion.falses_likeMethod
falses_like(x::AbstractArray, [T=eltype(x)], [dims=size(x)])

Returns an array of falses of type Bool with an array type similar to x. The dimensions default to size(x).

falses_like(args...) is equivalent to like(false, Bool, args...)

source
Onion.glutMethod
glut(t::AbstractArray, d::Int, pos::Int)
glut(t::Real, d::Int, pos::Int) = t

glut adds dimensions to the middle. The resulting array will have d dimensions. pos is where to add the dimensions. pos=0 adds dims to the start, pos=1 after the first element, etc. If t is scalar, it is returned unmodified (because scalars don't need to match dims to broadcast).

Typically when broadcasting x .* t, you would call something like glut(t, ndims(x), 1).
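
A shape sketch of that typical use (array sizes here are illustrative):

x = randn(Float32, 8, 10, 2)    # (dim, len, batch)
t = randn(Float32, 8)           # one value per feature
size(glut(t, ndims(x), 1))      # (8, 1, 1)
y = x .* glut(t, ndims(x), 1)   # broadcasts t along dims 2 and 3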

source
Onion.likeFunction
like(x::AbstractArray, array::DenseArray, T=eltype(x))

Like like(v, x::AbstractArray, args...), but an arbitrary AbstractArray, such as an AbstractRange, can be instantiated on device.

Examples

julia> like(1:5, rand(1))
5-element Vector{Int64}:
 1
 2
 3
 4
 5

julia> like((1:5)', rand(1), Float32)
1×5 Matrix{Float32}:
 1.0  2.0  3.0  4.0  5.0
source
Onion.likeMethod
like(v, x::AbstractArray, [T=eltype(x)], [dims=size(x)])

Returns an array of v (converted to type T) with an array type similar to x. The element type and dimensions default to eltype(x) and size(x).

like(v, x::AbstractArray, args...) is equivalent to fill!(similar(x, args...), v), but the function is marked as non-differentiable using ChainRulesCore.

source
Onion.linearFunction
linear(x::AbstractMatrix, W::AbstractMatrix, b)

Matrix multiply with optional bias: W * x .+ b. b can be an AbstractVector or false (no bias).
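
A minimal sketch:

W = randn(Float32, 4, 2)
x = randn(Float32, 2, 3)
y = linear(x, W, false)   # W * x with no bias; size (4, 3)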

source
Onion.newton_schulzFunction
newton_schulz(X, coefficients)

Quintic Newton-Schulz iteration for polar decomposition. coefficients is an iterable of (a, b, c) tuples, one per iteration. Each step applies Y = aX + b(XXᵀ)X + c(XXᵀ)²X (for tall X) or the wide variant.
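
For intuition, a reference sketch of the iteration (the pre-scaling, function name, and coefficient values below are illustrative assumptions, not the package's implementation):

using LinearAlgebra

# Illustrative quintic Newton-Schulz: each (a, b, c) step maps
# X -> a*X + b*(X*X')*X + c*(X*X')^2*X, pushing the singular values
# of X toward 1, i.e. toward the polar factor of the original X.
function newton_schulz_sketch(X, coefficients)
    X = X / (norm(X) + eps(eltype(X)))   # scale so the iteration converges
    for (a, b, c) in coefficients
        A = X * X'
        X = a * X + (b * A + c * A * A) * X
    end
    return X
end

# e.g. five identical steps with hand-tuned coefficients:
# newton_schulz_sketch(randn(Float32, 4, 10), fill((3.4445f0, -4.7750f0, 2.0315f0), 5))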

source
Onion.ofeltypeMethod
ofeltype(v::Number, ::AbstractArray{T}) where T = convert(T, v)

Converts v to the element type T of the given array.
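
Examples

julia> ofeltype(1, rand(Float32, 3))
1.0f0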

source
Onion.ones_likeMethod
ones_like(x::AbstractArray, [T=eltype(x)], [dims=size(x)])

Returns an array of ones with an array type similar to x. The element type and dimensions default to eltype(x) and size(x).

ones_like(args...) is equivalent to like(true, args...)

source
Onion.rms_normFunction
rms_norm(x::AbstractMatrix, w::AbstractVector; eps, offset)
rms_norm(x::AbstractMatrix; eps)

Root mean square normalization of x along the first dimension: the functional form underlying RMSNorm, with an optional learned scale w.

source
Onion.rotary_pos_embFunction
rotary_pos_emb(x, cos, sin)

Apply rotary positional embeddings. Splits x along dim 1 into halves and applies the rotation: [x₁·cos - x₂·sin; x₂·cos + x₁·sin].

source
Onion.self_att_padding_maskMethod
self_att_padding_mask(padmask; T=Float32)

Takes a sequence-level padmask (i.e. length-by-batch, where 0 indicates a padded position) and returns a (non-causal) self-attention mask that is length-by-length-by-batch and which prevents information flow from padded positions to unpadded positions.

Examples

julia> self_att_padding_mask([1 1; 1 1; 1 0])
3×3×2 Array{Float32, 3}:
[:, :, 1] =
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0

[:, :, 2] =
   0.0    0.0  -Inf
   0.0    0.0  -Inf
 -Inf   -Inf     0.0
source
Onion.trues_likeMethod
trues_like(x::AbstractArray, [T=eltype(x)], [dims=size(x)])

Returns an array of trues of type Bool with an array type similar to x. The dimensions default to size(x).

trues_like(args...) is equivalent to like(true, Bool, args...)

source
Onion.zeros_likeMethod
zeros_like(x::AbstractArray, [T=eltype(x)], [dims=size(x)])

Returns an array of zeros with an array type similar to x. The element type and dimensions default to eltype(x) and size(x).

zeros_like(args...) is equivalent to like(false, args...)

source
Onion.@lazyMacro
x = @lazy y + z

Lazy broadcasting macro, for use in apply! rules. It broadcasts like @. but does not materialise, returning a Broadcasted object for later use. Beware that mutation of arguments will affect the result, and that if it is used in two places, work will be done twice.
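
A minimal sketch (using Base's copy to materialize the Broadcasted result):

julia> y = [1, 2]; z = [10, 20];

julia> bc = @lazy y + z;   # a Broadcasted object; nothing is computed yet

julia> copy(bc)            # materialize when needed
2-element Vector{Int64}:
 11
 22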

source