Onion
Documentation for Onion.
Onion.AdaLN
Onion.Attention
Onion.Bottleneck
Onion.DecoderBlock
Onion.DyT
Onion.EncoderBlock
Onion.FlexibleUNet
Onion.GaussianFourierProjection
Onion.RMSNorm
Onion.ResidualBlock
Onion.RoPE
Onion.StarGLU
Onion.TimeEmbedding
Onion.TransformerBlock
Onion.glut
Onion.reverse_tuple
Onion.AdaLN
— Type
AdaLN(dim::Int, cond_dim::Int)
Adaptive Layer Normalization.
aln = AdaLN(5, 3)
h = randn(Float32, 5,10,1)
cond = randn(Float32, 3,1)
h = aln(h, cond)
Onion.Attention
— Type
Attention(dim::Int, n_heads::Int, n_kv_heads=n_heads; qkv_bias=false)
Grouped-query attention (GQA) layer, as in Llama3.
dim = 64
n_heads = 8
n_kv_heads = 4
attn = Attention(dim, n_heads, n_kv_heads)
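A hypothetical usage sketch: the call convention below (hidden states, start position, RoPE slice) is an assumption borrowed from the TransformerBlock example further down, not a documented API.
seqlen = 10
h = randn(Float32, dim, seqlen, 1)
rope = RoPE(dim ÷ n_heads, 1000)
h = attn(h, 1, rope[1:seqlen])  # assumed call convention, mirroring TransformerBlock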
Onion.Bottleneck
— Type
Bottleneck(channels::Int; time_emb=false, emb_dim=256, dropout=0.0, activation=relu)
A bottleneck block for UNet architecture with optional time embeddings and dropout.
Arguments
channels::Int: Number of input and output channels
time_emb=false: Whether to use time embeddings
emb_dim=256: Dimension of time embeddings
dropout=0.0: Dropout probability (0.0 means no dropout)
activation=relu: Activation function to use
Examples
bn = Bottleneck(256, time_emb=true, emb_dim=256, dropout=0.2)
h = randn(Float32, 8, 8, 256, 1)
t = randn(Float32, 256, 1)
h = bn(h, t)
Onion.DecoderBlock
— Type
DecoderBlock(in_channels::Int, out_channels::Int; time_emb=false, emb_dim=256, dropout=0.0, activation=relu)
A decoder block for UNet architecture with optional time embeddings and dropout.
Arguments
in_channels::Int: Number of input channels
out_channels::Int: Number of output channels
time_emb=false: Whether to use time embeddings
emb_dim=256: Dimension of time embeddings
dropout=0.0: Dropout probability (0.0 means no dropout)
activation=relu: Activation function to use
Examples
dec = DecoderBlock(256, 128, time_emb=true, emb_dim=256, dropout=0.1)
h = randn(Float32, 8, 8, 256, 1)
skip = randn(Float32, 16, 16, 128, 1)
t = randn(Float32, 256, 1)
h = dec(h, skip, t)
Onion.DyT
— Method
DyT(dim::Integer; init_alpha::T = 0.5f0)
Make a Dynamic Tanh (DyT) layer for normalizing the input tensor.
See "Transformers without Normalization" for more details.
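A minimal usage sketch, assuming DyT is applied like the other normalization layers in this package (feature dimension first):
dyt = DyT(64)
h = randn(Float32, 64, 10, 1)
h = dyt(h)  # learnable elementwise tanh squashing in place of normalization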
Onion.EncoderBlock
— Type
EncoderBlock(in_channels::Int, out_channels::Int; time_emb=false, emb_dim=256, dropout=0.0, activation=relu)
An encoder block for UNet architecture with optional time embeddings and dropout.
Arguments
in_channels::Int: Number of input channels
out_channels::Int: Number of output channels
time_emb=false: Whether to use time embeddings
emb_dim=256: Dimension of time embeddings
dropout=0.0: Dropout probability (0.0 means no dropout)
activation=relu: Activation function to use
Examples
enc = EncoderBlock(3, 64, time_emb=true, emb_dim=256, dropout=0.1)
h = randn(Float32, 32, 32, 3, 1)
t = randn(Float32, 256, 1)
skip, h = enc(h, t)
Onion.FlexibleUNet
— Type
FlexibleUNet(;
    in_channels=3,
    out_channels=3,
    depth=3,
    base_channels=64,
    channel_multipliers=[1, 2, 4],
    time_embedding=false,
    num_classes=0,
    embedding_dim=128,
    time_emb_dim=256,
    dropout=0.0,
    dropout_depth=0,
    activation=relu
)
A flexible UNet architecture with configurable depth and channel dimensions. Supports optional time and class embeddings for diffusion models and conditional generation.
Arguments
in_channels=3: Number of input channels
out_channels=3: Number of output channels
depth=3: Number of encoder/decoder blocks
base_channels=64: Base channel dimension (multiplied at each level)
channel_multipliers=[1, 2, 4]: Multipliers for channel dimensions at each level
time_embedding=false: Whether to use time embeddings
num_classes=0: Number of class labels for conditional generation
embedding_dim=128: Dimension for class embeddings
time_emb_dim=256: Dimension for time embeddings
dropout=0.0: Dropout probability to apply to inner layers
dropout_depth=0: Number of layers to apply dropout to, starting from the innermost layers (0 means no dropout). Maximum value is 1 + depth (bottleneck plus all encoder/decoder levels).
activation=relu: Activation function to use throughout the network
Examples
# Basic model without dropout
model = FlexibleUNet(
    in_channels=3,
    out_channels=3,
    depth=4,
    base_channels=32,
    channel_multipliers=[1, 2, 4, 8],
    time_embedding=true
)
# Model with dropout applied to the 3 innermost layers
model = FlexibleUNet(
    in_channels=3,
    out_channels=3,
    depth=4,
    base_channels=32,
    channel_multipliers=[1, 2, 4, 8],
    time_embedding=true,
    dropout=0.2,
    dropout_depth=3
)
x = randn(Float32, 32, 32, 3, 1)
t = randn(Float32, 1)
labels = [5]
y = model(x, t, labels)
Onion.GaussianFourierProjection
— Type
GaussianFourierProjection(embed_dim::Int, scale::T=32.0f0)
Creates a Gaussian Fourier feature projection for time embeddings. Used in diffusion models.
Arguments
embed_dim::Int: Embedding dimension. Should be even.
scale::T=32.0f0: Scaling factor for the random weights.
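A hedged usage sketch: the call below assumes the projection maps a vector of timesteps to an embed_dim × batch embedding, by analogy with TimeEmbedding; treat the exact output shape as an assumption.
proj = GaussianFourierProjection(128)
t = rand(Float32, 16)  # a batch of timesteps
emb = proj(t)          # assumed shape: (128, 16)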
Onion.RMSNorm
— Type
RMSNorm(dim::Int; eps::T=1f-5)
Root Mean Square Layer Normalization. As used in Llama3.
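A usage sketch, with the feature dimension first as in the other examples in this documentation:
norm = RMSNorm(64)
h = randn(Float32, 64, 10, 1)
h = norm(h)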
Onion.ResidualBlock
— Type
ResidualBlock(channels::Int; kernel_size=3, time_emb=false, emb_dim=256, dropout=0.0, activation=relu)
A ResNet-style residual block with optional time embeddings, dropout, and configurable activation.
Arguments
channels::Int: Number of input and output channels
kernel_size=3: Size of the convolutional kernel
time_emb=false: Whether to use time embeddings
emb_dim=256: Dimension of time embeddings
dropout=0.0: Dropout probability (0.0 means no dropout)
activation=relu: Activation function to use (e.g., relu, swish)
Examples
# Basic block with dropout
rb = ResidualBlock(64, dropout=0.1)
# Block with time embeddings and custom activation
rb = ResidualBlock(64, time_emb=true, emb_dim=256, dropout=0.1, activation=swish)
# Usage
h = randn(Float32, 32, 32, 64, 1)
t = randn(Float32, 256, 1)
h = rb(h, t)
Onion.RoPE
— Type
RoPE(dim::Int, max_length; theta::T=10000f0)
Rotary Position Embeddings (as in Llama3).
dim = 64
n_heads = 8
n_kv_heads = 4
seqlen = 10
t = TransformerBlock(dim, n_heads, n_kv_heads)
h = randn(Float32, dim, seqlen, 1)
rope = RoPE(dim ÷ n_heads, 1000)
h = t(h, 1, rope[1:seqlen])  # Note the subsetting to match seqlen
Onion.StarGLU
— Type
StarGLU(dim::Int, ff_hidden_dim::Int; act=Flux.swish)
Gated Linear Unit with a flexible activation function (default: swish, making it a SwiGLU layer as used in Llama3).
l = StarGLU(6, 8)
h = randn(Float32, 6, 10, 1)
h = l(h)
Onion.TimeEmbedding
— Type
TimeEmbedding(embed_dim::Int, num_classes::Int, embedding_dim::Int)
Creates time and optional class embeddings for diffusion models.
Arguments
embed_dim::Int: Output dimension for time embeddings
num_classes::Int: Number of classes for conditional generation
embedding_dim::Int: Dimension for class embeddings
Examples
time_emb = TimeEmbedding(256, 10, 128)
t = randn(Float32, 16)
labels = rand(1:10, 16)
h = time_emb(t, labels)
Onion.TransformerBlock
— Type
TransformerBlock(dim::Int, n_heads::Int, n_kv_heads::Int = n_heads, ff_hidden_dim = 4 * dim; norm_eps=1f-5, qkv_bias=false)
TransformerBlock{Attention,FeedForward,AttentionNorm,FeedForwardNorm}
Transformer block with grouped-query attention (GQA), as in Llama3. No KV caching (see Jjama3.jl for KV caching).
dim = 64
n_heads = 8
n_kv_heads = 4
seqlen = 10
rope = RoPE(dim ÷ n_heads, 1000)
t = TransformerBlock(dim, n_heads, n_kv_heads)
h = randn(Float32, dim, seqlen, 1)
# Use without a mask:
h = t(h, 1, rope[1:seqlen])
# Use with a causal mask:
mask = Onion.causal_mask(h)
h = t(h, 1, rope[1:seqlen], mask)
Onion.glut
— Method
glut(t::AbstractArray, d::Int, pos::Int)
glut(t::Real, d::Int, pos::Int) = t
glut adds singleton dimensions in the middle of an array. The resulting array will have d dimensions. pos is where to add the dimensions: pos=0 adds dims at the start, pos=1 after the first dimension, etc. If t is a scalar, it is returned unmodified (because scalars don't need to match dims to broadcast).
Typically when broadcasting x .* t, you would call something like glut(t, ndims(x), 1).
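An illustrative example of the broadcast pattern above (the sizes are arbitrary):
x = randn(Float32, 5, 10, 2)
t = randn(Float32, 5)
y = x .* glut(t, ndims(x), 1)  # glut reshapes t to size (5, 1, 1), so it broadcasts along dim 1 of x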
Onion.reverse_tuple
— Method
reverse_tuple(t::Tuple)
Helper function that reverses the order of elements in a tuple. Used to maintain type stability when reversing the order of skip connections.
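A minimal example of the expected behavior:
Onion.reverse_tuple((1, 2, 3))  # (3, 2, 1)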