Microfloat

Microfloats.Microfloat — Type

Microfloat{S,E,M,V}

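A Microfloat type has S sign bits (0 or 1), E exponent bits (1 to 8), and M mantissa bits (0 to 7). The fourth parameter, V, appears to select the variant (such as Finite or IEEE_754_like), which determines whether and how Inf and NaN are encoded.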

Finite

Microfloats.Finite — Type

Finite

A variant of the Microfloat type that supports only finite values: it has no Inf or NaN encodings.

The following types have IEEE 754-like Inf/NaN encodings: Inf is represented by an all-ones exponent with a zero significand, and NaN by an all-ones exponent with a non-zero significand.
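
The Inf/NaN rule described above amounts to a simple test on the exponent and significand fields. The following is a minimal sketch in plain Julia, assuming a layout of 1 sign bit, then E exponent bits, then M significand bits. The helper `classify_bits` is hypothetical and is not part of Microfloats.

```julia
# Hypothetical helper, not part of Microfloats: classify a raw bit pattern
# for a format laid out as [sign | E exponent bits | M significand bits],
# using the IEEE 754-like rule for Inf and NaN.
function classify_bits(bits::Integer, E::Integer, M::Integer)
    exponent_mask = (1 << E) - 1
    significand_mask = (1 << M) - 1
    exponent = (bits >> M) & exponent_mask      # exponent field (sign bit masked off)
    significand = bits & significand_mask       # significand field
    if exponent == exponent_mask
        # All-ones exponent: Inf if the significand is zero, NaN otherwise.
        return significand == 0 ? :inf : :nan
    end
    return :finite
end

classify_bits(0b0_1111_000, 4, 3)  # :inf
classify_bits(0b0_1111_001, 4, 3)  # :nan
classify_bits(0b0_1110_111, 4, 3)  # :finite
```

The sign bit does not affect the classification, so a pattern such as `0b1_1111_000` is also reported as `:inf`.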

Microfloats.IEEE_754_like — Type

IEEE_754_like

Microfloats.Float8_E3M4 — Type

Float8_E3M4

Properties

Bits: 1 sign + 3 exponent + 4 significand (8 total)
Variant: IEEE_754_like
Has Inf: true
Has NaN: true
Max normal: 15.5
Min normal: 0.25
Max subnormal: 0.234375
Min subnormal: 0.015625

Microfloats.Float8_E4M3 — Type

Float8_E4M3

Properties

Bits: 1 sign + 4 exponent + 3 significand (8 total)
Variant: IEEE_754_like
Has Inf: true
Has NaN: true
Max normal: 240.0
Min normal: 0.015625
Max subnormal: 0.013671875
Min subnormal: 0.001953125

Microfloats.Float8_E5M2 — Type

Float8_E5M2

Properties

Bits: 1 sign + 5 exponent + 2 significand (8 total)
Variant: IEEE_754_like
Has Inf: true
Has NaN: true
Max normal: 57344.0
Min normal: 6.103515625e-5
Max subnormal: 4.57763671875e-5
Min subnormal: 1.52587890625e-5

Microfloats.Float6_E2M3 — Type

Float6_E2M3

Properties

Bits: 1 sign + 2 exponent + 3 significand (6 total)
Variant: IEEE_754_like
Has Inf: true
Has NaN: true
Max normal: 3.75
Min normal: 1.0
Max subnormal: 0.875
Min subnormal: 0.125

Microfloats.Float6_E3M2 — Type

Float6_E3M2

Properties

Bits: 1 sign + 3 exponent + 2 significand (6 total)
Variant: IEEE_754_like
Has Inf: true
Has NaN: true
Max normal: 14.0
Min normal: 0.25
Max subnormal: 0.1875
Min subnormal: 0.0625

Microfloats.Float4_E2M1 — Type

Float4_E2M1

Properties

Bits: 1 sign + 2 exponent + 1 significand (4 total)
Variant: IEEE_754_like
Has Inf: true
Has NaN: true
Max normal: 3.0
Min normal: 1.0
Max subnormal: 0.5
Min subnormal: 0.5
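
The Properties values listed for the IEEE_754_like types above can be reproduced from E and M alone. The sketch below assumes the conventional IEEE 754 exponent bias of 2^(E-1) - 1 and an all-ones exponent reserved for Inf/NaN. The bias is an assumption not stated in this reference, although it matches every value listed. The helper `ieee_like_properties` is hypothetical and is not part of Microfloats. It expects M ≥ 1, since a format with no significand bits has no subnormals.

```julia
# Hypothetical helper, not part of Microfloats. Assumes an IEEE 754-like
# format with bias 2^(E-1) - 1 and the all-ones exponent reserved for Inf/NaN.
function ieee_like_properties(E::Integer, M::Integer)
    bias = 2^(E - 1) - 1
    min_exponent = 1 - bias             # exponent of the smallest normal value
    max_exponent = (2^E - 2) - bias     # largest exponent field that is not all ones
    ulp = 2.0^(-M)                      # weight of the lowest significand bit
    return (
        max_normal    = 2.0^max_exponent * (2 - ulp),
        min_normal    = 2.0^min_exponent,
        max_subnormal = 2.0^min_exponent * (1 - ulp),
        min_subnormal = 2.0^min_exponent * ulp,
    )
end

ieee_like_properties(3, 4)
# (max_normal = 15.5, min_normal = 0.25, max_subnormal = 0.234375, min_subnormal = 0.015625)
```

The result for `(3, 4)` agrees with the Float8_E3M4 entry, and `(2, 1)` gives 3.0, 1.0, 0.5, and 0.5, matching Float4_E2M1.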

Microscaling (MX)

Types from the Open Compute Project Microscaling Formats (MX) Specification. MX_E5M2 follows the IEEE 754-like encoding of Inf/NaN. MX_E4M3 and MX_E8M0 have no Inf and only one representation of NaN (excluding the sign bit). The finite types MX_E3M2, MX_E2M3, and MX_E2M1 have no Inf or NaN.
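
How many bit patterns a format reserves for Inf/NaN directly affects its largest finite value. The sketch below compares the three schemes mentioned above for the sign/exponent/significand formats. It assumes a bias of 2^(E-1) - 1, and the expected results are taken from the OCP MX specification rather than from Microfloats itself. The helper `max_finite` and its scheme names are hypothetical. MX_E8M0 is left out because it is an exponent-only scale format with no sign or significand bits.

```julia
# Hypothetical helper, not part of Microfloats. `scheme` names which
# top-of-range bit patterns are reserved for non-finite values:
#   :ieee        the whole all-ones exponent is Inf/NaN (as in MX_E5M2)
#   :single_nan  only all-ones exponent + all-ones significand is NaN (as in MX_E4M3)
#   :finite      nothing is reserved (as in MX_E3M2, MX_E2M3, MX_E2M1)
function max_finite(E::Integer, M::Integer, scheme::Symbol)
    bias = 2^(E - 1) - 1                # assumed conventional bias
    top = 2^E - 1                       # all-ones exponent field
    if scheme == :ieee
        exponent_field, fraction_field = top - 1, 2^M - 1
    elseif scheme == :single_nan
        exponent_field, fraction_field = top, 2^M - 2
    elseif scheme == :finite
        exponent_field, fraction_field = top, 2^M - 1
    else
        throw(ArgumentError("unknown scheme: $scheme"))
    end
    return 2.0^(exponent_field - bias) * (1 + fraction_field / 2^M)
end

max_finite(5, 2, :ieee)        # 57344.0
max_finite(4, 3, :single_nan)  # 448.0
max_finite(3, 2, :finite)      # 28.0
max_finite(2, 3, :finite)      # 7.5
max_finite(2, 1, :finite)      # 6.0
```

Compared with Float8_E4M3 (max normal 240.0), giving up Inf and all but one NaN nearly doubles the largest representable value for the same bit widths.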

Microfloats.MX — Type