Floating-Point Accuracy - 2025.1 English - UG1603

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID: UG1603
Release Date: 2025-05-29
Version: 2025.1 English

The accuracy mode chosen for floating-point operations (low | fast | safe) determines the precision of the result, expressed in units in the last place (ULP), which indicates the last correct bit of the result. The accuracy mode also dictates the number of operations the processor executes. Floating-point addition relies on the floating-point adder that sits in the hardware right after the bfloat16 vector multiplier.
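As an illustration of what an error of N ULP means, the following minimal Python sketch counts how many representable fp32 values lie between two numbers. This is one common way to measure ULP error; the guide does not state how its own figures were measured, and the helper names `f32_bits`, `ordered_index`, and `ulp_distance` are hypothetical.

```python
import struct

def f32_bits(x: float) -> int:
    """Return the raw 32-bit pattern of x after rounding it to fp32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def ordered_index(x: float) -> int:
    """Map an fp32 value to an integer that increases monotonically with x."""
    bits = f32_bits(x)
    # fp32 is sign-magnitude: negative values are mirrored below zero.
    return bits if bits < 0x80000000 else 0x80000000 - bits

def ulp_distance(a: float, b: float) -> int:
    """Number of fp32 steps between a and b (assumed ULP error metric)."""
    return abs(ordered_index(a) - ordered_index(b))
```

With this metric, a result that is off by exactly one mantissa step from the reference has a distance of 1, and an exact result has a distance of 0.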

Single-precision floating-point (fp32) values have an 8-bit exponent and a 23-bit mantissa with an implicit leading 1 for normal numbers:

  • Maximum positive value: (2 - 2^-23) × 2^127 ≈ 3.403 × 10^38
  • Minimum positive value: 2^-126 ≈ 1.175 × 10^-38
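The two limits above can be checked against the fp32 bit patterns. The sketch below is only a sanity check written in Python with the standard `struct` module; the names `MAX_F32`, `MIN_NORMAL_F32`, and `f32_from_bits` are chosen for illustration.

```python
import struct

def f32_from_bits(bits: int) -> float:
    """Interpret a 32-bit pattern as an fp32 value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# Largest finite fp32: exponent field 0xFE, mantissa all ones.
MAX_F32 = (2 - 2**-23) * 2.0**127
# Smallest positive normal fp32: exponent field 0x01, mantissa zero.
MIN_NORMAL_F32 = 2.0**-126

assert f32_from_bits(0x7F7FFFFF) == MAX_F32
assert f32_from_bits(0x00800000) == MIN_NORMAL_F32
```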

bfloat16 numbers keep this 8-bit exponent, but the mantissa is reduced to 7 bits, again with an implicit leading 1. An fp32 value is decomposed into the sum of three bfloat16 values. Unfortunately, this decomposition is not exact for some extremely small values because the bfloat16 exponent cannot go low enough.
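The decomposition can be sketched in software as follows. This is an illustrative Python model, not the hardware algorithm: it assumes each bfloat16 term is obtained by truncating the remaining fp32 value to its upper 16 bits, and the helper names `to_f32`, `bf16_truncate`, and `split_into_bf16` are hypothetical. The rounding and subnormal handling of the actual hardware may differ.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def bf16_truncate(x: float) -> float:
    """Keep only the upper 16 bits of the fp32 pattern (assumed truncation)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def split_into_bf16(x: float) -> tuple[float, float, float]:
    """Split an fp32 value into three bfloat16 terms: high, middle, low."""
    x = to_f32(x)
    hi = bf16_truncate(x)
    rest = x - hi                 # exact: at most 16 significant bits remain
    mid = bf16_truncate(rest)
    lo = bf16_truncate(rest - mid)
    return hi, mid, lo

# Three 8-bit significands cover the 24-bit fp32 significand.
x = to_f32(3.14159265)
assert sum(split_into_bf16(x)) == x

# For a very small value, the low-order bits fall below what the
# bfloat16 exponent can express, so the sum no longer matches.
tiny = 2.0**-120 * (1 + 2**-23)
assert sum(split_into_bf16(tiny)) != tiny
```

The second assertion shows the inexact case described above: the lowest mantissa bit of `tiny` has weight 2^-143, which is smaller than any bfloat16 value.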

Based on the precision of the floating-point operation, the following table specifies the ULP and the corresponding code that is executed.

These ULP figures hold for X and Y values such that X, Y, and X*Y each have an exponent in the range [-102, +126]. This is equivalent to an fp32 magnitude in the range [1.97e-31, 1.70e+38].
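A test bench might filter its input pairs so that they satisfy this condition before comparing measured errors with the table. The following Python sketch shows one way to do that; the function names are hypothetical, zero and non-finite values are treated as out of range by assumption, and the product is evaluated in double precision rather than on the device.

```python
import math

EXP_MIN, EXP_MAX = -102, 126   # exponent range stated above

def binary_exponent(v: float) -> int:
    """Exponent e such that 2**e <= |v| < 2**(e + 1), for finite nonzero v."""
    # math.frexp returns (m, e) with 0.5 <= |m| < 1, hence the offset of 1.
    return math.frexp(v)[1] - 1

def in_documented_range(x: float, y: float) -> bool:
    """True if X, Y, and X*Y all have an exponent within [EXP_MIN, EXP_MAX]."""
    for v in (x, y, x * y):
        if v == 0.0 or not math.isfinite(v):
            return False
        if not EXP_MIN <= binary_exponent(v) <= EXP_MAX:
            return False
    return True

# Magnitude bounds implied by the exponent range.
LOWER_BOUND = 2.0**EXP_MIN          # about 1.97e-31
UPPER_BOUND = 2.0**(EXP_MAX + 1)    # about 1.70e+38 (exclusive)
```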

Table 1. Precision, ULP, and Corresponding Code
Precision: low
ULP range: 6 to 11
Executed assembly code:
    VMUL.f bmh2, x4, x6, r1
    VMAC.f bml3, bmh2, x7, x3, r1
    VMAC.f bmh3, bml3, x7, x6, r1

Precision: fast
ULP frequency:
    0 : 56.11%
    1 : 37.68%
    2 : 5.36%
    3 : 0.83%
    4 : 0.02%
Executed assembly code:
    VMUL.f bmh3, x7, x5, r1
    VMAC.f bml4, bmh3, x3, x2, r1
    VMAC.f bmh4, bml4, x8, x4, r1
    VMAC.f bml5, bmh4, x3, x8, r1
    VMAC.f bmh5, bml5, x7, x2, r1
    VMAC.f bml6, bmh5, x7, x8, r1

Precision: safe
ULP frequency:
    0 : 99.11%
    1 : 0.89%
    2-2 : 5.8e-4%
Executed assembly code:
    VMUL.f bmh3, x4, x8, r1
    VMAC.f bml4, bmh3, x4, x2, r1
    VMAC.f bmh4, bml4, x3, x8, r1
    VMAC.f bml5, bmh4, x5, x8, r1
    VMAC.f bmh5, bml5, x3, x2, r1
    VMAC.f bml8, bmh5, x7, x4, r1
    VMAC.f bml6, bml8, x3, x7, r1
    VMAC.f bml7, bml6, x5, x2, r1
    VMAC.f bmh6, bml7, x5, x7, r1

The higher the precision, the more operations must be executed to achieve it (3 instructions for low, 6 for fast, and 9 for safe in the table above), which can reduce compute performance.