Floating-Point Accuracy - 2024.2 English - UG1603

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID: UG1603
Release Date: 2024-11-28
Version: 2024.2 English

Floating-Point Accuracy

Depending on the accuracy chosen for floating-point operations (low, fast, or safe), the precision of the result, measured in units in the last place (ULP), differs. The chosen accuracy also dictates the number of operations the processor executes. Floating-point addition relies on the floating-point adder that sits in the hardware directly after the bfloat16 vector multiplier.
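The link between precision and operation count follows from how an fp32 product can be assembled from a bfloat16 multiplier. A common scheme, assumed here rather than documented in this section, splits each fp32 operand into three bfloat16 pieces (hi, mid, lo) whose sum is the original value. The following Python sketch performs such a split by truncation; `to_fp32`, `truncate_to_bf16`, and `split_fp32_to_bf16x3` are hypothetical helper names, and the compiler's actual decomposition may differ.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def truncate_to_bf16(x: float) -> float:
    """Truncate a binary32 value to bfloat16 by clearing its low 16 bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def split_fp32_to_bf16x3(x: float) -> tuple[float, float, float]:
    """Hypothetical split of an fp32 value into hi + mid + lo bfloat16 pieces.

    Each step keeps the 8 leading significand bits that bfloat16 can hold.
    The subtractions are exact, so hi + mid + lo == x for inputs well inside
    the normal range.
    """
    x = to_fp32(x)
    hi = truncate_to_bf16(x)
    rest = x - hi                 # exact: only the bits below hi remain
    mid = truncate_to_bf16(rest)
    lo = truncate_to_bf16(rest - mid)
    return hi, mid, lo


if __name__ == "__main__":
    value = 1 + 2**-10 + 2**-18
    hi, mid, lo = split_fp32_to_bf16x3(value)
    print(hi, mid, lo)            # 1.0 0.0009765625 3.814697265625e-06
    assert hi + mid + lo == to_fp32(value)
```

Because bfloat16 keeps 8 significand bits, three pieces cover all 24 significand bits of an fp32 value, and multiplying two split operands produces up to 3 × 3 = 9 partial products.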

For each accuracy setting, the table below lists the ULP and the assembly code that is executed.

Table 1. Precision, ULP, and Corresponding Code

Precision   ULP   Executed Assembly Code
low         10    VMUL.f bmh2, x4, x6, r1
                  VMAC.f bml3, bmh2, x7, x3, r1
                  VMAC.f bmh3, bml3, x7, x6, r1
fast        5     VMUL.f bmh3, x7, x5, r1
                  VMAC.f bml4, bmh3, x3, x2, r1
                  VMAC.f bmh4, bml4, x8, x4, r1
                  VMAC.f bml5, bmh4, x3, x8, r1
                  VMAC.f bmh5, bml5, x7, x2, r1
                  VMAC.f bml6, bmh5, x7, x8, r1
safe        0     VMUL.f bmh3, x4, x8, r1
                  VMAC.f bml4, bmh3, x4, x2, r1
                  VMAC.f bmh4, bml4, x3, x8, r1
                  VMAC.f bml5, bmh4, x5, x8, r1
                  VMAC.f bmh5, bml5, x3, x2, r1
                  VMAC.f bml8, bmh5, x7, x4, r1
                  VMAC.f bml6, bml8, x3, x7, r1
                  VMAC.f bml7, bml6, x5, x2, r1
                  VMAC.f bmh6, bml7, x5, x7, r1

The higher the precision, the more instructions are needed per multiplication (three for low, six for fast, and nine for safe in Table 1), which can reduce compute performance.
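Table 1 shows the cost side of this trade-off; the accuracy side can be illustrated with a small model. The safe sequence pairs each of x3, x4, and x5 with each of x2, x7, and x8, which is consistent with (though not proof of) a full 3 × 3 product of split operands. The Python sketch below assumes that each fp32 operand is split by truncation into three bfloat16 pieces, that low keeps the partial products whose piece indices satisfy i + j ≤ 1 (3 products), fast keeps i + j ≤ 2 (6 products), and safe keeps all 9, and that products are accumulated in fp32 from smallest to largest. The helper names (`split3`, `emulated_mul`, `error_in_ulp`, `MODE_LIMITS`) and the term selection are hypothetical.

```python
import struct

# Hypothetical term selection: piece indices 0/1/2 = hi/mid/lo, and a mode keeps
# the partial products with i + j <= limit (3, 6, and 9 products respectively).
MODE_LIMITS = {"low": 1, "fast": 2, "safe": 4}


def to_fp32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def truncate_to_bf16(x: float) -> float:
    """Truncate a binary32 value to bfloat16 by clearing its low 16 bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def split3(x: float) -> list[float]:
    """Split a normal fp32 value into hi, mid, lo bfloat16 pieces (exact sum)."""
    hi = truncate_to_bf16(x)
    mid = truncate_to_bf16(x - hi)
    lo = truncate_to_bf16(x - hi - mid)
    return [hi, mid, lo]


def ulp_fp32(x: float) -> float:
    """Gap between |x| and the next larger binary32 value (finite x)."""
    bits = struct.unpack("<I", struct.pack("<f", abs(x)))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0] - abs(x)


def emulated_mul(a: float, b: float, mode: str) -> tuple[float, int]:
    """Return (fp32 product built from bf16 partial products, product count)."""
    a_parts, b_parts = split3(to_fp32(a)), split3(to_fp32(b))
    limit = MODE_LIMITS[mode]
    # A bf16 x bf16 product has at most 16 significand bits, so it is exact in fp32.
    terms = [a_parts[i] * b_parts[j]
             for i in range(3) for j in range(3) if i + j <= limit]
    acc = 0.0
    for term in sorted(terms, key=abs):  # assumed order: smallest first
        acc = to_fp32(acc + term)        # fp32 accumulator, rounded each step
    return acc, len(terms)


def error_in_ulp(a: float, b: float, mode: str) -> float:
    """Distance from the correctly rounded fp32 product, in ULP of that product."""
    reference = to_fp32(to_fp32(a) * to_fp32(b))
    result, _ = emulated_mul(a, b, mode)
    return abs(result - reference) / ulp_fp32(reference)


if __name__ == "__main__":
    x = 1 + 2**-10 + 2**-18
    for mode in ("low", "fast", "safe"):
        _, count = emulated_mul(x, x, mode)
        print(mode, count, error_in_ulp(x, x, mode))
    # low 3 72.0
    # fast 6 0.0
    # safe 9 0.0
```

Under this model the low setting can land well beyond the 10 ULP listed in Table 1, so the compiler's real decomposition (or the way the table measures ULP) probably differs. The sketch only illustrates why dropping partial products trades accuracy for fewer VMUL.f/VMAC.f instructions.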