Adapting for Single-Precision Floating-Point

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2024-12-06
Version: 2024.2 English

AMD Versal™ Core Adaptive SoCs primarily contain a variant of the AI Engine processor that has single-precision floating-point as a native data type. The single-precision floating-point format, known as binary32, is specified by the IEEE 754 standard as shown in Figure 4.

Figure 4 - IEEE 754 Single-Precision Floating-Point Format
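
The binary32 field widths and exponent bias shown in Figure 4 can be summarized with a few constants. This is an illustrative sketch rather than code from the tutorial; the same $2^{23}$ mantissa scaling and bias of 127 reappear in the equation below.

```cpp
// IEEE 754 binary32 layout (illustrative constants, not tutorial source code).
constexpr int   kSignBits      = 1;            // sign
constexpr int   kExponentBits  = 8;            // biased exponent
constexpr int   kMantissaBits  = 23;           // stored fraction (implicit leading 1)
constexpr int   kExponentBias  = 127;          // stored exponent = true exponent + 127
constexpr float kMantissaScale = 8388608.0f;   // 2^23; adding this to the bit pattern
                                               // increments the exponent field by one
```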

This format is structurally similar to the double-precision format, but has reduced dynamic range and precision because fewer bits are used to represent the exponent and mantissa. To adapt the exponential function approximation to single-precision floating-point data types, the equation becomes

$$ I = \left\lfloor \frac{2^{23}}{\log(2)} \left( y - \log(2)\, F(y_f) \right) + 2^{23}x_0 \right\rfloor $$

where $x_0 = 127$ and $I$ is computed as a signed 32-bit integer that is then reinterpreted as a single-precision floating-point value.
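
A minimal scalar sketch of this step is shown below; it is not the tutorial's AIE kernel code. It assumes, as in the double-precision derivation, that $y_f$ is the fractional part of $y / \log(2)$, and it takes the correction value $F(y_f)$ as an argument (a polynomial approximation of $F$ is given next).

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Form the signed 32-bit integer I from the equation above and reinterpret
// its bit pattern as a binary32 float. F_yf is the correction value F(y_f).
float exp_from_bits(float y, float F_yf)
{
    const float ln2   = 0.69314718f;            // log(2)
    const float scale = 8388608.0f / ln2;       // 2^23 / log(2)
    const float bias  = 8388608.0f * 127.0f;    // 2^23 * x0, with x0 = 127

    int32_t I = static_cast<int32_t>(std::floor(scale * (y - ln2 * F_yf) + bias));

    float result;
    std::memcpy(&result, &I, sizeof(result));   // reinterpret bits, not a value conversion
    return result;
}
```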

The correction function $F(y_f)$ may be approximated with a polynomial. As an example, the polynomial $p(x) = p_4 x^4 + p_3 x^3 + p_2 x^2 + p_1 x + p_0$ with $p_4 = -1.367030945 \times 10^{-2}$, $p_3 = -5.174499750 \times 10^{-2}$, $p_2 = -2.416043580 \times 10^{-1}$, $p_1 = 3.070270717 \times 10^{-1}$, and $p_0 = -3.492907808 \times 10^{-6}$ was obtained through Chebyshev approximation. Polynomials of degree greater than four appear to offer no additional benefit when computing in single precision. Using this polynomial as the correction function yields an approximation of the exponential function with a maximum error of less than 0.0015%.
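
Continuing the sketch above, the correction polynomial can be evaluated in Horner form with four multiply-adds; the `main()` below simply compares the approximation against `std::exp` over a small range. Again, this is an illustrative reference model, not the tutorial's AIE kernel.

```cpp
#include <cmath>
#include <cstdio>

// Degree-4 correction polynomial p(x) ~ F(x), coefficients as given above.
float correction(float x)
{
    const float p4 = -1.367030945e-2f;
    const float p3 = -5.174499750e-2f;
    const float p2 = -2.416043580e-1f;
    const float p1 =  3.070270717e-1f;
    const float p0 = -3.492907808e-6f;
    return (((p4 * x + p3) * x + p2) * x + p1) * x + p0;   // Horner evaluation
}

// Usage: compile together with exp_from_bits() from the previous sketch.
int main()
{
    const float log2e = 1.44269504f;                 // 1 / log(2)
    for (float y = -4.0f; y <= 4.0f; y += 0.5f) {
        float t  = y * log2e;                        // y / log(2)
        float yf = t - std::floor(t);                // fractional part y_f
        float approx = exp_from_bits(y, correction(yf));
        std::printf("y=% .2f  approx=%.7e  std::exp=%.7e\n", y, approx, std::exp(y));
    }
    return 0;
}
```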

For comparison, another version of the Softmax Function Tutorial is available for the AIE-ML variant of the AI Engine, where bfloat16 is the native floating-point data type.