Softmax Function Definition - 2025.2 English - XD100

Vitis Tutorials: AI Engine Development (XD100)

Document ID
XD100
Release Date
2025-12-05
Version
2025.2 English

The softmax function is defined for a vector of real values \(\mathbf{z} = \left( z_1, z_2, \ldots , z_M \right)\) by the equation

\[ \Large \sigma \left( \mathbf{z} \right)_i = \frac{e^{z_i}}{\sum\nolimits_{j=1}^{M} e^{z_j}} \]

where \(z_i\) are the individual outputs of the layer. Softmax differs from other popular activation functions in that it takes the entire layer into account, scaling the outputs so they sum to 1. Each individual output can then be interpreted as a probability: in classification problems, the softmax output for a class may be interpreted as the probability that the input data belongs to that class.
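As a concrete illustration of the definition above, the following is a minimal sketch of a direct (naive) softmax evaluation; the function name `softmax` is chosen here for illustration and is not part of any library API:

```python
import math

def softmax(z):
    """Naive softmax: exponentiate each element, then normalize by the sum
    so that the outputs add up to 1 (direct evaluation of the formula)."""
    exps = [math.exp(v) for v in z]
    total = sum(exps)
    return [e / total for e in exps]
```

Because each output is a positive exponential divided by the sum of all exponentials, the results are guaranteed to lie in \((0, 1)\) and sum to 1, which is what allows them to be read as class probabilities.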

When computing the softmax function, there is a risk of overflow occurring during evaluation of the individual exponential functions that comprise the formula. For bfloat16 floating-point numbers, the exponential function overflows when input values exceed 88.5. To avoid overflow, the softmax function is often evaluated using the equivalent formula

\[ \Large \sigma \left( \mathbf{z} \right)_i = \frac{e^{z_i - \alpha}}{\sum\nolimits_{j=1}^{M} e^{z_j - \alpha}} \]

where \(\alpha\) is a real-valued constant. In particular, \(\alpha\) is often chosen to be the maximum of all \(z_i\) values comprising the input vector. By subtracting the maximum value from all others, inputs to the exponential functions are constrained to the range \((-\infty, 0]\), which in turn limits the exponential function values to the range \([0, 1]\).
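The max-subtraction technique can be sketched as follows; this is an illustrative scalar implementation, not the AI Engine kernel code:

```python
import math

def softmax_stable(z):
    """Overflow-safe softmax: subtract the maximum input before exponentiating."""
    alpha = max(z)
    # Every argument v - alpha lies in (-inf, 0], so exp() stays in [0, 1]
    # and cannot overflow regardless of how large the raw inputs are.
    exps = [math.exp(v - alpha) for v in z]
    total = sum(exps)
    return [e / total for e in exps]
```

Note that subtracting a common constant \(\alpha\) from every input leaves the result unchanged, because the factor \(e^{-\alpha}\) appears in both the numerator and the denominator and cancels; inputs such as 1000 that would overflow a direct evaluation are handled without issue.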

Another alternative to evaluating the softmax function is to use the equivalent formula

\[ \Large \sigma \left( \mathbf{z} \right)_i = \exp \left( z_i - \log \sum\nolimits_{j=1}^{M} e^{z_j} \right) \]

which is attractive because no division is required. However, it has been shown that in practice this formula tends to produce larger computational errors [1].
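The division-free formula can be sketched as below. For the inner sum this sketch also applies the max-subtraction shift from the previous section (a common pairing, assumed here rather than prescribed by the text) so that the log-sum-exp term itself does not overflow:

```python
import math

def softmax_logsumexp(z):
    """Division-free softmax via log-sum-exp:
    sigma(z)_i = exp(z_i - log(sum_j exp(z_j)))."""
    alpha = max(z)
    # Shifted log-sum-exp: log(sum exp(z_j)) = alpha + log(sum exp(z_j - alpha))
    lse = alpha + math.log(sum(math.exp(v - alpha) for v in z))
    # Each output is a single exponential of (z_i - lse); no division needed.
    return [math.exp(v - lse) for v in z]
```

Each output is produced by one subtraction and one exponential, which is why the formulation avoids division entirely; the trade-off, as noted above, is the larger computational error observed in practice [1].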