Low-precision GEMM operations are highly useful in AI applications, where precision can be traded for performance. In DNN applications, element-wise operations such as adding bias, clipping the output, ReLU, and GeLU are performed on the GEMM output; these are referred to here as post-operations (post-ops).
In LPGEMM, these post-ops are fused with the GEMM operation to avoid repeated memory accesses, thereby improving performance. The LPGEMM APIs take an additional argument through which the user specifies the post-ops to perform after the GEMM operation. The supported post-ops are listed in the table below.
| Post-op | Description |
|---|---|
| Bias | Adds bias to the GEMM output before it is stored in C. The bias data is passed by the user through the post-op interface. |
| ReLU | Performs the ReLU operation on the GEMM output. f(x) = 0 when x <= 0, and f(x) = x when x > 0. |
| PReLU | Performs the Parametric ReLU operation on the GEMM output, using the scale given by the user. f(x) = x when x > 0, and f(x) = scale * x when x <= 0. |
| Sigmoid | Performs the sigmoid operation on the GEMM output. Sigmoid(x) = 1 / (1 + exp(-x)) |
| SWISH | Performs the SWISH operation on the GEMM output. This is the Sigmoid Weighted Linear Unit (SiLU) when beta = 1. SWISH(x) = x * sigmoid(beta * x) |
| Tanh | Performs Tanh on the GEMM output. Tanh(x) |
| GeLU_Tanh | Performs Tanh-based GeLU on the GEMM output. GeLU_Tanh(x) = 0.5 * x * (1 + tanh(0.797884 * (x + (0.044715 * x^3)))) |
| GeLU_ERF | Performs Erf-based GeLU on the GEMM output. GeLU_Erf(x) = 0.5 * x * (1 + erf(x * 0.707107)) |
| Scale | Performs the scale operation on the GEMM output, using the scale provided by the user. |
| Clip | Clips the GEMM output to the minimum and maximum values given by the user. |
| Matrix Add | Performs element-wise addition of a given D matrix to the GEMM output C. C := (beta * C + alpha * A * B) + D |
| Matrix Mul | Performs element-wise multiplication of the GEMM output C by a given D matrix. C := (beta * C + alpha * A * B) * D |
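
To make the idea of fusion concrete, the following is a minimal reference sketch in plain C, not the library's implementation or API. It computes a naive row-major `float` GEMM and applies an ordered list of post-ops to each output element before the single store into C. All names (`ref_post_op`, `ref_apply`, `ref_gemm_post_ops`) are hypothetical and exist only for illustration. The sketch assumes the bias is a vector of length n indexed by output column, and it covers only Bias, ReLU, PReLU, Clip, and Matrix Add.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical post-op kinds, for illustration only (not the library's types). */
typedef enum { OP_BIAS, OP_RELU, OP_PRELU, OP_CLIP, OP_MATRIX_ADD } ref_op_kind;

typedef struct {
    ref_op_kind  kind;
    const float *data; /* OP_BIAS: n-element vector; OP_MATRIX_ADD: m x n matrix D */
    float        p0;   /* OP_PRELU: scale; OP_CLIP: minimum */
    float        p1;   /* OP_CLIP: maximum */
} ref_post_op;

/* Apply one post-op to the value destined for element (i, j) of C. */
float ref_apply(const ref_post_op *op, float x, size_t i, size_t j, size_t n)
{
    switch (op->kind) {
    case OP_BIAS:       return x + op->data[j]; /* assumed: one bias per column */
    case OP_RELU:       return x > 0.0f ? x : 0.0f;
    case OP_PRELU:      return x > 0.0f ? x : op->p0 * x;
    case OP_CLIP:
        assert(op->p0 <= op->p1);
        return x < op->p0 ? op->p0 : (x > op->p1 ? op->p1 : x);
    case OP_MATRIX_ADD: return x + op->data[i * n + j];
    }
    return x;
}

/* C := post_ops(beta * C + alpha * A * B), with A (m x k), B (k x n), and
 * C (m x n) all stored row-major. The post-ops run in list order while the
 * result is still in a local variable, so C is written once per element. */
void ref_gemm_post_ops(size_t m, size_t n, size_t k,
                       float alpha, const float *a, const float *b,
                       float beta, float *c,
                       const ref_post_op *ops, size_t n_ops)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += a[i * k + p] * b[p * n + j];

            float out = beta * c[i * n + j] + alpha * acc;
            for (size_t q = 0; q < n_ops; ++q)
                out = ref_apply(&ops[q], out, i, j, n);

            c[i * n + j] = out;
        }
    }
}
```

An unfused version would store the GEMM result to C and then make one extra pass over C per post-op. In this sketch, running Bias, ReLU, and Clip in sequence costs no additional loads or stores of C. The order of the list matters: clipping before adding the bias generally gives a different result than clipping after it.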
The following structures tell the API which pre-/post-operations to apply and in what order.