4.7.5. Interface Reference for Post-Operations

AOCL User Guide (57404)

Document ID: 57404
Release Date: 2025-12-29
Version: 5.2 English

Low-precision GEMM (LPGEMM) operations are widely used in AI applications, where precision can be traded for performance. In DNN applications, element-wise operations such as adding a bias, clipping the output, ReLU, and GeLU are applied to the GEMM output; these are referred to here as post-operations (post-ops).

In LPGEMM, these post-ops are fused with the GEMM operation, which avoids repeated memory accesses and thereby improves performance. The LPGEMM APIs take an additional argument through which the user specifies the post-ops to perform after the GEMM operation. The supported post-ops are listed in Table 4.19.
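As a rough illustration of why fusion helps, the sketch below contrasts an unfused bias + ReLU (a second pass that re-reads and rewrites C) with a fused version that applies both while each output element is still in a register. This is not the AOCL implementation: the function names `gemm_ref`, `bias_relu_inplace`, and `gemm_bias_relu_fused` are hypothetical, the matrices are assumed to be row-major `float`, and the bias is assumed to hold one value per output column.

```c
#include <assert.h>
#include <stddef.h>

/* Plain reference GEMM: C = A * B, row-major, A is m x k, B is k x n. */
void gemm_ref(size_t m, size_t n, size_t k,
              const float *a, const float *b, float *c)
{
    assert(a && b && c);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] = acc;
        }
    }
}

/* Unfused post-ops: a second pass that loads and stores all of C again.
 * Assumption: bias has n entries, one per output column. */
void bias_relu_inplace(size_t m, size_t n, const float *bias, float *c)
{
    assert(bias && c);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float v = c[i * n + j] + bias[j];
            c[i * n + j] = v > 0.0f ? v : 0.0f;
        }
    }
}

/* Fused post-ops: bias and ReLU are applied to the accumulator before
 * the single store, so C is written once and never re-read. */
void gemm_bias_relu_fused(size_t m, size_t n, size_t k,
                          const float *a, const float *b,
                          const float *bias, float *c)
{
    assert(a && b && bias && c);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += a[i * k + p] * b[p * n + j];
            acc += bias[j];                       /* post-op 1: bias */
            c[i * n + j] = acc > 0.0f ? acc : 0.0f; /* post-op 2: ReLU */
        }
    }
}
```

Calling `gemm_ref` followed by `bias_relu_inplace` produces the same values as `gemm_bias_relu_fused`; the fused form simply saves one full read and write of C.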

Table 4.19 Supported Post-ops

| Post-op | Description |
|---|---|
| Bias | Adds a bias to the GEMM output before it is stored into C. The bias data is passed by the user through the post-op interface. |
| ReLU | Performs the ReLU operation on the GEMM output. f(x) = 0 when x <= 0, and f(x) = x when x > 0. |
| PReLU | Performs the Parametric ReLU operation on the GEMM output using a scale given by the user. f(x) = x when x > 0, and f(x) = scale * x when x <= 0. |
| Sigmoid | Performs the Sigmoid operation on the GEMM output. Sigmoid(x) = 1 / (1 + exp(-x)) |
| SWISH | Performs the SWISH operation on the GEMM output; this is the Sigmoid Weighted Linear Unit (SiLU) when beta = 1. SWISH(x) = x * Sigmoid(beta * x) |
| Tanh | Performs Tanh on the GEMM output. Tanh(x) |
| GeLU_Tanh | Performs Tanh-based GeLU on the GEMM output. GeLU_Tanh(x) = 0.5 * x * (1 + tanh(0.797884 * (x + (0.044715 * x^3)))) |
| GeLU_ERF | Performs Erf-based GeLU on the GEMM output. GeLU_Erf(x) = 0.5 * x * (1 + erf(x * 0.707107)) |
| Scale | Performs a scale operation on the GEMM output using the scale provided by the user. |
| Clip | Performs a clip operation on the GEMM output using the minimum and maximum values given by the user. |
| Matrix Add | Performs element-wise addition of a given matrix D to the GEMM output C. C := (beta * C + alpha * A * B) + D |
| Matrix Mul | Performs element-wise multiplication of the GEMM output C by a given matrix D. C := (beta * C + alpha * A * B) * D |

The following structures tell the API which pre- and post-operations to apply and the order in which to apply them.