AOCL-DLP is built around optimized matrix operations, primarily General Matrix Multiplication (GEMM):
\[C = \text{post_ops} (\alpha \cdot op(A) \cdot op(B) + \beta \cdot C)\]
Where:
\(op(X)\) is either \(X\) (no transpose) or \(X^T\) (transpose)
\(\alpha, \beta\) are scalar multipliers
\(\text{post_ops}\) denotes the post-processing operations fused into the GEMM call
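To make the formula concrete, here is a minimal reference sketch in plain Python. It is not the AOCL-DLP API: the names `gemm_reference`, `transpose`, `relu`, `trans_a`, and `trans_b` are hypothetical, and the sketch assumes that post-ops are element-wise functions applied in sequence to each output value. It only illustrates the arithmetic and makes no attempt at performance.

```python
# Reference sketch of C = post_ops(alpha * op(A) * op(B) + beta * C).
# Every name below is hypothetical and chosen for illustration only;
# none of it is taken from the AOCL-DLP API.

def transpose(m):
    """Return the transpose of a row-major list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def relu(x):
    """Example element-wise post-op (an assumption, used for illustration)."""
    return max(x, 0.0)


def gemm_reference(a, b, c, alpha=1.0, beta=0.0,
                   trans_a=False, trans_b=False, post_ops=()):
    """Compute post_ops(alpha * op(A) * op(B) + beta * C) and return a new matrix."""
    op_a = transpose(a) if trans_a else a
    op_b = transpose(b) if trans_b else b
    m, k, n = len(op_a), len(op_b), len(op_b[0])
    if len(op_a[0]) != k:
        raise ValueError("inner dimensions of op(A) and op(B) do not match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("C must have shape (m, n)")

    out = []
    for i in range(m):
        row = []
        for j in range(n):
            # Dot product of row i of op(A) with column j of op(B).
            acc = sum(op_a[i][p] * op_b[p][j] for p in range(k))
            val = alpha * acc + beta * c[i][j]
            # Apply the post-ops in order to the scaled, accumulated value.
            for post_op in post_ops:
                val = post_op(val)
            row.append(val)
        out.append(row)
    return out


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[1.0, 0.0], [0.0, -1.0]]
c = [[1.0, 1.0], [1.0, 1.0]]

# alpha = 1, beta = 0.5, with ReLU as the only post-op.
result = gemm_reference(a, b, c, alpha=1.0, beta=0.5, post_ops=(relu,))
print(result)  # [[1.5, 0.0], [3.5, 0.0]]
```

In this example \(A \cdot B\) is `[[1, -2], [3, -4]]`; adding \(0.5 \cdot C\) gives `[[1.5, -1.5], [3.5, -3.5]]`, and the ReLU post-op clamps the negative entries to zero. A fused implementation would apply the post-ops while the result tile is still in registers or cache rather than in a separate pass over \(C\).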