4.7.1. Add-on in AOCL-BLAS - 5.2 English - 57404

AOCL User Guide (57404)

Document ID: 57404
Release Date: 2025-12-29
Version: 5.2 English

An add-on in AOCL-BLAS provides additional APIs, operations, and/or implementations that may be useful to certain users. An add-on is a standalone extension of AOCL-BLAS that does not depend on any other add-on, although it may use existing functionality or kernels from the core framework.

An add-on should never provide APIs that conflict with the interfaces of the BLIS typed or object API. Consequently, a properly constructed add-on never interferes with or changes the core BLIS functionality or the standard BLAS and CBLAS APIs.

Low Precision GEMM (LPGEMM) APIs were added to AOCL-BLAS 4.1 as an add-on named aocl_gemm, for use in deep neural network (DNN) inference applications. For example, a low-precision DNN takes image pixels as unsigned 8-bit (u8) inputs and quantized pre-trained weights as signed 8-bit (s8) values, and produces either signed 32-bit (s32) output or output that is downscaled (quantized) back to 8 bits.
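To make the data types concrete, the following is a minimal scalar reference sketch of what a u8 × s8 → s32 GEMM computes, plus one possible downscaling step from s32 back to s8. It is not the aocl_gemm API: the helper names `ref_gemm_u8s8s32` and `ref_downscale_s32_to_s8`, the dense row-major layout, and the single per-tensor float scale with round-to-nearest are assumptions made for illustration. The add-on's optimized kernels may use different layouts, rounding modes, and post-operations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference u8 x s8 -> s32 GEMM (hypothetical helper, not the aocl_gemm API):
 * C[m x n] = A[m x k] * B[k x n], all matrices row-major and densely packed. */
static void ref_gemm_u8s8s32(size_t m, size_t n, size_t k,
                             const uint8_t *a, const int8_t *b, int32_t *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            int32_t acc = 0;
            for (size_t p = 0; p < k; ++p) {
                /* Widen both operands to 32 bits before multiplying; this
                 * sketch assumes k is small enough that the s32 sum does
                 * not overflow. */
                acc += (int32_t)a[i * k + p] * (int32_t)b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}

/* Downscale one s32 result to s8 (hypothetical helper): multiply by an
 * assumed per-tensor scale, round to nearest with ties away from zero,
 * and saturate to [-128, 127]. */
static int8_t ref_downscale_s32_to_s8(int32_t acc, float scale)
{
    double v = (double)acc * (double)scale;
    if (v >= 127.0) return 127;
    if (v <= -128.0) return -128;
    return (int8_t)(v >= 0.0 ? (int)(v + 0.5) : (int)(v - 0.5));
}
```

For example, with A = [[1, 2], [3, 4]] (u8) and B = [[5, -6], [7, 8]] (s8), the s32 result is [[19, 10], [43, 14]]; downscaling 43 with a scale of 0.5 yields 22, while any value whose scaled magnitude exceeds the s8 range saturates to 127 or -128.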

These APIs are also designed to exploit architecture features such as the AVX512VNNI instructions, which take u8 and s8 inputs, produce s32 output, and deliver high throughput. Similarly, AVX512BF16-based instructions take input in the Brain Floating Point (bfloat16) format, providing higher throughput than 32-bit floating point at the cost of reduced precision.
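To illustrate the instruction-level behavior, the sketch below emulates in scalar C one 32-bit lane of a VNNI u8 × s8 dot product, modeled on the VPDPBUSD instruction (four unsigned bytes multiplied by four signed bytes, with the four products added to a 32-bit accumulator), and a float32 to/from bfloat16 conversion using round-to-nearest-even. The helper names `ref_dpbusd_lane`, `float_to_bf16`, and `bf16_to_float` are hypothetical; real kernels operate on full 512-bit registers through compiler intrinsics rather than these scalar functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar emulation of one 32-bit lane of a VNNI u8 x s8 dot product
 * (hypothetical helper, modeled on VPDPBUSD): four unsigned bytes from a
 * are multiplied by four signed bytes from b, and the four products are
 * added to the 32-bit accumulator acc. The non-saturating hardware
 * instruction wraps on 32-bit overflow; this sketch assumes no overflow. */
static int32_t ref_dpbusd_lane(int32_t acc, const uint8_t a[4], const int8_t b[4])
{
    assert(a != NULL && b != NULL);
    for (int i = 0; i < 4; ++i) {
        /* Each u8 * s8 product lies in [-32640, 32385]. */
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}

/* Convert float32 to bfloat16 (hypothetical helper): keep the sign, the
 * 8-bit exponent, and the top 7 mantissa bits, rounding the 16 discarded
 * bits to nearest, ties to even. */
static uint16_t float_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0) {
        /* NaN: truncate and force a quiet NaN so it cannot become infinity. */
        return (uint16_t)((bits >> 16) | 0x0040u);
    }
    uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
    return (uint16_t)((bits + rounding_bias) >> 16);
}

/* Widen bfloat16 back to float32 by placing it in the upper 16 bits. */
static float bf16_to_float(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Because bfloat16 keeps the full 8-bit float32 exponent but only 7 explicit mantissa bits (versus 23), it covers the same numeric range with less precision: rounding 3.14159265f to bfloat16 and back gives 3.140625f, and 1.0009765625f collapses to 1.0f.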