An add-on in AOCL-BLAS provides additional APIs, operations, and/or implementations that may be useful to certain users. It can be a standalone extension of AOCL-BLAS that does not depend on any other add-on, although add-ons may utilize existing functionality or kernels within the core framework.
An add-on should never provide APIs that conflict with the interfaces belonging to the BLIS typed or object API. Thus, a properly constructed add-on never interferes with or changes the core BLIS functionality or the standard BLAS and CBLAS APIs.
Low Precision GEMM (LPGEMM) APIs were added in AOCL-BLAS 4.1 as an add-on named aocl_gemm; they are used in Deep Neural Network (DNN) inference applications. For example, a low-precision DNN takes image-pixel inputs as unsigned 8-bit (u8) values and quantized pre-trained weights of signed 8-bit (s8) width, and produces signed 32-bit (s32) or downscaled/quantized 8-bit output.
At the same time, these APIs are expected to utilize architecture features such as the AVX512VNNI instructions, which take u8 and s8 inputs and produce s32 output at high throughput. Similarly, the AVX512BF16 instructions expect input in the Brain Floating Point (bfloat16) format, providing higher throughput with less precision than 32-bit floating point.