Sparse Matrix Multiplication

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID: UG1603
Release Date: 2025-05-29
Version: 2025.1 English

A sparse matrix is a special type of matrix where the majority of its elements are zero. The sparsity of a matrix is determined by the ratio of zero elements to the total number of elements. AIE-ML and AIE-ML v2 provide hardware support for sparse matrices. For a matrix multiplication C=A*B, matrix B can be sparse. The AI Engine API aie::mmul supports sparse matrix multiplication, which is described in detail in the Matrix Multiplication section of the AI Engine API User Guide (UG1529).

Sparse matrix multiplication requires the matrix sparsity to be exactly 50%. If your data contains more than 50% zeros, some of the zeros must be kept as explicit values in the final compressed matrix. For data types that are 8 bits or smaller, the data is considered sparse if every four consecutive bytes contain at least two zeros. For data types larger than 8 bits, every four consecutive data samples (for example, int16) must contain at least two zero samples.
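The following is a minimal host-side sketch (not part of the AIE API; the function name is illustrative) that checks this requirement for 8-bit data, that is, that every four consecutive bytes contain at least two zeros:

#include <cstdint>
#include <cstddef>

//Illustrative check of the 50% sparsity rule for 8-bit data:
//every four consecutive bytes must contain at least two zeros.
bool meets_sparsity_requirement(const int8_t *data, std::size_t n) {
    for (std::size_t i = 0; i + 4 <= n; i += 4) {
        int zeros = 0;
        for (int j = 0; j < 4; ++j)
            zeros += (data[i + j] == 0);
        if (zeros < 2)
            return false; //this group of four bytes is too dense
    }
    return true;
}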

The sparse data can be compressed in 256-bit chunks and stored in data memory. The first 32 bits are mask bits. Each mask bit corresponds to one byte of the original data, in order. If the byte is zero, the mask bit is 0. If the byte is non-zero, the corresponding bit is 1 and the byte is appended to the compressed data. After the 256-bit chunk is compressed, guard bits are added to make sure that the compressed data is 32-bit aligned. The following example shows how sparse data is compressed:

Note: For AIE-ML v2, data chunks are 512 bits wide, and the mask and alignment are on 64 bits.
Figure 1. Data Compression Example
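
The following is a minimal host-side sketch of the compression scheme described above. The function name and the little-endian layout of the 32 mask bits are assumptions for illustration; in practice the compressed data is produced by the data producer, for example a PL kernel:

#include <cstdint>
#include <vector>

//Illustrative compression of one 256-bit (32-byte) chunk of 8-bit data:
//32 mask bits first, then the non-zero bytes, padded to a 32-bit boundary.
std::vector<uint8_t> compress_chunk(const int8_t (&chunk)[32]) {
    uint32_t mask = 0;
    std::vector<uint8_t> payload;
    for (int i = 0; i < 32; ++i) {
        if (chunk[i] != 0) {
            mask |= (1u << i);                             //bit i = 1 for a non-zero byte
            payload.push_back(static_cast<uint8_t>(chunk[i]));
        }
    }

    std::vector<uint8_t> out;
    for (int b = 0; b < 4; ++b)                            //store the 32 mask bits first (little-endian assumed)
        out.push_back(static_cast<uint8_t>(mask >> (8 * b)));
    out.insert(out.end(), payload.begin(), payload.end()); //append only the non-zero bytes
    while (out.size() % 4 != 0)                            //guard bytes keep the chunk 32-bit aligned
        out.push_back(0);
    return out;
}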

The AI Engine API aie::sparse_vector_input_buffer_stream reads and decompresses the compressed data stream into an aie::sparse_vector. The aie::sparse_vector can then be used with aie::mmul to perform sparse matrix multiplication. For more information about the sparse data format and the usage of aie::sparse_vector_input_buffer_stream, see Memory in the AI Engine API User Guide (UG1529).

The following is example code. It assumes that the input data (sparseB) has already been compressed by the producer (for example, a PL kernel):
#include <adf.h>
#include <aie_api/aie.hpp>
#include <aie_api/utils.hpp>
using namespace adf;
const int M=4;
const int K=16;
const int N=8;

alignas(aie::vector_decl_align) const int8 dataA[64]={......};
template<int SIZE>
__attribute__ ((noinline)) void sparse_mmul(
   input_buffer<int8,extents<SIZE>> & __restrict sparseB,
   output_buffer<int16,extents<SIZE>> & __restrict out){ //int16 output matches m.to_vector<int16>() below

    auto outIter=aie::begin_vector<32>(out);
    using MMUL = aie::mmul<M, K, N, int8, int8>;
    aie::vector<int8,64> matA=aie::load_v<64>(dataA); //matrix A is 4x16 int8 (M*K=64 elements)

    //sparseB is the input data, which should already have been compressed
    auto vbs = aie::sparse_vector_input_buffer_stream<int8, 128>((int8*)sparseB.data());
    ......
    MMUL m;
    for(int i=0;i<MAX_ITER;i++){ //MAX_ITER is the number of B blocks to process (defined elsewhere)
      aie::sparse_vector<int8,128> matB = vbs.pop();

      //sparse matrix multiply
      m.mul(matA,matB);
      *outIter++=m.to_vector<int16>(0);
    }
}
Note: The sparse matrix is compressed or decompressed column-wise (column by column).
Note: For AIE-ML v2, both matrices must have the same data type: int8xint8, int16xint16, or bfloat16xbfloat16. Block floating-point sparse matrix multiplication is not supported.
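
For completeness, the following is a minimal sketch of how such a kernel might be instantiated in an ADF graph. The PLIO names, data files, buffer size (SIZE=1024), source file name, and runtime ratio are illustrative assumptions, not values prescribed by this guide:

#include <adf.h>
using namespace adf;

class SparseMmulGraph : public graph {
public:
    input_plio  in;
    output_plio out;
    kernel k;

    SparseMmulGraph() {
        //file-based PLIOs for simulation; the producer supplies already-compressed sparse data
        in  = input_plio::create("sparseB", plio_64_bits, "data/sparseB.txt");
        out = output_plio::create("out", plio_64_bits, "data/out.txt");

        k = kernel::create(sparse_mmul<1024>); //SIZE=1024 is an assumed buffer size
        connect(in.out[0], k.in[0]);
        connect(k.out[0], out.in[0]);
        source(k) = "sparse_mmul.cc";          //assumed kernel source file name
        runtime<ratio>(k) = 0.9;
    }
};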