A sparse matrix is a special type of matrix where the majority of its elements
are zero. The sparsity of a matrix is determined by the ratio of zero elements to
the total number of elements. AIE-ML and
AIE-ML v2 provide hardware support for
sparse matrices. For a matrix multiplication C=A*B, matrix B can be
sparse. AI Engine API
aie::mmul supports sparse matrix
multiplication, and are described in detail in the Matrix Multiplication section in
the
AI
Engine API User Guide (UG1529).
The sparse matrix multiplication requires the matrix sparsity to be exactly
50%. If you have more than 50% zeros in your data, you must leave some of them in
the final compressed matrix. For data types that are smaller or equal to 8 bits, to
be considered sparse, every four consecutive bytes require at least two zeros. For
data type that are larger than 8 bits, every four consecutive data samples (for
example, int16) require at least two zero data
samples.
The sparse data can be compressed in 256-bit chunks and stored in data memory. The first 32 bits are mask bits. Each mask bit corresponds to one byte in the original data sequentially. If the byte is zero, the mask bit is 0. If the byte is non-zero, the corresponding bit is 1, and the byte is appended to the compressed data. After the 256 bits data is compressed, guard bits are added to make sure that the compressed data is 32-bit aligned. Following is an example showing how a sparse data is compressed:
The AI Engine API
aie::sparse_vector_input_buffer_stream reads
and decompresses the compressed data stream into aie::sparse_vector. The vector aie::sparse_vector can be used in aie::mmul to do sparse matrix multiplication. For more information
about the sparse data format and API aie::sparse_vector_input_buffer_stream usage, see Memory in the
AI
Engine API User Guide (UG1529).
sparseB) has already been compressed in the
producer (for example, PL
kernel):#include <aie_api/aie.hpp>
#include <aie_api/utils.hpp>
const int M=4;
const int K=16;
const int N=8;
alignas(aie::vector_decl_align) const int8 dataA[64]={......};
template<int SIZE>
__attribute__ ((noinline)) void sparse_mmul(
input_buffer<int8,extents<SIZE>> & __restrict sparseB,
output_buffer<int8,extents<SIZE>> & __restrict out){
auto outIter=aie::begin_vector<32>(out);
using MMUL = aie::mmul<M, K, N, int8, int8>;
aie::vector<int8,32> matA=aie::load_v<64>(dataA);
//sparseB is the input data, which should already have been compressed
auto vbs = aie::sparse_vector_input_buffer_stream<int8, 128>((int8*)sparseB.data());
......
MMUL m;
for(int i=0;i<MAX_ITER;i++){
aie::sparse_vector<int8,128> matB = vbs.pop();
//sparse matrix multiply
m.mul(matA,matB);
*outIter++=m.to_vector<int16>(0);
}
}