A sparse matrix is a special type of matrix where the majority of its elements
are zero. The sparsity of a matrix is determined by the ratio of zero elements to
the total number of elements. AI Engine-ML provides hardware support for sparse matrices.
For a matrix multiplication C=A*B, matrix B
can be
sparse. AI Engine API
aie::mmul
supports sparse matrix
multiplication, and are described in detail in the Matrix Multiplication section in the
AI
Engine API User Guide (UG1529).
The sparse matrix multiplication requires the matrix sparsity to be a
minimum of 50%. For data types that are smaller or equal to 8 bits, to be considered
sparse, every four consecutive bytes require at least two zeros. For data type that
are larger than 8 bits, every four consecutive data samples (for example, int16
) require at least two zero data samples.
The sparse data can be compressed in 512-bit chunks and stored in data memory. The first 64 bits are mask bits. Each mask bit corresponds to one byte in the original data sequentially. If the byte is zero, the mask bit is 0. If the byte is non-zero, the corresponding bit is 1, and the byte is appended to the compressed data. After the 512 bits data is compressed, guard bits are added to make sure that the compressed data is 32-bit aligned. Following is an example showing how a sparse data is compressed:
The AI Engine API aie::sparse_vector_input_buffer_stream
reads and decompresses the
compressed data stream into aie::sparse_vector
. The
vector aie::sparse_vector
can be used in aie::mmul
to do sparse matrix multiplication. For more
information about the sparse data format and API aie::sparse_vector_input_buffer_stream
usage, see Memory in the
AI
Engine API User Guide (UG1529)
sparseB
) has already been compressed in the
producer (for example, PL
kernel):#include <aie_api/aie.hpp>
#include <aie_api/aie_adf.hpp>
#include <aie_api/utils.hpp>
using namespace adf;
const int M=2;
const int K=16;
const int N=8;
alignas(aie::vector_decl_align) const int16 dataA[32]={......};
template<int SIZE>
__attribute__ ((noinline)) void sparse_mmul(
input_buffer<int8,extents<SIZE>> & __restrict sparseB,
output_buffer<int16,extents<SIZE>> & __restrict out){
auto outIter=aie::begin_vector<16>(out);
using MMUL = aie::mmul<M, K, N, int16, int8>;
aie::vector<int16,32> matA=aie::load_v<32>(dataA);
//sparseB is the input data, which should already have been compressed
auto vbs = aie::sparse_vector_input_buffer_stream<int8, 128>((int8*)sparseB.data());
......
MMUL m;
for(int i=0;i<MAX_ITER;i++){
aie::sparse_vector<int8,128> matB = vbs.pop();
//sparse matrix multiply
m.mul(matA,matB);
*outIter++=m.to_vector<int16>(0);
}
}