Many neural networks generate zero activations between layers because the ReLU function clamps negative values to zero. These zeros, introduced by the ReLU function into the input and output activations, are not transmitted over the AXI4-Streams: compression/decompression logic added to the AIE-ML tile and the AIE-ML Mem DMAs removes them. Zero-valued weights can also be compressed offline, moving from external memory in a compressed state and being decompressed by the S2MM channel of the tile DMA.
Moreover, AIE-ML cores support on-the-fly decompression during data loading, reinserting zero weights before performing convolutions. This feature ensures that only non-zero weights need to be stored in local tile memories.
Compression of activations in AXI4-Streams and on-the-fly decompression of weights during core loads are optional features. The compression/decompression in AIE-ML memory and AIE-ML tile DMA is controlled by a dedicated bit in the buffer descriptor (BD). The two primary use cases are shown in the following figure.
The following figure illustrates a compression algorithm designed to eliminate zeros in both weights and activations. This algorithm works with 8-bit data samples and employs a 32-bit mask to encode zero and non-zero bytes within a 256-bit word. Zero bytes are represented as 0 in the bit mask, while non-zero bytes are represented as 1. Zero-valued bytes are omitted from the compressed data. Guard bits are inserted as necessary to ensure 32-bit alignment for the subsequent mask. This compression process is consistently applied to all 256-bit data words.
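The scheme above can be sketched in software as follows. This is a minimal illustrative model, not the exact hardware bitstream layout: the byte ordering of the mask and the use of zero-valued guard bytes for padding are assumptions made for the sketch. Each 256-bit (32-byte) word is encoded as a 32-bit mask (one bit per byte, 1 for non-zero, 0 for zero) followed by only the non-zero bytes, padded so the next mask starts on a 32-bit boundary.

```python
def compress_word(word: bytes) -> bytes:
    """Compress one 32-byte word into a 32-bit mask plus non-zero bytes."""
    assert len(word) == 32, "input must be one 256-bit word"
    mask = 0
    payload = bytearray()
    for i, b in enumerate(word):
        if b != 0:
            mask |= 1 << i          # bit i = 1 marks a non-zero byte
            payload.append(b)       # zero bytes are simply omitted
    # Guard bytes (assumed zero here) pad the payload to a 32-bit
    # boundary so the next mask is 32-bit aligned.
    while len(payload) % 4:
        payload.append(0)
    return mask.to_bytes(4, "little") + bytes(payload)


def decompress_word(stream: bytes) -> tuple[bytes, int]:
    """Expand one compressed word; return (32-byte word, bytes consumed)."""
    mask = int.from_bytes(stream[:4], "little")
    out = bytearray(32)
    pos = 4
    for i in range(32):
        if mask & (1 << i):         # reinsert only where the mask says so
            out[i] = stream[pos]
            pos += 1
    consumed = pos + (-pos % 4)     # skip guard bytes up to 4-byte alignment
    return bytes(out), consumed
```

For a word with two non-zero bytes, the compressed form is 8 bytes (4-byte mask plus the two data bytes padded to 4), a 4x reduction; an all-zero word compresses to just its 4-byte mask.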