As illustrated in the following figure, the matrix partitioning steps implemented in the software are:
- Partition the entire matrix into blocks according to the on-chip row and column buffer sizes, shown as “on-chip row buffer size” and “on-chip col buffer size” in the figure. The “on-chip col buffer size” and the “on-chip row buffer size” are defined at hardware compile time by the macros SPARSE_maxColMemBlocks and SPARSE_maxRowBlocks, respectively. For the Alveo U280 card, the following formulas give the number of columns and rows in each on-chip matrix block:
  number of columns in each block = SPARSE_maxColMemBlocks * 16
  number of rows in each block = SPARSE_maxRowBlocks * 4
- Partition each block evenly into chunks along the column. The number of chunks is set at hardware compile time by the macro SPARSE_hbmChannels. This design uses 16 HBM channels.
- Assemble the data chunks into different host memory regions according to their HBM channel ID. At runtime, each region is migrated to its corresponding HBM channel on the device. For example, as shown in the figure above, the red data chunks in each block are assembled into one memory block and migrated to HBM channel 0 on the device.
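The partitioning steps above can be sketched in a few lines of Python. This is a minimal illustration, not the library's implementation. The values chosen for SPARSE_maxColMemBlocks and SPARSE_maxRowBlocks are placeholders, and the helper names (block_shape, block_grid, channel_of, assemble_by_channel) are hypothetical. The sketch also assumes that each block is cut into SPARSE_hbmChannels equal, contiguous row ranges and that chunk i is sent to HBM channel i; the actual chunking rule may differ.

```python
import math

# Placeholder values; the real ones are fixed when the hardware is compiled.
SPARSE_maxColMemBlocks = 1024  # assumed value, for illustration only
SPARSE_maxRowBlocks = 1024     # assumed value, for illustration only
SPARSE_hbmChannels = 16        # the design described above uses 16 HBM channels


def block_shape():
    """Return (rows, cols) of one on-chip matrix block on the Alveo U280."""
    rows = SPARSE_maxRowBlocks * 4
    cols = SPARSE_maxColMemBlocks * 16
    return rows, cols


def block_grid(n_rows, n_cols):
    """Return how many blocks are needed along the rows and along the columns."""
    rows, cols = block_shape()
    return math.ceil(n_rows / rows), math.ceil(n_cols / cols)


def channel_of(row_in_block):
    """Map a row offset inside a block to an HBM channel ID.

    Assumption: a block is split into SPARSE_hbmChannels equal, contiguous
    row ranges, and chunk i belongs to HBM channel i.
    """
    rows, _ = block_shape()
    rows_per_chunk = rows // SPARSE_hbmChannels
    return row_in_block // rows_per_chunk


def assemble_by_channel(nnzs):
    """Group (row, col, value) entries into one host region per HBM channel."""
    rows, _ = block_shape()
    regions = [[] for _ in range(SPARSE_hbmChannels)]
    for row, col, value in nnzs:
        regions[channel_of(row % rows)].append((row, col, value))
    return regions
```

With these placeholder values, a block holds 4096 rows and 16384 columns, so a 10000 x 20000 matrix needs a 3 x 2 grid of blocks. Entries whose row offset within a block falls in the first 256 rows land in the region destined for HBM channel 0, regardless of which block they come from.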
The matrix block partition information is stored in the DDR and HBM channels. The loadCol and readWriteHbm CUs decode this information and retrieve the data accordingly. As shown in the figure above, each device memory contains the following three sections.
Parameter summary section. This section stores the number of parameter descriptions. Its size in bytes is defined by the macro SPARSE_paramOffset, which is 1024 in the preceding figure.
Parameter section. This section stores the parameter descriptions of the data blocks. Each parameter description normally includes the address offset, the number of matrix/vector entries processed in parallel, the min/max indices in the blocks, and so on.
Data section. This section stores the matrix and vector data. The data held in the DDR and HBM device memories is listed below.
- DDR0: Dense input vector data. Each DDR access produces 16 FP32 data entries.
- DDR1: Column pointers of the NNZs in a sparse matrix. Each DDR access produces 16 column pointer values for 16 NNZs.
- HBM channels: Row indices and values of the NNZs in a sparse matrix. Each access to a single HBM channel produces 4 values and 4 row indices for 4 NNZs.
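To make the three-section layout and the per-access figures concrete, the sketch below computes where each section would start in one device memory and how many bytes a single access delivers. It is illustrative only. PARAM_DESC_BYTES is a made-up size for one parameter description, the function names are hypothetical, and the sketch assumes that the parameter section starts immediately after the summary section and that the data section follows it without padding. The HBM figure further assumes FP32 values and 32-bit row indices, which the text above does not state.

```python
SPARSE_paramOffset = 1024  # bytes; the value shown in the figure
PARAM_DESC_BYTES = 32      # hypothetical size of one parameter description
FP32_BYTES = 4             # an FP32 entry occupies 4 bytes


def section_offsets(num_param_descs):
    """Return assumed byte offsets of the three sections in a device memory."""
    param_start = SPARSE_paramOffset  # summary section occupies [0, paramOffset)
    data_start = param_start + num_param_descs * PARAM_DESC_BYTES
    return {"summary": 0, "params": param_start, "data": data_start}


def ddr0_bytes_per_access():
    """DDR0 delivers 16 FP32 dense-vector entries per access."""
    return 16 * FP32_BYTES


def hbm_bytes_per_access(value_bytes=4, index_bytes=4):
    """One HBM channel delivers 4 values and 4 row indices per access.

    The default widths are assumptions (FP32 values, 32-bit indices).
    """
    return 4 * value_bytes + 4 * index_bytes
```

Under these assumptions, a memory holding three parameter descriptions would place its data section at byte 1120, a DDR0 access would carry 64 bytes, and an HBM channel access would carry 32 bytes.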