2. The functionality of the CUs - 2023.1 English
Vitis Libraries
Release Date: 2023-12-20
Version: 2023.1 English
- The loadCol CU reads the input dense column vector and the NNZ column pointer entries from two physically separate DDR device memories, DDR0 and DDR1, as shown in the figure above. It sends them to the bufTransColVec and bufTransNnzCol CUs respectively, which buffer the data and select the entries for the computation path attached to each HBM channel.
- The bufTransColVec CU reads the input dense vector entries that belong to each block, splits them into chunks, one per HBM channel, buffers all of these chunks (16 in total in this design), and transmits each chunk to its corresponding xBarCol CU.
- The bufTransNnzCol CU reads the column pointer entries that belong to each block, splits them into chunks, one per HBM channel, buffers all of these chunks (16 in total in this design), and transmits each chunk to its corresponding xBarCol CU.
- The xBarCol CUs, one for each HBM channel, select the input dense vector entries according to the NNZs’ column pointer entries and send the selected entries to the cscRow CUs for computation.
- Each cscRow CU reads the values and row indices of the NNZs from one HBM channel, multiplies the values by their corresponding column entries received from the connected xBarCol CU, and accumulates the products along the row indices.
- Each readWriteHbm CU connects to 8 HBM channels, reads the NNZs’ values and row indices from those channels, and sends them to the corresponding cscRow CUs. It also collects the results from its 8 cscRow CUs and writes them back to the corresponding HBM channels. In total, 2 readWriteHbm CUs are used to connect to the 16 HBM channels.
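To illustrate the 16-way fan-out performed by bufTransColVec and bufTransNnzCol, the following Python sketch divides one block's entries into one chunk per HBM channel. The text above does not say how entries are assigned to chunks, so the contiguous, near-equal split and the function name split_into_chunks are hypothetical. The sketch only shows how one block becomes 16 per-channel buffers, not how the library actually partitions the data.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

NUM_HBM_CHANNELS = 16  # one chunk per HBM channel in this design


def split_into_chunks(block_entries: Sequence[T],
                      num_chunks: int = NUM_HBM_CHANNELS) -> List[List[T]]:
    """Split one block's entries into num_chunks contiguous chunks.

    Hypothetical rule: chunk sizes differ by at most one, and the first
    len(block_entries) % num_chunks chunks receive the extra entry.
    """
    base, extra = divmod(len(block_entries), num_chunks)
    chunks: List[List[T]] = []
    start = 0
    for c in range(num_chunks):
        size = base + (1 if c < extra else 0)
        chunks.append(list(block_entries[start:start + size]))
        start += size
    return chunks


# Buffer 40 dense-vector entries of one block as 16 per-channel chunks.
chunks = split_into_chunks([float(i) for i in range(40)])
print([len(c) for c in chunks])  # [3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2]
```

Concatenating the chunks in channel order reproduces the block's original entries, so no entry is lost or duplicated by the split.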
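The per-channel computation of xBarCol and cscRow can be modeled functionally in Python. This sketch assumes that the NNZs handled by one channel form a small CSC slice, described by a column pointer array, row indices, and values. The helper names xbar_col and csc_row are hypothetical. They describe what the two CUs compute, not how the hardware streams or pipelines the data.

```python
from typing import List, Sequence


def xbar_col(col_ptr: Sequence[int], x: Sequence[float]) -> List[float]:
    """Model of xBarCol: emit x[j] once for every NNZ stored in column j.

    Column j's NNZs occupy positions col_ptr[j] .. col_ptr[j + 1] - 1.
    """
    selected: List[float] = []
    for j in range(len(col_ptr) - 1):
        nnz_in_col = col_ptr[j + 1] - col_ptr[j]
        selected.extend([x[j]] * nnz_in_col)
    return selected


def csc_row(values: Sequence[float], row_idx: Sequence[int],
            selected: Sequence[float], n_rows: int) -> List[float]:
    """Model of cscRow: multiply each NNZ value by its selected column entry
    and accumulate the products along the row indices."""
    y = [0.0] * n_rows
    for v, r, xv in zip(values, row_idx, selected):
        y[r] += v * xv
    return y


# A = [[1, 0, 2],
#      [0, 3, 0]] stored in CSC form
col_ptr = [0, 1, 2, 3]
row_idx = [0, 1, 0]
values = [1.0, 3.0, 2.0]
x = [1.0, 2.0, 3.0]

y = csc_row(values, row_idx, xbar_col(col_ptr, x), n_rows=2)
print(y)  # [7.0, 6.0]
```

Because xbar_col emits the column entries in the same order as the NNZs are stored, cscRow can consume the two streams element by element. It never needs to look up a column index itself.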
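The connection count of the readWriteHbm stage follows from simple arithmetic: 16 channels divided by 8 channels per CU gives 2 CUs. The Python sketch below maps each global HBM channel to a readWriteHbm CU and a local port on that CU. It assumes a hypothetical numbering in which channels 0 to 7 belong to the first CU and channels 8 to 15 to the second. The function name route_channel is illustrative only.

```python
from typing import Tuple

NUM_HBM_CHANNELS = 16           # HBM channels used by this design
CHANNELS_PER_READWRITEHBM = 8   # each readWriteHbm CU connects to 8 channels
NUM_READWRITEHBM_CUS = NUM_HBM_CHANNELS // CHANNELS_PER_READWRITEHBM  # 2


def route_channel(channel: int) -> Tuple[int, int]:
    """Return (readWriteHbm CU index, local port on that CU) for a channel.

    Hypothetical numbering: channels 0-7 go to CU 0, channels 8-15 to CU 1.
    """
    if not 0 <= channel < NUM_HBM_CHANNELS:
        raise ValueError(f"channel must be in [0, {NUM_HBM_CHANNELS}), got {channel}")
    cu, port = divmod(channel, CHANNELS_PER_READWRITEHBM)
    return cu, port


for ch in range(NUM_HBM_CHANNELS):
    cu, port = route_channel(ch)
    print(f"HBM channel {ch:2d} -> readWriteHbm CU {cu}, port {port}")
```

Each port also corresponds to one cscRow CU, so the same mapping identifies which readWriteHbm CU feeds a given cscRow CU and collects its results.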