1. The functionality of the CUs - 2024.2 English

Vitis Libraries

Release Date: 2024-11-29
Version: 2024.2 English
  • The loadCol CU reads the input dense column vector and the NNZ column pointer entries from two physically separate DDR device memories, DDR0 and DDR1, as shown in the preceding figure. It sends them to the bufTransColVec and bufTransNnzCol CUs, which buffer and select the entries for the computation path connected to each HBM channel.
  • The bufTransColVec CU reads the input dense vector entries that belong to each block, splits them into chunks, one for each HBM channel, buffers all of those chunks (16 in total in this design), and transmits each chunk to its corresponding xBarCol CU.
  • The bufTransNnzCol CU reads the column pointer entries that belong to each block, splits them into chunks, one for each HBM channel, buffers all of those chunks (16 in total in this design), and transmits each chunk to its corresponding xBarCol CU.
  • The xBarCol CUs, one for each HBM channel, select the input dense vector entries according to the NNZs’ column pointer entries and send the results to the cscRow CUs for computation.
  • Each cscRow CU reads the values and row indices of the NNZs from one HBM channel, multiplies the values by their corresponding column entries received from the connected xBarCol CU, and accumulates the products along the row indices.
  • Each readWriteHbm CU connects to eight HBM channels. It reads the NNZs’ values and row indices from those channels and sends them to the corresponding cscRow CUs. It also collects the results from the eight cscRow CUs and writes them back to the corresponding HBM channels. In total, two readWriteHbm CUs are used to connect to the 16 HBM channels.
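
The data flow described by the list above can be approximated in software as a small reference model. The following is a minimal Python sketch, not the library's implementation: the function names (`select_col_entries`, `csc_row_accumulate`, `spmv_multi_channel`) are hypothetical, and the contiguous split of NNZs across channels and the final summation of per-channel partial results are simplifying assumptions made for illustration. It only shows how column pointers select dense vector entries (the role attributed to xBarCol) and how products are accumulated along row indices (the role attributed to cscRow).

```python
from typing import List, Sequence


def select_col_entries(x: Sequence[float], col_ptr: Sequence[int]) -> List[float]:
    """Sketch of the xBarCol role: pair every NNZ with the dense vector
    entry of the column it belongs to, using the CSC column pointers."""
    selected: List[float] = []
    for col in range(len(col_ptr) - 1):
        nnz_in_col = col_ptr[col + 1] - col_ptr[col]
        selected.extend([x[col]] * nnz_in_col)
    return selected


def csc_row_accumulate(values: Sequence[float], row_idx: Sequence[int],
                       col_entries: Sequence[float], n_rows: int) -> List[float]:
    """Sketch of the cscRow role: multiply each NNZ value by its column
    entry and accumulate the product at the NNZ's row index."""
    y = [0.0] * n_rows
    for val, row, x_val in zip(values, row_idx, col_entries):
        y[row] += val * x_val
    return y


def spmv_multi_channel(values: Sequence[float], row_idx: Sequence[int],
                       col_ptr: Sequence[int], x: Sequence[float],
                       n_rows: int, n_channels: int = 16) -> List[float]:
    """Hypothetical multi-channel model: split the NNZs into contiguous
    chunks (an assumption, one chunk per channel), run one accumulator
    per chunk, and sum the partial results."""
    col_entries = select_col_entries(x, col_ptr)
    nnz = len(values)
    chunk = -(-nnz // n_channels) if nnz else 0  # ceiling division
    y = [0.0] * n_rows
    for ch in range(n_channels):
        lo, hi = ch * chunk, min((ch + 1) * chunk, nnz)
        if lo >= hi:
            continue  # this channel has no NNZs assigned
        partial = csc_row_accumulate(values[lo:hi], row_idx[lo:hi],
                                     col_entries[lo:hi], n_rows)
        y = [a + b for a, b in zip(y, partial)]
    return y


# 3x3 example matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] stored in CSC form.
values = [1.0, 4.0, 3.0, 2.0, 5.0]
row_idx = [0, 2, 1, 0, 2]
col_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

print(spmv_multi_channel(values, row_idx, col_ptr, x, n_rows=3))  # [7.0, 6.0, 19.0]
```

In this model the result does not depend on `n_channels`, which makes it convenient for checking a hardware run against a software reference, provided the stated assumptions are acceptable for the matrix at hand.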