The GeMM DSP RTL design can be divided into two main parts:

- Core matrix multiplication functionality, implemented by the gemm_top module and the submodules beneath it.
- Data mover logic that lets the host application write Matrix A and Matrix B data and read back the output matrix. This is implemented in the ps_slave module.
In this design, the core DSP logic runs at 700 MHz while the rest of the logic runs at 350 MHz. A synchronizer module handles the signals that cross between these two clock domains.
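This section does not describe how the synchronizer module is built. The sketch below is a cycle-level Python model of a classic two-flop synchronizer for a single-bit signal. That is one common way to move a control bit between clock domains, and it stands here as an assumption. The class name `TwoFlopSynchronizer` is hypothetical. A software model cannot reproduce metastability. It only shows the latency of roughly two destination-clock edges. Multi-bit buses generally need a handshake, Gray coding, or an asynchronous FIFO rather than per-bit flops.

```python
class TwoFlopSynchronizer:
    """Behavioral model of a two-flop synchronizer for a single-bit signal.

    Hypothetical: this is NOT the design's synchronizer module, whose
    implementation is not described. Metastability cannot be modeled here;
    the model only shows the latency seen in the destination clock domain.
    """

    def __init__(self) -> None:
        self.ff1 = 0  # first flop: the one that may go metastable in hardware
        self.ff2 = 0  # second flop: gives ff1 a full cycle to settle

    def clock(self, async_in: int) -> int:
        """Advance one destination-clock edge and return the synchronized bit."""
        # Both flops update on the same edge, so ff2 takes ff1's old value.
        self.ff2, self.ff1 = self.ff1, async_in
        return self.ff2


if __name__ == "__main__":
    sync = TwoFlopSynchronizer()
    # The input is captured into ff1 on one edge and reaches the output on the next.
    print([sync.clock(bit) for bit in [1, 1, 1, 0, 0, 0]])  # [0, 1, 1, 1, 0, 0]
```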
The top-level module, gemm_large_ocm, instantiates these three modules:

gemm_large_ocm \
|-gemm_top \
|-ps_slave \
|-synchronizer
The gemm_top module instantiates the following submodules:

| Module | Description |
|---|---|
| FIXGEMM_WRAPPER | Implements the systolic array of 1K DSP58 engines |
| row_uram | URAMs that store Matrix A data; the entire 1K x 1K Matrix A is held in URAM |
| col_uram | URAMs that store Matrix B data; the entire 1K x 1K Matrix B is held in URAM |
| partial_sum_bram | 64 block RAMs (512 x 64 each) that store partial sums |
| op_uram | URAMs that store the final output of the matrix multiplication |
| DSP_data_controller | Controls the data flowing into and out of the DSP58 array |
| control_logic | Controls reads from and writes to the URAMs |
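The storage described in the table can be sanity-checked with some quick arithmetic. The partial_sum_bram figures come directly from the table. The URAM figure rests on assumptions that this section does not state: 8-bit matrix elements, packed 8 per 72-bit URAM288 word using 64 bits, with no element split across two words. The function names are hypothetical. A real design may use more URAMs than this minimum so that it has enough parallel read ports to feed the DSP array.

```python
# URAM288 primitive geometry (UltraScale+ / Versal): 4K words x 72 bits.
URAM_DEPTH = 4096
URAM_WIDTH_BITS = 72


def urams_for_matrix(rows: int, cols: int, elem_bits: int, bits_used: int = 64) -> int:
    """Minimum URAM288 count to hold a rows x cols matrix.

    Hypothetical packing: whole elements are packed into the low `bits_used`
    bits of each 72-bit word; no element straddles two words.
    """
    if bits_used > URAM_WIDTH_BITS:
        raise ValueError("cannot use more bits than a URAM word holds")
    elems_per_word = bits_used // elem_bits
    words = (rows * cols + elems_per_word - 1) // elems_per_word  # ceil division
    return (words + URAM_DEPTH - 1) // URAM_DEPTH


def bank_bits(num_rams: int, depth: int, width: int) -> int:
    """Total storage of a bank of identical RAMs, in bits."""
    return num_rams * depth * width


if __name__ == "__main__":
    # Assumed 8-bit elements: 131,072 words -> 32 URAMs per 1K x 1K matrix.
    print(urams_for_matrix(1024, 1024, elem_bits=8))  # 32
    # partial_sum_bram as listed in the table: 64 BRAMs of 512 x 64.
    print(bank_bits(64, 512, 64))  # 2097152
```

With 16-bit elements the same function gives 64 URAMs per matrix. Each 512 x 64 partial-sum RAM holds 32,768 bits, so the bank totals 2 Mib.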
FIXGEMM_WRAPPER instantiates the FIXGEMM entity, which in turn instantiates the DSP_GW modules.
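The sketch below shows how a grid of multiply-accumulate cells can compute a matrix product. It is a cycle-level Python model of an output-stationary systolic array. The model is hypothetical. This section does not document FIXGEMM's actual dataflow, array shape, operand skewing, or fixed-point widths. Each processing element (PE) simply stands in for one DSP_GW cell. A operands move right and B operands move down, one PE per cycle. Each PE adds the product of the two operands it currently holds into its own accumulator.

```python
def systolic_matmul(a, b):
    """Cycle-level model of an output-stationary systolic array computing a @ b.

    Hypothetical: each PE stands in for one DSP_GW multiply-accumulate cell;
    the real FIXGEMM dataflow and array dimensions are not described here.
    """
    n, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions must match")
    m = len(b[0])
    acc = [[0] * m for _ in range(n)]    # one accumulator per PE (output-stationary)
    a_reg = [[0] * m for _ in range(n)]  # A operand each PE forwards to the right
    b_reg = [[0] * m for _ in range(n)]  # B operand each PE forwards downward

    # a[i][s] and b[s][j] meet at PE(i, j) on cycle s + i + j, so the last
    # product lands on cycle (k - 1) + (n - 1) + (m - 1).
    for t in range(k + n + m - 2):
        next_a = [[0] * m for _ in range(n)]
        next_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:  # left edge: row i of A enters skewed by i cycles
                    s = t - i
                    a_in = a[i][s] if 0 <= s < k else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:  # top edge: column j of B enters skewed by j cycles
                    s = t - j
                    b_in = b[s][j] if 0 <= s < k else 0
                else:
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in
                next_a[i][j] = a_in
                next_b[i][j] = b_in
        # All PEs update their pipeline registers on the same clock edge.
        a_reg, b_reg = next_a, next_b
    return acc


def reference_matmul(a, b):
    """Plain triple-loop product used to check the systolic model."""
    return [[sum(a[i][s] * b[s][j] for s in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(systolic_matmul(a, b))  # [[19, 22], [43, 50]]
```

This model uses one PE per output element. A 1K x 1K product on a fixed array of 1K DSP58 engines would have to be tiled, with intermediate sums held somewhere between passes. That is one plausible role for the partial_sum_bram bank, though this section does not say so explicitly.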