PL Kernel Details

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2026-03-27
Version: 2025.2 English

The GeMM DSP RTL design can be divided into two main parts:

  • Core matrix multiplication functionality. The gemm_top module is the top-level module that implements it.

  • Data mover logic that lets the host application write the Matrix A and Matrix B data and read back the output matrix. This logic is implemented in the ps_slave module.

In this design, the core DSP logic operates at 700 MHz, while the rest of the logic operates at 350 MHz. A synchronizer module handles the signals that cross between these two clock domains.
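The relationship between the two clock domains can be checked with a little arithmetic. The following is only an illustrative sketch: the names `FAST_CLK_HZ`, `SLOW_CLK_HZ`, and `period_ns` are hypothetical and do not come from the design sources.

```python
# Clock frequencies quoted in the text above.
FAST_CLK_HZ = 700_000_000  # core DSP logic
SLOW_CLK_HZ = 350_000_000  # rest of the logic


def period_ns(freq_hz: int) -> float:
    """Return the clock period in nanoseconds for a frequency in hertz."""
    return 1e9 / freq_hz


# Number of fast-domain cycles that fit in one slow-domain cycle.
fast_cycles_per_slow_cycle = FAST_CLK_HZ // SLOW_CLK_HZ

print(f"fast period: {period_ns(FAST_CLK_HZ):.3f} ns")
print(f"slow period: {period_ns(SLOW_CLK_HZ):.3f} ns")
print(f"fast cycles per slow cycle: {fast_cycles_per_slow_cycle}")
```

The DSP domain runs at exactly twice the rate of the surrounding logic, so each 350 MHz cycle spans two 700 MHz cycles.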

The module hierarchy at the top level is:

 gemm_large_ocm
  |-gemm_top
  |-ps_slave
  |-synchronizer

Under the gemm_top module, the following modules are instantiated:

| Module | Description |
|---|---|
| FIXGEMM_WRAPPER | Implements the systolic array of 1K DSP58 engines |
| row_uram | URAMs that store the Matrix A data. The entire 1K x 1K Matrix A is stored in URAMs |
| col_uram | URAMs that store the Matrix B data. The entire 1K x 1K Matrix B is stored in URAMs |
| partial_sum_bram | 64 partial-sum block RAMs (512 x 64) that store the partial sums |
| op_uram | URAMs that store the final output of the matrix multiplication |
| DSP_data_controller | Controls the input data to, and the output data from, the DSP58 array |
| control_logic | Controls writes to and reads from the URAMs |
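The storage figures in the table can be turned into rough totals. The sketch below rests on two assumptions that the text does not state explicitly: that "1K" means 1024, and that "512 x 64" means 512 entries deep by 64 bits wide. All variable names are hypothetical. The element width of Matrix A and Matrix B is not given here, so only element counts are computed for them.

```python
K = 1024  # assumption: "1K" means 1024

# Elements in one full 1K x 1K matrix (Matrix A or Matrix B).
matrix_elements = K * K

# Partial-sum storage: 64 block RAMs, each assumed to be
# 512 entries deep by 64 bits wide.
NUM_PSUM_BRAMS = 64
PSUM_DEPTH = 512
PSUM_WIDTH_BITS = 64

psum_entries = NUM_PSUM_BRAMS * PSUM_DEPTH
psum_bits = psum_entries * PSUM_WIDTH_BITS
psum_kib = psum_bits // 8 // 1024

print(f"elements per matrix: {matrix_elements}")
print(f"partial-sum entries: {psum_entries}")
print(f"partial-sum storage: {psum_bits} bits ({psum_kib} KiB)")
```

Under these assumptions, each input matrix holds 1,048,576 elements and the partial-sum block RAMs provide 256 KiB of storage in total.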

Underneath FIXGEMM_WRAPPER, the FIXGEMM entity is instantiated, and FIXGEMM in turn contains the DSP_GW instantiations.
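For orientation, the hierarchy described in this section can be written down as a nested structure. This is a minimal sketch that records module names only; instance names and instance counts are not modeled, and the helpers `render` and `path_to` are hypothetical and are not part of the design.

```python
# Module hierarchy as described in the text. Children are listed by
# module name only; instance names and counts are not modeled.
HIERARCHY = {
    "gemm_large_ocm": {
        "gemm_top": {
            "FIXGEMM_WRAPPER": {"FIXGEMM": {"DSP_GW": {}}},
            "row_uram": {},
            "col_uram": {},
            "partial_sum_bram": {},
            "op_uram": {},
            "DSP_data_controller": {},
            "control_logic": {},
        },
        "ps_slave": {},
        "synchronizer": {},
    }
}


def render(tree, depth=0):
    """Return the hierarchy as a list of indented lines."""
    lines = []
    for name, children in tree.items():
        lines.append("  " * depth + name)
        lines.extend(render(children, depth + 1))
    return lines


def path_to(tree, target, prefix=()):
    """Return the tuple of module names leading to target, or None."""
    for name, children in tree.items():
        path = prefix + (name,)
        if name == target:
            return path
        found = path_to(children, target, path)
        if found:
            return found
    return None


print("\n".join(render(HIERARCHY)))
print("/".join(path_to(HIERARCHY, "DSP_GW")))
```

The last line prints the path from the top-level module down to DSP_GW, which makes the nesting depth of the DSP wrappers easy to see.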