Matrix Multiplication Using DSP58 Implementation Architecture

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2026-03-27
Version: 2025.2 English

In this design, matrix multiplication is implemented on a 32x32 DSP58 systolic array, which consists of 32 DSP58 cascade chains with 32 DSP58s in each chain. A 32x32 matrix multiplication is therefore the basic operation, and larger matrices are broken down into 32x32 submatrices.
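As a rough software model of this decomposition (not the tutorial's implementation), the Python sketch below computes C = A x B by iterating over 32x32 tiles and accumulating each tile product into the output, the way partial results along the shared dimension are summed when a larger multiplication is mapped onto a fixed 32x32 array. It assumes every matrix dimension is a multiple of 32; `TILE`, `matmul_tile`, and `tiled_matmul` are hypothetical names used only for illustration.

```python
TILE = 32  # basic multiplication size of the 32x32 DSP58 array


def matmul_tile(a, b, c, i0, j0, k0):
    """Accumulate one 32x32 tile product into c.

    Adds A[i0:i0+32, k0:k0+32] @ B[k0:k0+32, j0:j0+32] to C[i0:i0+32, j0:j0+32].
    """
    for i in range(TILE):
        for j in range(TILE):
            acc = 0
            for k in range(TILE):
                acc += a[i0 + i][k0 + k] * b[k0 + k][j0 + j]
            c[i0 + i][j0 + j] += acc


def tiled_matmul(a, b):
    """Compute a @ b by breaking both operands into 32x32 submatrices."""
    m, n, p = len(a), len(b), len(b[0])
    if len(a[0]) != n:
        raise ValueError("inner dimensions do not match")
    if m % TILE or n % TILE or p % TILE:
        # Restriction of this sketch only: other sizes would need padding
        # or edge handling, which is omitted here.
        raise ValueError("all dimensions must be multiples of 32")
    c = [[0] * p for _ in range(m)]
    for i0 in range(0, m, TILE):          # row tiles of C
        for j0 in range(0, p, TILE):      # column tiles of C
            for k0 in range(0, n, TILE):  # partial sums over the shared dimension
                matmul_tile(a, b, c, i0, j0, k0)
    return c
```

For example, multiplying a 64x64 A by a 64x32 B produces 2 x 1 output tiles, and each output tile accumulates 2 tile products along the shared dimension.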

The basic 32x32 multiplication is performed as follows:

  1. Matrix A row data moves upward along the DSP A-port cascade chain.

  2. For the first 32 clocks, data is only shifted into the DSP chains.

  3. After 32 clocks, row 0 of matrix A is populated in the first DSP cascade chain.

  4. Row 1 is populated in the next cascade chain, and so on.
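The loading phase in the steps above can be modeled in Python as a set of shift registers. This is a behavioral sketch, not the DSP58 RTL: `N` and `load_matrix_a` are hypothetical names, all 32 chains are assumed to shift in parallel, and each row is assumed to be fed last element first so that, after 32 clocks, DSP p of chain c (counting from the bottom of the chain) holds A[c][p].

```python
N = 32  # number of cascade chains, and DSP58s per chain


def load_matrix_a(a, clocks=N):
    """Behavioral model of shifting matrix A rows up the A-port cascade chains.

    chains[c][p] is the A-port register of DSP p in chain c, with p = 0 at the
    bottom of the chain. None marks a DSP that has not received data yet.
    Assumptions (hypothetical, for illustration): all chains shift in parallel,
    and row c is fed into chain c last element first.
    """
    if not 0 <= clocks <= N:
        raise ValueError("clocks must be between 0 and N")
    chains = [[None] * N for _ in range(N)]
    for clk in range(clocks):
        for c in range(N):
            # Each clock, every value moves one DSP up the chain and the next
            # element of row c enters at the bottom DSP.
            chains[c] = [a[c][N - 1 - clk]] + chains[c][:-1]
    return chains
```

Calling it with `clocks` below 32 shows the intermediate state, in which the upper DSPs of each chain have not yet received data; with the default of 32 clocks, chain c holds row c of A.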

The following figure illustrates this process.

Figure: Matrix A data movement