DSP - 2024.2 English - XD100

Vitis Tutorials: AI Engine Development (XD100)

Document ID
XD100
Release Date
2024-12-06
Version
2024.2 English

In this design, Matrix multiplication is implemented using Systolic array of 1024 DSP58 Engines. There are 32 DSP58 cascade chains, each chain has 32 DSP58s. Matrix-Matrix multiplication is decomposed into Matrix-Vector multiplication. One Matrix B column vector is multiplied by each row of Matrix A. This is achieved by broadcasting Matrix B column vector to DSPs at the same position in each cascade chain, while all 1K elements of Matrix A are read and each element drives one Port A of DSP58. One cascade chain implements one column vector and one row vector multiplication. This operation completes in 32 clocks.

Thus 32x32 matrix is the basic matrix multiplication unit. Larger matrices are broken down into submatrices of size 32x32, and each 32x32 submatrix of Matrix A is multiplied with each submatrix of Matrix B. For larger matrix multiplication, partial sum needs to be stored, read back, added to the new value and stored back.

Directory Structure