Summary of Throughput and Latency for all Variations

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2026-03-27
Version: 2025.2 English

The latency of the design is given by the performance counter value read from the DUT. The performance counter measures the time the DUT takes to perform the matrix multiplication, expressed as a number of clock cycles.

The following table shows the per-matrix latency and throughput for various matrix sizes (int16 data; clock counts are in 1x clock cycles):

GeMM Configuration  Data Transfer Size  Latency (clocks)  Latency (us)  Matrices/s
32x32x32            1024                34                0.097         10.29 x 10^6
64x64x64            4096                130               0.371         2.69 x 10^6
128x128x128         16384               1026              2.931         3.41 x 10^5
256x256x256         65536               8194              23.411        4.27 x 10^4
512x512x512         262144              65538             187.3         5.34 x 10^3
1024x1024x1024      1048576             524290            1497.8        6.67 x 10^2
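The three latency/throughput columns are related by simple arithmetic. Latency in microseconds is the clock count divided by the clock frequency in MHz. Matrices/s is the reciprocal of the per-matrix latency.

The sketch below reproduces those columns from the clock counts. It is a minimal illustration under one stated assumption: a 1x clock of 350 MHz. The tutorial does not give this frequency. It is inferred from the ratio of the "Latency (clocks)" and "Latency (us)" columns.

The names `CLOCK_MHZ`, `latency_us`, `matrices_per_second` and `matches_table` are hypothetical helpers introduced only for this example.

```python
# Sketch: reproduce the "Latency (us)" and "Matrices/s" columns from the
# perf-counter clock counts.
# ASSUMPTION: the 1x clock runs at 350 MHz. The tutorial does not state this;
# it is the frequency implied by the ratio of the clock and microsecond columns.
import math

CLOCK_MHZ = 350.0  # assumed 1x clock frequency (MHz), hypothetical

# (GeMM configuration, latency in clocks, latency in us, matrices/s) from the table
TABLE_ROWS = [
    ("32x32x32", 34, 0.097, 10.29e6),
    ("64x64x64", 130, 0.371, 2.69e6),
    ("128x128x128", 1026, 2.931, 3.41e5),
    ("256x256x256", 8194, 23.411, 4.27e4),
    ("512x512x512", 65538, 187.3, 5.34e3),
    ("1024x1024x1024", 524290, 1497.8, 6.67e2),
]


def latency_us(clocks: int, clock_mhz: float = CLOCK_MHZ) -> float:
    """Clock cycles divided by the clock rate in MHz gives microseconds."""
    return clocks / clock_mhz


def matrices_per_second(clocks: int, clock_mhz: float = CLOCK_MHZ) -> float:
    """Throughput as the reciprocal of the per-matrix latency."""
    return 1e6 / latency_us(clocks, clock_mhz)


def matches_table(rel_tol: float = 2e-3) -> bool:
    """True if every computed value is within rel_tol of the table entry."""
    return all(
        math.isclose(latency_us(clk), us, rel_tol=rel_tol)
        and math.isclose(matrices_per_second(clk), rate, rel_tol=rel_tol)
        for _, clk, us, rate in TABLE_ROWS
    )


if __name__ == "__main__":
    for config, clk, _, _ in TABLE_ROWS:
        print(f"{config:>16}: {clk:>7} clk -> {latency_us(clk):9.3f} us, "
              f"{matrices_per_second(clk):.3e} matrices/s")
    print("agrees with table within 0.2%:", matches_table())
```

With the assumed 350 MHz clock, every computed latency and throughput value lands within about 0.2% of the corresponding table entry. The small differences come from rounding in the published columns.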

Note: In hardware emulation (hw_emu), a simulation issue causes the expected data and the read data to be offset by one clock cycle.