Throughput and Latency - 2025.2 English - XD100

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2026-03-27
Version: 2025.2 English

Throughput is measured in Tera Operations Per Second (TOPS). After the host application finishes writing matrices A and B, it asserts the Start signal to the DUT. When the DUT finishes, it asserts its Done output. A performance counter increments on every clock cycle from Start to Done, so its final value is the number of cycles for which the DUT is active, that is, the DUT latency in clock cycles.
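The counting rule can be illustrated with a behavioral model. The Python sketch below is not the tutorial's RTL; the class name `PerfCounter` and the Start/Done traces are hypothetical, and it assumes the count includes the cycle in which Start is sampled high and stops, without incrementing, in the cycle in which Done is sampled high.

```python
class PerfCounter:
    """Hypothetical behavioral model (not the tutorial's RTL) of a cycle
    counter that runs from the Start pulse until Done is asserted."""

    def __init__(self):
        self.count = 0
        self.running = False

    def clock(self, start, done):
        """Advance one rising clock edge using the sampled Start and Done values."""
        if start and not self.running:
            self.running = True      # Start seen: clear the count and begin
            self.count = 0
        elif done and self.running:
            self.running = False     # Done seen: freeze the final count
        if self.running:
            self.count += 1          # one increment per active clock cycle


# Hypothetical traces: Start pulses at cycle 1, Done is asserted at cycle 4.
start = [0, 1, 0, 0, 0, 0]
done  = [0, 0, 0, 0, 1, 0]

pc = PerfCounter()
for s, d in zip(start, done):
    pc.clock(s, d)
print(pc.count)  # 3 active cycles (cycles 1, 2 and 3)
```

Whether the hardware counter includes the Start or Done cycle is a design detail; the model simply makes one convention explicit.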

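For the 32x32x32 configuration, two 32x32x32 matrix multiplications are performed. Each multiplication requires 32 × 32 × 32 = 32K multiply-accumulate (MAC) operations; counting each MAC as two operations (one multiply and one add) gives 64K operations per multiplication, for a total of 64K × 2 = 128K operations. If the performance counter reaches value X, then at the operating frequency of 350 MHz (clock period 2.857 ns) the total time taken by the DUT is 2.857 × X ns.

One operation per nanosecond equals 10^9 operations per second (1 GOPS), so the result is divided by 1000 to express it in TOPS:

TOPS = 128K operations / (2.857 × X ns) / 1000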
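The same arithmetic can be written as a small Python helper that converts a counter reading into latency and TOPS. This is a sketch under the assumptions stated in the text (350 MHz clock, two operations per MAC); the function names are hypothetical, and the counter value `X = 100` is made up purely to exercise the code, not a measured result.

```python
CLOCK_MHZ = 350.0
PERIOD_NS = 1e3 / CLOCK_MHZ  # clock period in ns (about 2.857 ns at 350 MHz)


def matmul_ops(m, k, n, count=1):
    """Operations for `count` MxKxN matrix multiplications (2 ops per MAC)."""
    return 2 * m * k * n * count


def latency_ns(cycles, period_ns=PERIOD_NS):
    """DUT active time in ns for a performance-counter value of `cycles`."""
    return cycles * period_ns


def tops(ops, cycles, period_ns=PERIOD_NS):
    """Throughput in TOPS: operations per ns is GOPS, so divide by 1000."""
    return ops / latency_ns(cycles, period_ns) / 1e3


ops_32 = matmul_ops(32, 32, 32, count=2)   # two 32x32x32 multiplications
print(ops_32)                              # 131072 (= 128K)

X = 100  # hypothetical counter value, for illustration only
print(round(latency_ns(X), 3))             # 285.714
print(round(tops(ops_32, X), 4))           # 0.4588
```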

For the remaining configurations, a single matrix multiplication is performed, so an NxNxN configuration requires 2 × N × N × N operations.

Configuration     Operations (2 × N × N × N)     TOPS calculation
64x64x64          512K                           512K / (2.857 × X ns) / 1000
128x128x128       4096K                          4096K / (2.857 × X ns) / 1000
256x256x256       32768K                         32768K / (2.857 × X ns) / 1000
512x512x512       262144K                        262144K / (2.857 × X ns) / 1000
1024x1024x1024    2097152K                       2097152K / (2.857 × X ns) / 1000
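The operation counts follow from 2 × N × N × N per multiplication, doubled again for the two 32x32x32 multiplications. The Python sketch below regenerates those counts and evaluates the TOPS formula for a given counter value; the `CONFIGS` list restates the configurations described on this page, while the function names are hypothetical and no measured counter values are assumed.

```python
PERIOD_NS = 1e3 / 350.0  # 350 MHz clock -> about 2.857 ns period

# (N, number of NxNxN multiplications) for each configuration on this page
CONFIGS = [(32, 2), (64, 1), (128, 1), (256, 1), (512, 1), (1024, 1)]


def ops(n, count):
    """Operations for `count` NxNxN multiplications, 2 ops (multiply + add) per MAC."""
    return 2 * n**3 * count


def tops(n, count, cycles):
    """TOPS for a performance-counter reading of `cycles` at 350 MHz."""
    return ops(n, count) / (cycles * PERIOD_NS) / 1e3


def ops_table():
    """Return (configuration label, operations in K) rows, with K = 1024."""
    return [(f"{n}x{n}x{n}", ops(n, count) // 1024) for n, count in CONFIGS]


for label, k_ops in ops_table():
    print(f"{label:<16}{k_ops}K")
```

Given a counter value X measured for a configuration, `tops(n, count, X)` evaluates the corresponding row of the table.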