Throughput

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2026-03-27
Version: 2025.2 English

The burst_cnt variable determines the number of samples processed during each function call. The inner loop processes eight samples per iteration, so the total number of processed samples is burst_cnt * 8.
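The relationship between burst_cnt and the sample count can be illustrated with a host-side sketch. This is not the tutorial's kernel. The function name process_call and the constant kSamplesPerIter are hypothetical, and a plain copy stands in for the IIR computation, so that only the loop structure and the burst_cnt * 8 count are modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical constant: samples handled by one iteration of the loop (eight in this design).
constexpr std::size_t kSamplesPerIter = 8;

// Hypothetical stand-in for one kernel call. The real kernel uses AI Engine
// vector operations; here the k loop models one 8-sample vector step and
// simply copies data, so only the structure and the sample count are shown.
std::size_t process_call(const std::vector<float>& in, std::vector<float>& out,
                         std::size_t burst_cnt) {
    assert(in.size() >= burst_cnt * kSamplesPerIter);
    out.resize(burst_cnt * kSamplesPerIter);
    std::size_t processed = 0;
    for (std::size_t i = 0; i < burst_cnt; ++i) {           // one iteration per burst
        for (std::size_t k = 0; k < kSamplesPerIter; ++k) {  // eight samples per iteration
            out[i * kSamplesPerIter + k] = in[i * kSamplesPerIter + k];
        }
        processed += kSamplesPerIter;
    }
    return processed;  // always burst_cnt * 8
}
```

Calling process_call with burst_cnt = 8 on a 64-sample input returns 64, matching the burst_cnt * 8 rule.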

The throughput is obtained as follows (see api_thruput.xlsx):

  • Build and run the design.

  • Open aiesimulator_output/default.aierun_summary.

  • Get the Total Function + Descendants Time (cycles) for the main function (num_cycles).

  • Throughput = clk_freq × (burst_cnt × 8) / num_cycles.
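The last step of the procedure above is plain arithmetic and can be sketched as a small helper. The function name throughput_msps is hypothetical; num_cycles is assumed to be the Total Function + Descendants Time read from the run summary, and the clock defaults to the 1 GHz used in the table.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: throughput in MSa/s for one function call,
//   clk_freq * (burst_cnt * 8) / num_cycles
// clk_hz is the AI Engine clock in Hz (assumed 1 GHz by default).
double throughput_msps(std::uint64_t burst_cnt, std::uint64_t num_cycles,
                       double clk_hz = 1e9) {
    assert(num_cycles > 0);
    const double samples = static_cast<double>(burst_cnt * 8);
    const double samples_per_sec = clk_hz * samples / static_cast<double>(num_cycles);
    return samples_per_sec / 1e6;  // convert Sa/s to MSa/s
}
```

For example, burst_cnt = 1 with num_cycles = 187 gives 1e9 × 8 / 187 ≈ 42.78 MSa/s.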

The throughput with a 1 GHz clock for different values of burst_cnt is as follows:

IIR Throughput (with API)

burst_cnt   num_samples   num_cycles (API)   API Throughput (MSa/s)
1           8             187                42.78
8           64            492                130.08
16          128           940                136.17
32          256           1836               139.43
64          512           3628               141.12
128         1024          7212               141.99
256         2048          14379              142.43

*clk_freq: 1 GHz
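One way to read why the throughput levels off is to assume that num_cycles grows roughly linearly with the number of samples: a part that does not scale with the sample count plus a fixed cost per sample. This is an assumption for illustration, not a claim from the measurements. The sketch below fits such a line through two rows of the table; LinearCost, fit_two_points, and asymptotic_msps are hypothetical names.

```cpp
#include <cassert>

// Hypothetical linear cost model: cycles ≈ fixed_cycles + cycles_per_sample * samples.
struct LinearCost {
    double fixed_cycles;       // intercept: cycles that do not scale with samples
    double cycles_per_sample;  // slope: incremental cycles per processed sample
};

// Fit the model through two measured (samples, cycles) points.
LinearCost fit_two_points(double s0, double c0, double s1, double c1) {
    assert(s1 != s0);
    const double slope = (c1 - c0) / (s1 - s0);
    return {c0 - slope * s0, slope};
}

// Throughput ceiling the model predicts as samples grows, in MSa/s.
double asymptotic_msps(const LinearCost& m, double clk_hz = 1e9) {
    assert(m.cycles_per_sample > 0);
    return clk_hz / m.cycles_per_sample / 1e6;
}
```

Using the rows for 64 and 2048 samples gives roughly 44 non-scaling cycles and about 7 cycles per sample, so the model's ceiling is near 1e9 / 7 ≈ 142.9 MSa/s, close to the 142.43 MSa/s measured at burst_cnt = 256. The burst_cnt = 1 row does not fit this line, so treat it as a rough reading rather than a cycle-accurate model.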

The AI Engine API is a header-only C++ library that sits as a layer between you and the low-level intrinsics (LLI), raising the level of abstraction.

Next, we modify the kernel code to use the low-level intrinsics (LLI) directly.