The burst_cnt variable determines the number of samples processed during each function call. The inner loop processes eight samples per iteration, so the total number of processed samples is burst_cnt * 8.
The throughput is obtained as follows (see api_thruput.xlsx):

1. Build and run the design.
2. Open `aiesimulator_output/default.aierun_summary`.
3. Get the Total Function + Descendants Time (cycles) for the `main` function (`num_cycles`).
4. Throughput = `clk_freq * (burst_cnt * 8) / num_cycles`.
The throughput values with a 1 GHz clock for different values of burst_cnt are as follows:
IIR Throughput (with API)

| burst_cnt | 1 | 8 | 16 | 32 | 64 | 128 | 256 |
|---|---|---|---|---|---|---|---|
| num_samples | 8 | 64 | 128 | 256 | 512 | 1024 | 2048 |
| num_cycles (API) | 187 | 492 | 940 | 1836 | 3628 | 7212 | 14379 |
| API Throughput (Msa/sec) | 42.78 | 130.08 | 136.17 | 139.43 | 141.12 | 141.99 | 142.43 |

*clk_freq: 1 GHz
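The throughput figures in the table can be reproduced from the measured cycle counts with a few lines of host-side C++. This is a minimal sketch rather than part of the tutorial's sources: the names `kSamplesPerIteration`, `num_samples`, and `throughput_msps` are hypothetical, and the code assumes only the relationship stated above, namely throughput = clk_freq * (burst_cnt * 8) / num_cycles.

```cpp
#include <cstdint>

// Samples consumed per inner-loop iteration, as stated in the text.
// (Hypothetical name, introduced for this sketch only.)
constexpr std::uint64_t kSamplesPerIteration = 8;

// Total number of samples processed by one function call.
constexpr std::uint64_t num_samples(std::uint64_t burst_cnt) {
    return burst_cnt * kSamplesPerIteration;
}

// Throughput in Msa/sec: clk_freq * (burst_cnt * 8) / num_cycles,
// scaled from samples/sec to mega-samples/sec.
// clk_freq_hz is the clock frequency in Hz (1.0e9 for the table above).
constexpr double throughput_msps(double clk_freq_hz,
                                 std::uint64_t burst_cnt,
                                 std::uint64_t num_cycles) {
    return clk_freq_hz * static_cast<double>(num_samples(burst_cnt))
           / static_cast<double>(num_cycles) / 1.0e6;
}
```

For example, `throughput_msps(1.0e9, 1, 187)` evaluates to roughly 42.78 and `throughput_msps(1.0e9, 256, 14379)` to roughly 142.43, matching the first and last columns of the table. The cycle count grows by about 7 cycles per additional 8-sample iteration (for instance, 14379 - 7212 = 7167 cycles for 128 more iterations, or about 56 cycles each), so the fixed call overhead is amortized as burst_cnt increases and the throughput levels off near 142 Msa/sec.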
The AI Engine APIs are a header-only implementation that acts as a “buffer” between you and the low-level intrinsics (LLI) to increase the level of abstraction.
Next, we modify the kernel code to use the LLI instead of the API.