After the application architecture is finalized and the design partitioning between the AI Engine and PL is complete, the next step is to develop the AI Engine application. You can use the aiesimulator to measure application performance using trace profiling features. You can also measure the performance using the aiesimulator output.
The following example measures throughput as number of samples/(end-time - start-time), where:
- Each data line represents a 64b number (2 cint16s). There are 51200 64b numbers (that is, 102400 32b samples).
- Throughput = 102400 samples/(182452500 ps - 5790 ns) = 102400 samples/176662.5 ns = 579.636 MSps.
T 5790 ns
0 0 0 0
T 5792500 ps
0 0 0 0
T 5795 ns
0 0 0 0
T 5797500 ps
0 0 0 0
T 5800 ns
...
...
...
-5107 2007 -32768 -18047
T 182450 ns
-25374 -19023 3957 3067
T 182452500 ps
TLAST
18230 14818 11355 -5427
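The throughput calculation above can be automated with a short script. The following is a minimal sketch, not part of the AI Engine tools: it assumes the output file contains only timestamp lines (`T <value> <unit>`), optional `TLAST` markers, and data lines carrying two cint16 samples each, as in the listing above. The function names `throughput_msps` and `throughput_from_output`, and the `samples_per_line` parameter, are hypothetical names chosen for illustration.

```python
import re

# Multipliers converting each supported time unit to picoseconds.
_UNIT_TO_PS = {"ps": 1.0, "ns": 1e3, "us": 1e6, "ms": 1e9, "s": 1e12}

# Matches timestamp lines such as "T 5790 ns" or "T 5792500 ps".
_TIMESTAMP = re.compile(r"^T\s+(\d+(?:\.\d+)?)\s*(ps|ns|us|ms|s)$")


def throughput_msps(num_samples, start_ps, end_ps):
    """Return throughput in MSps for num_samples between two times in ps."""
    elapsed_ps = end_ps - start_ps
    if elapsed_ps <= 0:
        raise ValueError("end time must be later than start time")
    # samples per picosecond * 1e12 = samples/s; divide by 1e6 for MSps.
    return num_samples / elapsed_ps * 1e6


def throughput_from_output(lines, samples_per_line=2):
    """Estimate throughput (MSps) from an iterable of simulator output lines.

    Assumption: every line that is not a timestamp or a TLAST marker is a
    data line holding samples_per_line samples (2 cint16s per 64b line).
    """
    first_ps = None
    last_ps = None
    data_lines = 0
    for raw in lines:
        line = raw.strip()
        if not line or line == "TLAST":
            continue
        match = _TIMESTAMP.match(line)
        if match:
            t_ps = float(match.group(1)) * _UNIT_TO_PS[match.group(2)]
            if first_ps is None:
                first_ps = t_ps
            last_ps = t_ps
        else:
            data_lines += 1
    if first_ps is None:
        raise ValueError("no timestamp lines found")
    return throughput_msps(data_lines * samples_per_line, first_ps, last_ps)


if __name__ == "__main__":
    # Reproduce the worked example: 102400 samples between
    # 5790 ns (5790000 ps) and 182452500 ps.
    print(f"{throughput_msps(102400, 5_790_000, 182_452_500):.3f} MSps")
```

To process a complete output file, pass an open file object to `throughput_from_output`, for example `with open(path) as f: print(throughput_from_output(f))`, where `path` points to the output file of the port you want to measure. As in the example above, the script divides the total sample count by the time between the first and last timestamps.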
To further analyze AI Engine performance bottlenecks, Xilinx recommends running the aiesimulator or hardware emulation with the AI Engine trace and profile options. You can open the run summary file generated for the simulation run, which includes the trace and profile data, in the Vitis Analyzer. The Vitis Analyzer generates trace and profile views that help you identify the root causes of performance issues. For more information, see Performance Analysis of AI Engine Graph Application during Simulation in the AI Engine User Guide (UG1076).
In addition, you can obtain detailed profiling data on AI Engine graph bandwidth, throughput, and latency using the AI Engine Run-Time Event API. For more information, see Run-Time Event API for Performance Profiling in the AI Engine User Guide (UG1076).