Analyzing AI Engine Performance in Simulation

Versal ACAP System Integration and Validation Methodology Guide (UG1388)

Document ID: UG1388
Release Date: 2021-11-19
Version: 2021.2 English

After the application architecture is finalized and the design partitioning between the AI Engine and PL is complete, the next step is to develop the AI Engine application. You can measure application performance in the aiesimulator using its trace and profiling features, or you can calculate it directly from the timestamps in the aiesimulator output.

The following example measures throughput as number of samples / (end time - start time), where:

Each data line represents one 64b number (2 cint16 samples). There are 51200 64b numbers, that is, 102400 32b samples.
The start time is the first timestamp (5790 ns) and the end time is the last timestamp (182452500 ps = 182452.5 ns).
Throughput = 102400 samples / (182452.5 ns - 5790 ns) = 102400 samples / 176662.5 ns = 579.636 MSps.

    T 5790 ns
    0 0 0 0
    T 5792500 ps
    0 0 0 0
    T 5795 ns
    0 0 0 0
    T 5797500 ps
    0 0 0 0
    T 5800 ns
    ...
    ...
    ...
    -5107 2007 -32768 -18047
    T 182450 ns
    -25374 -19023 3957 3067
    T 182452500 ps
    TLAST
    18230 14818 11355 -5427
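
The throughput calculation above can be automated with a short script. The following Python sketch is illustrative only and is not part of the Xilinx tools. It assumes that the output file contains no line-number prefixes, that timestamp lines have the form `T <value> <unit>`, that `TLAST` lines carry no data, and that every other non-empty line is one 64b data word holding 2 cint16 samples. The function name `throughput_msps` and its `samples_per_line` parameter are hypothetical names chosen for this example.

```python
import re

# Timestamp lines look like "T 5790 ns" or "T 5792500 ps" (assumed format).
_TIMESTAMP = re.compile(r"^T\s+(\d+(?:\.\d+)?)\s*(ps|ns|us|ms|s)$")

# Conversion factors from each time unit to picoseconds.
_PS_PER_UNIT = {"ps": 1, "ns": 10**3, "us": 10**6, "ms": 10**9, "s": 10**12}


def throughput_msps(lines, samples_per_line=2):
    """Estimate throughput in MSps from aiesimulator output lines.

    Throughput = samples / (last timestamp - first timestamp).
    samples_per_line is 2 for the cint16 example (one 64b word per line).
    """
    first_ps = None
    last_ps = None
    data_lines = 0

    for raw in lines:
        line = raw.strip()
        if not line or line == "TLAST":
            continue  # blank lines and TLAST markers carry no samples
        match = _TIMESTAMP.match(line)
        if match:
            t_ps = float(match.group(1)) * _PS_PER_UNIT[match.group(2)]
            if first_ps is None:
                first_ps = t_ps
            last_ps = t_ps
        else:
            data_lines += 1  # assumption: anything else is one data word

    if first_ps is None or last_ps == first_ps:
        raise ValueError("need at least two distinct timestamps")

    samples = data_lines * samples_per_line
    # samples per picosecond * 1e12 gives samples/s; dividing by 1e6 gives MSps.
    return samples / (last_ps - first_ps) * 1e6
```

The function accepts any iterable of strings, so an open file object for the simulator output can be passed directly. Because the elapsed time is taken from the first and last timestamps in the file, the result reflects the average throughput over the entire run.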

To further analyze AI Engine performance bottlenecks, Xilinx recommends running the aiesimulator or hardware emulation with the AI Engine trace and profile options enabled. The run summary file generated for the simulation run includes the trace and profile data, and you can open it in the Vitis Analyzer. The resulting trace and profile views help you identify the root causes of performance issues. For more information, see Performance Analysis of AI Engine Graph Application during Simulation in the AI Engine User Guide (UG1076).

In addition, you can obtain detailed profiling data on AI Engine graph bandwidth, throughput, and latency using the AI Engine Run-Time Event API. For more information, see Run-Time Event API for Performance Profiling in the AI Engine User Guide (UG1076).