Design Performance Debug

Vitis Tutorials: AI Engine Development (XD100)

Document ID
XD100
Release Date
2026-03-27
Version
2025.2 English

To estimate design performance during the AI Engine simulation, analyze the profile results. This section explains the topics most commonly used to assess overall kernel performance.

Refer to Section 4, Enabling Profile and Trace Options, to enable profiling in the Vitis IDE.

  1. After running the AI Engine Simulation, open the profile in analysis view -> aie_component -> AIE SIMULATOR/HARDWARE -> Run-aie_component -> Profile.

  2. Click the Summary for each tile on the landing page, and review the cycle count, instruction count, and program memory.

  3. Under Function Reports, click Total Function Time to view the following table at the bottom for the data_shuffle kernel function. [Figure: profile function time]

    • The data_shuffle kernel function took 2,303 cycles for seven iterations, that is, ~329 cycles for one iteration, which is the Avg Function Time.

    • The main function is added by the compiler and is different from the main() function in the graph.cpp file. This function took 99,749 cycles in total, which includes the time to transfer control back and forth between each graph iteration, lock stalls, and so on.

    • The _main_init function runs once for all graph iterations, and it took 26 cycles.

    • The _cxa_finalize function took 43 cycles to call the destructors of the global C++ objects.

    • The _fini function executes the program terminating instructions, and it took 24 cycles.

  4. Click the AI Engine Simulation Summary to see the AI Engine Frequency, which is 1250 MHz. This corresponds to a clock period of 0.8 ns, that is, 1 cycle = 0.8 ns. The data_shuffle function took 329 cycles for one iteration, that is, 329 × 0.8 ns = 263.2 ns, or approximately 264 ns.

  5. Match these values with the trace information. Click Trace and zoom into the period of one iteration (between two main() function calls). Add a marker and drag it to the end of the kernel function. [Figure: trace function time] The difference between the start time and end time of the kernel function for one iteration matches the approximately 264 ns derived from the profiling results.
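
The arithmetic in steps 3 and 4 can be reproduced with a short script, which may help when comparing several kernels or frequencies. The following is a minimal sketch in Python. The function names are hypothetical helpers and are not part of the Vitis tools; the input numbers are the ones reported in the profile above (2,303 cycles, seven iterations, 1250 MHz).

```python
# Hypothetical helpers (not part of the Vitis tooling) that reproduce the
# profile arithmetic: average cycles per iteration and cycles-to-time.

def avg_cycles_per_iteration(total_cycles: int, iterations: int) -> float:
    """Average function time in cycles, as shown in the profile table."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    return total_cycles / iterations


def cycle_period_ns(frequency_mhz: float) -> float:
    """Duration of one clock cycle in nanoseconds for a frequency in MHz."""
    if frequency_mhz <= 0:
        raise ValueError("frequency_mhz must be positive")
    return 1000.0 / frequency_mhz


def cycles_to_ns(cycles: float, frequency_mhz: float) -> float:
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles * cycle_period_ns(frequency_mhz)


if __name__ == "__main__":
    # Values taken from the profile report described in this section.
    total_cycles = 2303      # data_shuffle Total Function Time
    iterations = 7           # number of graph iterations
    frequency_mhz = 1250.0   # AI Engine Frequency from the simulation summary

    avg = avg_cycles_per_iteration(total_cycles, iterations)
    period = cycle_period_ns(frequency_mhz)
    duration = cycles_to_ns(avg, frequency_mhz)

    print(f"Average cycles per iteration: {avg:.0f}")
    print(f"Clock period: {period:.1f} ns")
    print(f"Time per iteration: {duration:.1f} ns")
```

With these inputs the script gives 329 cycles per iteration and 263.2 ns per iteration, in line with the approximately 264 ns measured in the trace view.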