Design Performance Debug - 2023.2 English

Vitis Tutorials: AI Engine (XD100)

Document ID: XD100
Release Date: 2024-03-05
Version: 2023.2 English

To estimate design performance during AI Engine simulation, you need to analyze the profile results carefully. This section walks you through the profiling metrics most commonly used to assess how your kernel performs overall.

Refer to Section 4, Enabling the Profile and Trace Options, to learn how to enable profiling in the Vitis IDE.

  1. After running the AI Engine simulation, open the profile in the Analysis view: aie_component -> AIE SIMULATOR/HARDWARE -> Run-aie_component -> Profile.

  2. On the landing page, click the Summary for each tile to view its cycle count, instruction count, and program memory.

  3. Under Function Reports, click Total Function Time to view the function time table at the bottom of the page for the data_shuffle kernel function.

    • The data_shuffle kernel function took 2,303 cycles over seven iterations, that is, 2,303 / 7 = 329 cycles per iteration. This is the value reported as Avg Function Time.

    • The main function is inserted by the compiler and is different from the main() function in the graph.cpp file. It took 99,749 cycles in total, which includes the time spent transferring control back and forth between graph iterations, lock stalls, and so on.

    • The _main_init function runs once for all graph iterations and took 26 cycles.

    • The _cxa_finalize function calls the destructors of the global C++ objects and took 43 cycles.

    • The _fini function executes the program termination instructions and took 24 cycles.

  4. Click AI Engine Simulation Summary to see the AI Engine Frequency, which is 1250 MHz. One clock cycle is therefore 1 / 1250 MHz = 0.8 ns. Because the data_shuffle function took 329 cycles for one iteration, one iteration takes 329 × 0.8 ns ≈ 263 ns.

  5. Cross-check these values against the trace. Click Trace, zoom into the period of one iteration (between two consecutive main() function calls), add a marker, and drag it to the end of the kernel function. The difference between the start and end times of the kernel function for one iteration matches the roughly 263 to 264 ns derived from the profiling results.
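The arithmetic in steps 3 through 5 can be reproduced with a short script if you want to cross-check numbers from several runs. This is a minimal Python sketch. It assumes you copy the Total Function Time, the iteration count, and the AI Engine frequency by hand from the profile views. The function names and the 1 ns tolerance are hypothetical and are not part of any Vitis API.

```python
def avg_cycles_per_iteration(total_cycles: int, iterations: int) -> float:
    """Average cycles per call, as shown in the profile's Avg Function Time."""
    return total_cycles / iterations


def cycle_period_ns(freq_mhz: float) -> float:
    """Clock period in ns: a frequency of f MHz gives a period of 1000 / f ns."""
    return 1000.0 / freq_mhz


def cycles_to_ns(cycles: float, freq_mhz: float) -> float:
    """Convert a cycle count to nanoseconds at the given AI Engine clock."""
    return cycles * cycle_period_ns(freq_mhz)


def trace_matches_profile(trace_ns: float, profile_ns: float, tol_ns: float = 1.0) -> bool:
    """Hypothetical check: does the marker-to-marker trace time agree with the
    profile-derived estimate to within tol_ns nanoseconds?"""
    return abs(trace_ns - profile_ns) <= tol_ns


if __name__ == "__main__":
    total_cycles = 2303  # data_shuffle Total Function Time (cycles), from step 3
    iterations = 7       # graph iterations in this run
    freq_mhz = 1250.0    # AI Engine Frequency, from the simulation summary

    avg = avg_cycles_per_iteration(total_cycles, iterations)
    per_iter_ns = cycles_to_ns(avg, freq_mhz)
    print(f"{avg:.1f} cycles/iteration, "
          f"{cycle_period_ns(freq_mhz):.2f} ns/cycle, "
          f"{per_iter_ns:.1f} ns/iteration")
```

Running the script prints `329.0 cycles/iteration, 0.80 ns/cycle, 263.2 ns/iteration`. To compare against the trace, pass the marker difference you measured in step 5 to `trace_matches_profile`. The tolerance absorbs rounding in the trace view, so you may want to widen it for longer kernels.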