To estimate design performance during AI Engine simulation, analyze the profile results. This section covers the profile views that are most commonly used to assess overall kernel performance.
Refer to Section 4, Enabling Profile and Trace Options, to enable profiling in the Vitis IDE.
After running the AI Engine simulation, open the profile from the Analysis view -> aie_component -> AIE SIMULATOR/HARDWARE -> Run-aie_component -> Profile.
Click the Summary for each tile on the landing page, and review the cycle count, instruction count, and program memory.
Under Function Reports, click Total Function Time to view the following table at the bottom for the data_shuffle kernel function.
The data_shuffle kernel function took 2,303 cycles for seven iterations, that is, ~329 cycles for one iteration, which is the Avg Function Time.
The main function is added by the compiler and is different from the main() function in the graph.cpp file. This function took 99,749 cycles in total, which includes the time to transfer control back and forth between each graph iteration, lock stalls, and so on.
The _main_init function runs once for all graph iterations, and it took 26 cycles.
The _cxa_finalize function took 43 cycles to call the destructors of the global C++ objects.
The _fini function executes the program-terminating instructions, and it took 24 cycles.
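The Avg Function Time arithmetic above can be reproduced with a short script. This is only a sketch: the helper name `avg_function_cycles` is hypothetical and is not part of the Vitis tools. The cycle and iteration counts are the ones quoted in this section.

```python
def avg_function_cycles(total_cycles: int, calls: int) -> float:
    """Average cycles per call, as an Avg Function Time style figure.

    Hypothetical helper for illustration; not a Vitis API.
    """
    if calls <= 0:
        raise ValueError("calls must be a positive integer")
    return total_cycles / calls


# Values quoted in this section for the data_shuffle kernel function.
kernel_total_cycles = 2303
graph_iterations = 7

avg_cycles = avg_function_cycles(kernel_total_cycles, graph_iterations)
print(f"Average cycles per iteration: {avg_cycles:.0f}")
```

With the numbers from the profile report, the script reports 329 cycles per iteration.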
If you click the AI Engine Simulation Summary, you see the AI Engine Frequency as 1250 MHz, that is, 1 cycle = 0.8 ns. The data_shuffle function took 329 cycles for one iteration, that is, 329 × 0.8 ≈ 264 ns.
Match these values with the trace information. Click Trace and zoom into the period of one iteration (between two main() function calls, as follows). Add a marker and drag it to the end of the kernel function. The difference between the start time and end time of the kernel function for one iteration matches the 264 ns from the profiling results.
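The cycle-to-time conversion used above can be sketched as follows. The function names `cycle_period_ns` and `cycles_to_ns` are hypothetical helpers, not Vitis APIs. The 1250 MHz frequency and the 329-cycle count come from this section.

```python
def cycle_period_ns(freq_mhz: float) -> float:
    """Duration of one clock cycle in nanoseconds for a frequency in MHz."""
    if freq_mhz <= 0:
        raise ValueError("freq_mhz must be positive")
    # 1 MHz corresponds to a 1000 ns period, so period_ns = 1000 / f_MHz.
    return 1000.0 / freq_mhz


def cycles_to_ns(cycles: float, freq_mhz: float) -> float:
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles * cycle_period_ns(freq_mhz)


# Values quoted in this section.
aie_freq_mhz = 1250.0       # AI Engine Frequency from the simulation summary
cycles_per_iteration = 329  # Avg Function Time of the kernel, in cycles

print(f"Cycle period: {cycle_period_ns(aie_freq_mhz):.1f} ns")
print(f"Kernel time per iteration: "
      f"{cycles_to_ns(cycles_per_iteration, aie_freq_mhz):.1f} ns")
```

The unrounded product is 263.2 ns, consistent with the approximately 264 ns read from the trace marker.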