Viewing the Run Summary in the Vitis IDE - 2024.1 English

AI Engine Tools and Flows User Guide (UG1076)


After running the system, whether in simulation, hardware emulation, or in hardware, a run_summary report is generated when the application has been properly configured.

During simulation of the AI Engine graph, the AI Engine simulator or hardware emulation captures performance and activity metrics and writes the report to the output directory: ./aiesimulator_output for the AI Engine simulator, or ./sim/behav_waveform/xsim for hardware emulation. The generated summary is called default.aierun_summary.

The run_summary can be viewed in the Vitis IDE. The summary contains a collection of reports capturing the performance profile of the AI Engine application as it runs. For example, to open the AI Engine simulator run summary, use the following command:

vitis -a ./aiesimulator_output/default.aierun_summary

The Vitis IDE opens displaying the Summary page of the report. The tool lists the different reports that are available in the summary. For a complete understanding of the Analysis view, see Working with the Analysis View (Vitis Analyzer) in the Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393).

Note: The default.aierun_summary also contains some of the same reports as <GRAPH_TB_FILE_NAME>.aiecompile_summary, namely the Graph and Array reports. To view those reports, see Viewing Compilation Results in the Analysis View of the Vitis Unified IDE.

Report Summary

This is the top-level section of the summary. It reports the details of the run, such as the date, the tool version, and the command line used to launch the simulator.

Profile Summary

When the aiesimulator --profile option is specified, the simulator collects profiling data on the AI Engine graph and kernels, presenting a high-level view of the AI Engine graphs and the kernels mapped to processors, with tables and graphical presentations of the metric data.

The Profile Summary provides annotated details regarding the overall application performance. All data generated during the execution of the application is grouped into categories. The Profile Summary lets you examine processor/DMA memory stalls, deadlock, interference, critical paths, and maximum contention. This is useful for system-level performance tuning and debug. System performance is presented in terms of latency (the number of cycles taken to execute the system) and throughput (the amount of data processed per unit of time). Sub-optimal system performance forces you to examine and control (through constraints) mapping and buffer packing, stream and packet-switch allocation, interaction with neighboring processors, and external interfaces. An example of the raw Profile Summary report is shown.

Figure 1. Profile Summary
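As a sketch of this flow, the profiling data can be generated and then opened in the Vitis IDE as follows. The ./Work directory and output paths are assumptions based on a typical aiecompiler project layout; adjust them to match your design:

```shell
# Run the AI Engine simulator with profiling enabled (assumes the graph
# was compiled into ./Work by aiecompiler).
aiesimulator --pkg-dir=./Work --profile

# Open the generated run summary, including the Profile Summary report,
# in the Vitis IDE.
vitis -a ./aiesimulator_output/default.aierun_summary
```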

Dedicated tables show profile information for the kernels. This is presented as a chart, together with a table showing what is running on the tiles. The following is an example chart.

Figure 2. Example Chart

This view shows a chart of Total Function Time, the total number of cycles the function used in running the graph. The y-axis shows the ID of the function, which can be cross-referenced against the ID column of the table below. This information can be useful in determining where time is being spent in a function and helps with potential optimization or debug. The table lists the following:

  • The ID of the profiled function
  • The function name
  • The number of times the function was executed
  • The total time, in cycles, taken to execute the function
  • The total function execution time as a percentage of the total execution time of the graph
  • The total time, in cycles, taken to execute the function and the functions (descendants) called from within it
  • The total time, as a percentage, taken to execute the function and the functions (descendants) called from within it

Trace Report

Issues such as missing or mismatching locks, buffer overruns, and incorrect programming of DMA buffers are difficult to debug using traditional interactive debug techniques. Event trace provides a systematic way of collecting system level traces for the program events, providing direct support for generation, collection, and streaming of hardware events as a trace. The following image shows the Trace report open in the Vitis IDE.

Figure 3. Trace Report
Note: This example illustrates the kernel function and the functions that are added by the compiler:
  • The core main function. This is different from the function used in the top-level file.
  • The kernel init function, which runs once per graph execution.
  • A function that calls the destructors of global C++ objects.
  • A section that holds the executable instructions that terminate the process. When a program exits normally, the system runs the code in this section.
Note: If the VCD file is too large and it takes too much time for the Vitis IDE to analyze the VCD and open the Trace view, you can do an online analysis of the VCD when running the AI Engine simulator. The IDE then opens the existing WDB and event files instead of analyzing the VCD file. The command for AI Engine simulator is as follows.
aiesimulator --pkg-dir=./Work --online -wdb -text

Features of the trace report include the following.

  • Each tile is reported. Within each tile the report includes core, DMA, locks, and I/O if there are PL blocks in the graph.
  • There is a separate timeline for each kernel mapped to a core. It shows when the kernel is executing (blue) or stalled (red) due to memory conflicts or waiting for stream data.
  • Use lock IDs in the core, DMA, and locks sections to identify how cores and DMAs interact with one another by acquiring and releasing locks.
  • The lock section shows the activities of the locks in the tile, both the allocation and the release of read and write lock requests. A particular lock can be allocated by nearby tiles, so this section does not necessarily match the lock requests of the core shown in the left pane of the image.
  • If a lock is not released, a red bar extends through the end of simulation time.
  • Clicking the left or right arrows takes you to the start and end of a state, respectively.
  • The data view shows the data flowing through the stream switch network, with slave entry points and master exit points at each hop. This is most useful for finding routing delays, as well as network congestion effects with packet switching, where one packet might be delayed behind another when sharing the same stream channel.
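As a sketch, the event trace data behind this report can be produced by dumping a VCD file during simulation and then opening the resulting run summary. The file name foo and the ./Work directory are hypothetical placeholders for this example:

```shell
# Run the AI Engine simulator and dump event trace data to foo.vcd
# (foo is a placeholder name; ./Work is the aiecompiler output directory).
aiesimulator --pkg-dir=./Work --dump-vcd=foo

# Open the run summary; the Trace report is populated from the VCD data.
vitis -a ./aiesimulator_output/default.aierun_summary
```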