Viewing the Run Summary in the Vitis IDE - 2025.2 English - UG1076

AI Engine Tools and Flows User Guide (UG1076)

Document ID
UG1076
Release Date
2025-11-20
Version
2025.2 English

When properly configured, the application generates a run_summary report after the system runs, whether in simulation, in hardware emulation, or on hardware.

During AI Engine graph simulation, the AI Engine simulator or hardware emulation captures performance and activity metrics. The report is written to the output directory: ./aiesimulator_output for the AI Engine simulator, or ./sim/behav_waveform/xsim for hardware emulation. The generated summary is called default.aierun_summary.

The run_summary can be viewed in the Vitis IDE. The summary contains a collection of reports that capture the performance profile of the AI Engine application as it runs. For example, to open the AI Engine simulator run summary, use the following command:

vitis -a ./aiesimulator_output/default.aierun_summary
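
As a minimal sketch of how the summary file might be located before opening it, the following Python helper checks the two output directories named above for default.aierun_summary and prints the matching vitis -a command. The function name find_run_summaries and the project_root parameter are assumptions made for illustration; they are not part of the Vitis tools.

```python
from pathlib import Path

# Directory and file names are taken from the text above.
SUMMARY_NAME = "default.aierun_summary"
CANDIDATE_DIRS = ("aiesimulator_output", "sim/behav_waveform/xsim")


def find_run_summaries(project_root="."):
    """Return paths of run summaries found in the documented directories.

    Hypothetical helper: project_root is the directory the simulator
    or emulation was launched from.
    """
    root = Path(project_root)
    found = []
    for rel_dir in CANDIDATE_DIRS:
        candidate = root / rel_dir / SUMMARY_NAME
        if candidate.is_file():
            found.append(candidate)
    return found


if __name__ == "__main__":
    for summary in find_run_summaries():
        # Print the command instead of launching the IDE.
        print(f"vitis -a {summary}")
```

The helper only reports what exists on disk; it does not start the IDE, so it is safe to run in a build script.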

The Vitis IDE opens displaying the Summary page of the report. The tool lists the different reports that are available in the summary. For a complete understanding of the Analysis view, see Working with the Analysis View (Vitis Analyzer) in the Vitis Reference Guide (UG1702).

Note: The default.aierun_summary also contains some of the same reports as <GRAPH_TB_FILE_NAME>.aiecompile_summary, namely the Graph and Array reports. For more information, see Opening AI Engine Compilation Summary Reports in the Vitis Reference Guide (UG1702).

Report Summary

This is the top level of the report. It shows details of the run, including the date, tool version, and the command line used to launch the simulator.

Profile Summary

If you specify the aiesimulator --profile option, the simulator collects profiling data on the AI Engine graph and kernels. The simulator presents a high-level view of the AI Engine graphs and the kernels mapped to processors, with tables and graphical presentations of metric data.
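
The following sketch shows one way a script might assemble a profiling run. It combines the --profile option with the --pkg-dir=./Work argument that appears elsewhere in this section. The function name profile_command is hypothetical, and the ./Work package directory is an assumption about the project layout.

```python
import shutil
import subprocess


def profile_command(pkg_dir="./Work"):
    """Build the aiesimulator argument list with profiling enabled.

    pkg_dir is an assumed location of the compiled package directory.
    """
    return ["aiesimulator", f"--pkg-dir={pkg_dir}", "--profile"]


if __name__ == "__main__":
    cmd = profile_command()
    if shutil.which(cmd[0]) is None:
        # Tool is not on PATH: show the command rather than failing.
        print("would run:", " ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
```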

The Profile Summary provides annotated details regarding the overall application performance. The application groups all data generated during its execution into categories. The Profile Summary lets you examine processor/DMA memory stalls, deadlock, interference, critical paths, and maximum contention. This analysis is useful for system-level performance tuning and debug.

System performance is presented in terms of latency (the number of cycles taken to execute the system) and throughput (data processed per unit of time). You can address suboptimal system performance by examining and controlling, through constraints, the following:

  • Mapping and buffer packing
  • Stream and packet switch allocation
  • Interaction with neighboring processors
  • External interfaces
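
To make the latency and throughput definitions above concrete, the sketch below converts a cycle count into seconds and a data volume into bytes per second. The clock frequency, cycle count, and byte count are illustrative values and do not come from any report; the function names are hypothetical.

```python
def latency_seconds(cycles, clock_hz):
    """Convert a latency in clock cycles to seconds."""
    if cycles <= 0 or clock_hz <= 0:
        raise ValueError("cycles and clock_hz must be positive")
    return cycles / clock_hz


def throughput_bytes_per_second(bytes_transferred, cycles, clock_hz):
    """Throughput as data volume divided by the time taken."""
    return bytes_transferred / latency_seconds(cycles, clock_hz)


if __name__ == "__main__":
    # Assumed example values: 2000 cycles at a 1 GHz clock, 4096 bytes moved.
    cycles, clock_hz, data_bytes = 2000, 1e9, 4096
    print("latency (s):", latency_seconds(cycles, clock_hz))
    print("throughput (B/s):",
          throughput_bytes_per_second(data_bytes, cycles, clock_hz))
```

With these assumed numbers the latency is about 2 microseconds and the throughput is about 2.05 GB/s.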

The following figure shows an example of the raw Profile Summary.

Figure 1. Profile Summary

You can use tables to view profile information specific to each kernel. Profile information is shown as a chart accompanied by a table that shows what is running on the tiles. The following is an example chart.

Figure 2. Example Chart

This view shows a chart of Total Function Time, which is the total number of cycles each function used while the graph ran. The y-axis shows the function ID, which corresponds to the ID column of the table below the chart. This information helps determine where time is being spent in a function and supports optimization or debug. The table lists the following items:

  • ID of the function profiled
  • Function name
  • Number of times the function was executed
  • Total time taken in cycles to execute the function
  • Total function execution time as a percent of the total execution time of the graph
  • Total time taken in cycles to execute the function and the functions (descendants) called from within it
  • Total time as a percent to execute the function and the functions (descendants) called from within it
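
The relationship between the cycle columns and the percent columns listed above can be sketched as follows. The row layout, the function names, and all numbers are hypothetical; this is not the file format of the report, only an illustration of how each percentage follows from a cycle count and the total graph execution time.

```python
from dataclasses import dataclass


@dataclass
class FunctionProfile:
    """One hypothetical table row, mirroring the columns listed above."""
    func_id: int
    name: str
    calls: int
    self_cycles: int        # cycles spent in the function itself
    cumulative_cycles: int  # function plus its descendants


def with_percentages(rows, total_graph_cycles):
    """Return (row, self_percent, cumulative_percent) for each row."""
    if total_graph_cycles <= 0:
        raise ValueError("total_graph_cycles must be positive")
    result = []
    for row in rows:
        self_pct = 100.0 * row.self_cycles / total_graph_cycles
        cumulative_pct = 100.0 * row.cumulative_cycles / total_graph_cycles
        result.append((row, self_pct, cumulative_pct))
    return result


if __name__ == "__main__":
    # Invented sample data: a main function that calls one kernel.
    rows = [
        FunctionProfile(0, "main", 1, 100, 1000),
        FunctionProfile(1, "my_kernel", 8, 900, 900),
    ]
    for row, self_pct, cum_pct in with_percentages(rows, 1000):
        print(f"{row.func_id} {row.name}: {self_pct:.1f}% self, "
              f"{cum_pct:.1f}% with descendants")
```

A function with a small self percentage but a large cumulative percentage spends most of its time in the functions it calls, which is usually where to look first.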

Trace Report

Issues such as missing or mismatching locks, buffer overruns, and incorrect programming of DMA buffers are difficult to debug using traditional interactive debug techniques. Event trace provides a systematic way of collecting system-level traces of program events. It provides direct support for generation, collection, and streaming of hardware events as a trace. The following figure shows the Trace report open in the Vitis IDE. The generated trace view is the full trace view by default.

The Vitis Analysis view provides the option to set the time window by specifying the start-time and end-time in the trace timeline. For details on how to set the time window, see Open Trace Summary using Time Window in the Vitis Reference Guide (UG1702).

Figure 3. Trace Report

Note: This example shows the kernel function and the following functions that the compiler adds:

  • _main: Core main function. This is different from the function used in the top-level file.
  • _main_init: Kernel init function that runs one time per graph execution.
  • _cxa_finalize: Calls destructors of global C++ objects.
  • _fini: Holds executable instructions that terminate the process. When a program exits normally, the system runs this code.

Note: You can perform an online analysis of the VCD while running the AI Engine simulator. Online analysis is useful if the VCD file is too large, or if the Vitis IDE takes too long to analyze the VCD and open the Trace view. The IDE then opens the existing WDB and event files instead of analyzing the VCD file. The command for the AI Engine simulator is as follows:

aiesimulator --pkg-dir=./Work --online -wdb -text

Features of the trace report include the following:

  • The report covers each tile. Within each tile, it includes the core, DMA, locks, and, if there are PL blocks in the graph, I/O.
  • Each kernel mapped to a core has a separate timeline that shows when the kernel is executing (blue) or stalled (red) because of memory conflicts or while waiting for stream data.
  • Use lock IDs in the core, DMA, and locks sections to identify how cores and DMAs interact with one another by acquiring and releasing locks.
  • The lock section shows the activities of the locks in the tile, including both allocation and release for read and write lock requests. Because nearby tiles can also allocate a particular lock, this section does not necessarily match the lock requests of the core shown in the left pane of the image.
  • A red bar extends through the end of simulation time if a lock is not released.
  • Clicking the left or right arrows takes you to the start and end of a state, respectively.
  • The data view shows the data flowing through the stream switch network, with slave entry points and master exit points at each hop. This view is useful for finding routing delays and network congestion effects with packet switching. A packet can be delayed behind another when both share the same stream channel.
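
The unreleased-lock case described in the list above (a red bar that extends to the end of simulation time) corresponds to an acquire with no matching release. The sketch below illustrates that pairing logic on a simplified event list. The (time, lock_id, action) tuple format is purely hypothetical and is not the format of the Vitis trace or event files.

```python
def unreleased_locks(events):
    """Return {lock_id: acquire_time} for locks still held at the end.

    events is a hypothetical list of (time, lock_id, action) tuples,
    where action is "acquire" or "release".
    """
    held = {}
    for time, lock_id, action in sorted(events, key=lambda e: e[0]):
        if action == "acquire":
            # Remember the earliest outstanding acquire for this lock.
            held.setdefault(lock_id, time)
        elif action == "release":
            held.pop(lock_id, None)
        else:
            raise ValueError(f"unknown action: {action!r}")
    return held


if __name__ == "__main__":
    # Invented sample: lock 3 is released, lock 5 never is.
    sample = [(10, 3, "acquire"), (25, 3, "release"), (30, 5, "acquire")]
    print(unreleased_locks(sample))
```

In this invented sample, lock 5 is reported as still held, which is the situation the trace view draws as a red bar through the end of simulation.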