Viewing Profile Results in the Vitis IDE - 2024.1 English

AI Engine Tools and Flows User Guide (UG1076)

Document ID: UG1076
Release Date: 2024-06-27
Version: 2024.1 English

To view the profile results obtained using the XRT flow, use the following command:

vitis -a xrt.run_summary

To view the profiling information obtained using the XSDB flow, use the following command:

vitis -a aie_trace_profile.run_summary
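The two commands above differ only in the run summary they open, so a small wrapper can choose the file by flow. The following is a minimal sketch, not part of the tool: the function names, the `run_dir` parameter, and the `"xrt"`/`"xsdb"` keys are hypothetical, and it assumes `vitis` is on the PATH.

```python
import shutil
import subprocess
from pathlib import Path

# Summary file names as given in the text above, keyed by flow.
# The keys "xrt" and "xsdb" are labels chosen for this sketch.
SUMMARY_BY_FLOW = {
    "xrt": "xrt.run_summary",
    "xsdb": "aie_trace_profile.run_summary",
}

def analyzer_command(flow, run_dir="."):
    """Build the `vitis -a <summary>` command line for the given flow."""
    summary = Path(run_dir) / SUMMARY_BY_FLOW[flow]
    return ["vitis", "-a", str(summary)]

def open_profile(flow, run_dir="."):
    """Launch the Vitis IDE on the run summary after basic sanity checks."""
    cmd = analyzer_command(flow, run_dir)
    if not Path(cmd[2]).is_file():
        raise FileNotFoundError(f"run summary not found: {cmd[2]}")
    if shutil.which("vitis") is None:
        raise RuntimeError("vitis is not on PATH")
    return subprocess.run(cmd, check=True)
```

Calling `open_profile("xrt")` from the directory that holds the run results is then equivalent to typing the first command by hand.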

Performance Annotation in Graph View

Depending on what is profiled in hardware, the performance data appears in specific tables in the graph view. Hovering the mouse over the corresponding object displays its performance data.

The following example shows how output throughput is measured in hardware and where it is shown in the graph view.

  1. Add the following content to xrt.ini:
    [Debug]
    aie_profile=true
    [AIE_profile_settings]
    tile_based_interface_tile_metrics=25:25:output_throughputs
  2. Run the application to collect the profiling results, then view them in the Vitis IDE.
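When the profiled column or metric changes between runs, the xrt.ini content from step 1 can be generated instead of edited by hand. The following is a minimal sketch using Python's standard `configparser`. The helper name `build_xrt_ini` and its parameters are hypothetical, and it assumes the setting value reads as start column, end column, and metric name.

```python
import configparser
import io

def build_xrt_ini(col_start, col_end, metric):
    """Return xrt.ini text enabling AIE profiling on interface tiles.

    Assumption: the value format is <start column>:<end column>:<metric>.
    """
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep option names exactly as written
    cfg["Debug"] = {"aie_profile": "true"}
    cfg["AIE_profile_settings"] = {
        "tile_based_interface_tile_metrics": f"{col_start}:{col_end}:{metric}"
    }
    buf = io.StringIO()
    # Write key=value without spaces, matching the listing in step 1.
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()

if __name__ == "__main__":
    with open("xrt.ini", "w") as f:
        f.write(build_xrt_ini(25, 25, "output_throughputs"))
```

Run as a script, this writes an xrt.ini with the same two sections and settings as the listing in step 1.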

The performance data is contained in the Interface Channels table. Hovering the mouse over the profiled port displays the performance data, as shown in the following figure.

Figure 1. Performance Annotation on Output Throughput

Example of heat_map Core Metrics and conflicts Memory Metrics

The following image shows, for ten tiles of an example design, the active time, stall time, cumulative instruction count, and vector instruction count from the heat_map core metrics, along with the memory conflict time and cumulative memory error time from the conflicts memory metrics.

Figure 2. Example of heat_map and conflicts Metrics

Consider the AI Engine located at (15,0). During its active utilization time (5.120 ms), it executes 5,120,000 vector instructions, which represents 87% of the active time. This is excellent performance and indicates a well-optimized core.
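The percentage relates the vector instruction count to the number of clock cycles in the active time. The following sketch makes that arithmetic explicit under two assumptions that the text does not state: each vector instruction occupies one clock cycle, and the AI Engine clock runs at 1.15 GHz. The clock value is hypothetical, chosen only because it reproduces the quoted figure; substitute the actual clock of the design.

```python
def vector_share(active_time_s, vector_instructions, clock_hz):
    """Fraction of active cycles spent issuing vector instructions.

    Assumes one vector instruction per clock cycle.
    """
    active_cycles = active_time_s * clock_hz
    return vector_instructions / active_cycles

# Values for the tile at (15,0); the 1.15 GHz clock is an assumption.
share = vector_share(5.120e-3, 5_120_000, clock_hz=1.15e9)
print(f"{share:.0%}")  # prints "87%"
```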

Example of stalls Core Metrics and dma_locks Memory Metrics

The following image shows, for ten tiles of an example design, the memory stall time, stream stall time, cascade stall time, and lock stall time from the stalls core metrics, along with the cumulative DMA activity time and cumulative DMA locks count from the dma_locks memory metrics.

Figure 3. Example of stalls and dma_locks Metrics

On the core at (24,2), the DMA has been active for 70.645 ms (77.8 million instructions) but has been stalled 298 times.

Example of execution Core Metrics and conflicts Memory Metrics

The following image shows, for ten tiles of an example design, the cumulative instruction count, vector instruction count, load instruction count, and store instruction count from the execution core metrics, along with the memory conflict time and cumulative memory error time from the conflicts memory metrics.

Figure 4. Example of execution and conflicts Metrics