AI Engine Event Trace in Hardware - 2025.1 English - UG1701

Embedded Design Development Using Vitis User Guide (UG1701)

Document ID: UG1701
Release Date: 2025-07-16
Version: 2025.1 English

To obtain trace data during a hardware run, dedicated routes must carry the trace data from the AI Engine array to the PL or to DDR memory. For this reason, you must enable event trace and specify the trace interface during the graph compilation phase. See the following command for details.

v++ -c --mode aie --verbose --pl-freq=100 --workdir=./myWork \
--event-trace-port=gmio --event-trace=runtime \
--num-trace-streams=8 --xlopt=0 --include="./" \
--include="./src" --include="./src/kernels" --include="./data" \
./src/graph.cpp
  • --event-trace=runtime: runtime is the only supported value. It indicates that the events to trace are selected at runtime.
  • --event-trace-port=plio/gmio: selects the interface that carries the trace data. The gmio value used in this example selects the GMIO and NoC pathway instead of the PLIO/PL pathway. PLIO uses PL logic, which can make timing closure more difficult.
  • --num-trace-streams=8: sets the number of trace streams. Up to 16 streams can be used within the AI Engine array to drive the trace events to the GMIO/PLIO.
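
The option rules above can be captured in a small helper that checks the trace-related options before they are passed to the v++ compile command. This is a minimal Python sketch, not part of the Vitis tools: the function name trace_args is hypothetical, and the lower bound of one stream is an assumption. Only the option names, the plio/gmio values, and the limit of 16 streams come from the text.

```python
from __future__ import annotations


def trace_args(port: str = "gmio", num_streams: int = 8) -> list[str]:
    """Return the event-trace options for the AI Engine compile command.

    Hypothetical helper for illustration: it only checks and formats options,
    it does not invoke v++.
    """
    if port not in ("plio", "gmio"):
        raise ValueError(f"trace port must be 'plio' or 'gmio', got {port!r}")
    # The upper limit of 16 comes from the text; the lower bound of 1 is assumed.
    if not 1 <= num_streams <= 16:
        raise ValueError(f"num_streams must be in 1..16, got {num_streams}")
    return [
        "--event-trace=runtime",  # runtime is the only value described in the text
        f"--event-trace-port={port}",
        f"--num-trace-streams={num_streams}",
    ]


if __name__ == "__main__":
    # Print the options so they can be pasted into a build script.
    print(" ".join(trace_args("gmio", 8)))
```

Checking the values in a build script before calling the compiler catches a misspelled port or an out-of-range stream count early, without waiting for a full graph compilation.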

For the profiling flow, you can perform event trace using either the XSDB flow or the XRT flow.

The metrics for the array are described below.

Table 1. AI Engine Metrics

| Metric Name | Description |
|---|---|
| functions | Basic time line of function activity: events generated when kernel functions are being invoked and returned |
| partial_stalls | Three types of core stalls are being registered: stream stalls (no data at input or back-pressure at output), cascade stalls and lock stalls. |
| all_stalls | Same as partial_stalls with memory_stalls (memory conflict) added. |
| all_dma | Data transfers of all 4 Memory DMA channels (2xS2MM, 2xMM2S) |
| all_stalls_dma | Core stalls and data transfers of all 4 DMA channels. All core stalls are grouped, no differentiation on the type of stall. |
| all_stalls_s2mm | Core stalls and data transfer of two S2MM channels [1] |
| all_stalls_mm2s | Core stalls and data transfer of two MM2S channels [1] |
| s2mm_channels | Data transfers and stalls of two S2MM channels |
| mm2s_channels | Data transfers and stalls of two MM2S channels |
| s2mm_channels_stall | Details of one S2MM channel [2]. In AI Engine-ML v2 based devices only |
| mm2s_channels_stall | Details of one MM2S channel [2]. In AI Engine-ML v2 based devices only |

  1. In AI Engine based devices, the stall events are concatenated into a group stall event.
  2. Includes Buffer Descriptors, tasks, starvation, back-pressure and lock stalls.

Table 2. Interface Tiles

| Metric Name | Description |
|---|---|
| input_ports | Data transfers of 4 stream inputs from the AI Engine Array |
| input_port_stalls | Data transfers and stalls of 2 inputs from the AI Engine Array |
| input_port_details | Details on one MM2S channel [1]. For GMIOs only |
| output_port | Data transfers of 4 stream outputs to the AI Engine Array |
| output_port_stalls | Data transfers and stalls of 2 outputs to the AI Engine Array |
| output_port_details | Details on one S2MM channel. Includes Buffer Descriptors, tasks, starvation, back-pressure and lock stalls. For GMIOs only |
| input_output_ports | Data transfers of 4 inputs or outputs of the AI Engine Array |
| input_output_ports_stalls | Data transfers and stalls of 2 inputs or outputs of the AI Engine Array |

Table 3. Memory Tiles (AI Engine-ML and AI Engine-ML v2)

| Metric Name | Description |
|---|---|
| s2mm_channels | Buffer Descriptor and Task events for two S2MM channels |
| s2mm_channels_stalls | Details on one S2MM channel, adding lock stalls, back-pressure and stream starvation. |
| mm2s_channels | Buffer Descriptor and Task events for two MM2S channels |
| mm2s_channels_stalls | Details on one MM2S channel, adding lock stalls, back-pressure and stream starvation. |
| memory_conflicts1 | Memory conflicts for data memory banks 0-7 |
| memory_conflicts2 | Memory conflicts for data memory banks 8-15 |
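
Because some metric names exist for more than one tile type with a different meaning (for example s2mm_channels), it can help to check a requested metric against the tile type before writing it into a trace configuration. The following is a minimal Python sketch of such a lookup. The metric names are copied from the three tables above, but the dictionary keys (aie_tile, interface_tile, memory_tile) and the function name are hypothetical labels, and the sketch does not model the device restrictions noted in the tables.

```python
from __future__ import annotations

# Metric-set names as listed in Tables 1 to 3. The keys are illustrative labels,
# not identifiers used by the tools.
METRIC_SETS: dict[str, set[str]] = {
    "aie_tile": {
        "functions", "partial_stalls", "all_stalls", "all_dma",
        "all_stalls_dma", "all_stalls_s2mm", "all_stalls_mm2s",
        "s2mm_channels", "mm2s_channels",
        "s2mm_channels_stall", "mm2s_channels_stall",
    },
    "interface_tile": {
        "input_ports", "input_port_stalls", "input_port_details",
        "output_port", "output_port_stalls", "output_port_details",
        "input_output_ports", "input_output_ports_stalls",
    },
    "memory_tile": {
        "s2mm_channels", "s2mm_channels_stalls",
        "mm2s_channels", "mm2s_channels_stalls",
        "memory_conflicts1", "memory_conflicts2",
    },
}


def is_listed_metric(tile_type: str, metric: str) -> bool:
    """Return True if the metric appears in the table for the given tile type."""
    return metric in METRIC_SETS.get(tile_type, set())


if __name__ == "__main__":
    print(is_listed_metric("aie_tile", "all_stalls"))
    print(is_listed_metric("memory_tile", "functions"))
```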

XSDB Flow

When the application runs, the debugging and profiling IP stores the trace data in DDR memory. To capture and evaluate this data, you must connect to the hardware device using xsdb, a command typically used to program the device and debug bare-metal applications. Connect your system to the hardware platform or device over JTAG, launch xsdb in a command shell, and run the following sequence of commands:
xsdb% connect
xsdb% ta
xsdb% ta 1
xsdb% source $::env(XILINX_VITIS)/scripts/vitis/util/aie_trace.tcl
xsdb% aietrace start -graphs mygraph -work-dir ./Work -link-summary $PROJECT/xsa.link_summary -base-address 0x900000000 -depth 0x800000 -tile-based-aie-tile-metrics "all:functions; {4,1}:{6,2}:all_stalls"

# Execute the PS host application (.elf) on Linux.
# After the application completes processing, stop the trace:
xsdb% aietrace stop

In this sequence, the source $::env(XILINX_VITIS)/scripts/vitis/util/aie_trace.tcl command loads the Tcl trace commands that set up the xsdb environment.
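
The -tile-based-aie-tile-metrics argument in the sequence above packs several selections into one string. The following minimal Python sketch splits such a string into its parts. The grammar is an assumption inferred from the example only: entries separated by semicolons, each entry either all:<metric>, {a,b}:<metric>, or {a,b}:{c,d}:<metric> for a range of tiles, with each brace pair read as a pair of integers. The function name parse_tile_metrics is hypothetical and is not part of xsdb or XRT.

```python
from __future__ import annotations

import re

# One tile selector such as "{4,1}". Which number is the column and which is
# the row is not stated in the text, so the pair is kept as-is.
_TILE = re.compile(r"\{\s*(\d+)\s*,\s*(\d+)\s*\}")


def parse_tile_metrics(spec: str) -> list[dict]:
    """Split a tile-based metrics string into a list of selections."""
    entries = []
    for raw in spec.split(";"):
        raw = raw.strip()
        if not raw:
            continue  # tolerate a trailing semicolon
        *selectors, metric = (part.strip() for part in raw.split(":"))
        if not selectors or len(selectors) > 2:
            raise ValueError(f"cannot parse entry {raw!r}")
        if selectors == ["all"]:
            tiles = "all"
        else:
            corners = []
            for selector in selectors:
                match = _TILE.fullmatch(selector)
                if match is None:
                    raise ValueError(f"bad tile selector {selector!r}")
                corners.append((int(match.group(1)), int(match.group(2))))
            # A single selector is stored as a range with identical corners.
            tiles = (corners[0], corners[-1])
        entries.append({"tiles": tiles, "metric": metric})
    return entries


if __name__ == "__main__":
    for entry in parse_tile_metrics("all:functions; {4,1}:{6,2}:all_stalls"):
        print(entry)
```

Under these assumptions, the example string yields two selections: the functions metric for all tiles, and the all_stalls metric for the tile range from {4,1} to {6,2}.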

After the hardware events have been captured on the SD card, copy them to your computer and launch the Vitis Unified IDE to import and analyze the data:
vitis -a aie_trace_profile.run_summary
For more details on this flow, see the chapters on Event Tracing in Hardware and XSDB flow in the AI Engine Tools and Flows User Guide (UG1076).

XRT Flow

In the XRT flow, the trace events are selected in the xrt.ini file on the SD card. An example xrt.ini file follows:
# Main switch to turn on aie trace
[Debug]
aie_trace = true
# Continuous trace knobs
[AIE_trace_settings]
reuse_buffer = true
periodic_offload = true
# Time to wait between trace reads
buffer_offload_interval_us = 100
# Total amount of device memory shared between trace streams
buffer_size = 16M
# Metric selection, in the form <graph>:<kernel>:<metric set>
graph_based_aie_tile_metrics = all:all:functions
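
Before copying an xrt.ini file to the SD card, it can be useful to check that it parses and that the trace switch is actually on. The following minimal Python sketch reads the settings shown above with the standard configparser module. It is an illustration only and does not reproduce how XRT itself reads the file: the helper names are hypothetical, and interpreting the M suffix of buffer_size as a multiple of 1024 x 1024 bytes is an assumption.

```python
from __future__ import annotations

import configparser

# The settings from the example above, without the comment lines.
XRT_INI = """\
[Debug]
aie_trace = true

[AIE_trace_settings]
reuse_buffer = true
periodic_offload = true
buffer_offload_interval_us = 100
buffer_size = 16M
graph_based_aie_tile_metrics = all:all:functions
"""


def size_to_bytes(text: str) -> int:
    """Convert a size such as '16M' to bytes (K/M/G as powers of 1024, assumed)."""
    units = {"K": 1024, "M": 1024**2, "G": 1024**3}
    text = text.strip().upper()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def load_trace_settings(ini_text: str) -> dict:
    """Parse the trace-related settings from the text of an xrt.ini file."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    settings = parser["AIE_trace_settings"]
    return {
        "enabled": parser.getboolean("Debug", "aie_trace"),
        "periodic_offload": settings.getboolean("periodic_offload"),
        "interval_us": settings.getint("buffer_offload_interval_us"),
        "buffer_bytes": size_to_bytes(settings["buffer_size"]),
        "metrics": settings["graph_based_aie_tile_metrics"],
    }


if __name__ == "__main__":
    for key, value in load_trace_settings(XRT_INI).items():
        print(f"{key}: {value}")
```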

For more details, see the chapters on Event Tracing in Hardware and XRT Flow in the AI Engine Tools and Flows User Guide (UG1076).