Event Profile APIs for Graph Inputs and Outputs

AI Engine Tools and Flows User Guide (UG1076)
Document ID: UG1076
Release Date: 2025-11-20
Version: 2025.2 English
You can collect profile statistics for your design by calling event APIs from your PS host code. Event APIs are available both during simulation and when running the design in hardware.
Note: This section introduces event APIs that are incompatible with the profiling methods described in the previous section. The event APIs are inserted into the host code, and you are responsible for controlling when and how profiling starts and stops.

The AI Engine has hardware performance counters and can be configured to count hardware events for measuring performance metrics. You can use the event API together with the graph control API to profile certain performance metrics during a controlled period of graph execution. The event API supports only platform I/O ports (PLIO and GMIO) to measure performance metrics such as the following:

  • Platform I/O port bandwidth
  • Graph throughput
  • Graph latency
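For example, the event API can bracket a controlled run of the graph to derive throughput from a cycle count. The following is a minimal sketch, not a complete application: the graph object `gr`, its output port `gr.out[0]`, the iteration count, and the byte count are illustrative assumptions, and a 1 GHz AI Engine clock is assumed for the conversion.

```cpp
#include <adf.h>
#include "graph.h"  // assumed: application graph header defining the graph object 'gr'

// Hypothetical throughput measurement over a fixed number of graph
// iterations using the ADF event API on a platform I/O port.
double measure_output_throughput_MBps(int iterations, long long total_bytes) {
    // Count cycles from the first running event to the last idle event
    // on the output port.
    event::handle h = event::start_profiling(
        gr.out[0], event::io_total_stream_running_to_idle_cycles);

    gr.run(iterations);
    gr.wait();  // counters stop advancing once the transfer is complete

    long long cycles = event::read_profiling(h);
    event::stop_profiling(h);  // release the shared performance counter

    // Assuming a 1 GHz AI Engine clock: bytes / seconds, reported in MB/s.
    return (double)total_bytes / (cycles * 1e-9) / 1e6;
}
```

The measurement window is controlled entirely by the host code: profiling starts before `graph::run()` and the result is read only after `graph::wait()` returns.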

The event API tracks events that occur on the stream switches of the nets that cross the AI Engine - PL interface. These events include idle, running, and stall, as shown in the following figure.

Figure 1. Events on Nets
  • When no data is passing through the stream switch, the stream switch is in an idle state.
  • When data is passing through the stream switch, the stream switch is in a running state.
  • When all the FIFOs on the net are full, the stream switch is in a stall state.
  • When the data transfer resumes, the stream switch returns to the running state.

The following graph shows an example of data being sent from the mm2s PL kernel to the AI Engine, and from the AI Engine to the s2mm PL kernel.

Figure 2. Example Graph

Different ports can route through the same AI Engine - PL interface column, and ports in the same column share that column's performance counters. You can check the Array view in the Vitis IDE to see which columns the ports route through. The following figure shows the array view for the preceding example.

Note: The stream switches circled in red indicate where the event API is monitoring.
Figure 3. Example Array View
Note:
  • The input buffer buf0 in the AI Engine is ready to accept data from the mm2s PL kernel after the graph is initialized. As soon as the mm2s PL kernel starts, it sequentially fills the ping-pong buffers and the FIFOs inside the stream switch connected to buf0. The data transferred to and from these buffers does not depend on graph.run().
  • Each column of the AI Engine - PL interface has two performance counters. Because performance counters are limited, use event::stop_profiling() to release them when a measurement is complete.
  • Calling the graph and profiling APIs incurs some overhead. Read the profiling results using event::read_profiling(). The results can vary if the performance counters have not stopped counting before event::read_profiling() is called.
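The notes above translate into a recommended ordering in the host code: run the graph, wait for it to finish so the counters stop advancing, read the result, then release the counters for other measurements. The sketch below illustrates this for a latency measurement; the graph object `gr` and its ports `gr.in[0]` and `gr.out[0]` are illustrative assumptions.

```cpp
#include <adf.h>
#include "graph.h"  // assumed: application graph header defining the graph object 'gr'

// Hypothetical latency measurement between an input and an output port.
// io_stream_start_difference_cycles counts the cycles between the first
// data seen on the first port and the first data seen on the second.
long long measure_latency_cycles() {
    // Both ports must route through interface columns with a free counter.
    event::handle h = event::start_profiling(
        gr.in[0], gr.out[0], event::io_stream_start_difference_cycles);

    gr.run(1);
    gr.wait();  // ensure the counters have stopped advancing

    long long latency = event::read_profiling(h);
    event::stop_profiling(h);  // release the counters for other measurements
    return latency;
}
```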

In hardware emulation (hw_emu), XRT Event Profile APIs are functionally supported and useful for evaluating performance trends between design iterations. However, because hw_emu runs on QEMU (untimed) with abstracted memory models, measurements are not cycle-accurate. Use hw_emu results to compare relative performance (for example, Run A vs. Run B) and identify improvements, but always validate absolute performance on hardware. Timing accuracy is particularly limited for GMIO and external memory paths.

Important: You can also use the ADF APIs in AI Engine simulation, hardware emulation, and hardware flows. In hardware emulation and hardware flows, adf::registerXRT is required before using the event profile ADF APIs. For example:
#include "adf/adf_api/XRTConfig.h"
......
auto device = xrt::device(0); //device index=0
auto uuid = device.load_xclbin(xclbinFilename);
auto dhdl = xrtDeviceOpenFromXcl(device);
adf::registerXRT(dhdl, uuid.get());
event::handle handle = event::start_profiling(......);
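Putting the pieces together, a hardware-flow host program might look like the following sketch. The xclbin path, the graph object `gr`, and the profiled port `gr.out[0]` are illustrative assumptions; the registration sequence follows the snippet above.

```cpp
#include "adf/adf_api/XRTConfig.h"
#include "xrt/xrt_device.h"
#include "graph.h"  // assumed: application graph header defining the graph object 'gr'

int main(int argc, char* argv[]) {
    // Open the device and load the xclbin (path passed as first argument).
    auto device = xrt::device(0);  // device index = 0
    auto uuid = device.load_xclbin(argv[1]);
    auto dhdl = xrtDeviceOpenFromXcl(device);

    // Required before calling the event profile ADF APIs in hardware
    // emulation and hardware flows.
    adf::registerXRT(dhdl, uuid.get());

    // Count running events on the output port over one graph iteration;
    // the choice of port and option is an illustrative assumption.
    event::handle h = event::start_profiling(
        gr.out[0], event::io_stream_running_event_count);
    gr.run(1);
    gr.wait();
    long long running_events = event::read_profiling(h);
    event::stop_profiling(h);  // release the performance counter

    xrtDeviceClose(dhdl);
    return 0;
}
```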