Analysis of ADF Event API Profiling Flow vs. XRT-based Profiling Flow - 2025.2 English - UG1076

AI Engine Tools and Flows User Guide (UG1076)

Document ID: UG1076
Release Date: 2025-11-20
Version: 2025.2 English

ADF Event API Profiling Flow

The ADF Event API flow requires modification of the host code, as seen in the previous sections. You instrument the host application at the points where you want to collect specific metrics, compile the updated code, deploy the new executable to the board, and execute it on hardware.

This approach allows you to enable or disable measurement collection dynamically using input parameters. This provides flexibility to gather different metrics or run without measurement overhead as needed. When deployed, you simply execute the application with varying parameters to obtain the desired performance data.
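A minimal sketch of this pattern follows, assuming a hypothetical `--profile=<metric>` command-line option. The option name, the `ProfileMode` enum, and both functions are illustrative and are not part of the ADF API. The actual event API calls from the previous sections would go where the comments indicate.

```cpp
#include <string>
#include <vector>

// Hypothetical measurement selector; these names are illustrative only.
enum class ProfileMode { None, Throughput, Latency };

// Parse a hypothetical "--profile=<metric>" option from the argument list.
// Missing or unrecognized values leave profiling disabled, so the
// application runs without measurement overhead by default.
ProfileMode parse_profile_mode(const std::vector<std::string>& args) {
    const std::string prefix = "--profile=";
    ProfileMode mode = ProfileMode::None;
    for (const std::string& arg : args) {
        if (arg.compare(0, prefix.size(), prefix) != 0) continue;
        const std::string value = arg.substr(prefix.size());
        if (value == "throughput") mode = ProfileMode::Throughput;
        else if (value == "latency") mode = ProfileMode::Latency;
        else mode = ProfileMode::None;
    }
    return mode;
}

// Sketch of how host code might branch on the selected mode.
// Returns true when instrumentation would be active for this run.
bool run_graph(ProfileMode mode) {
    const bool profiling = (mode != ProfileMode::None);
    if (profiling) {
        // Start the ADF event API measurement here.
    }
    // Run the graph and wait for completion here.
    if (profiling) {
        // Read and stop the measurement here, then report the result.
    }
    return profiling;
}
```

With this structure, the same executable can be launched with or without the option, so switching between measurements does not require recompiling.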

XRT Profiling Flow

The xrt.ini configuration file manages the XRT flow and specifies the metrics to be collected. Because xrt.ini is a plain text file, you can edit it with any text editor. Using this method, you can enable or disable profiling and tracing features across different parts of the design. This allows you to control profiling and tracing for the AI Engine array, PL kernels, and the host processor. The Vitis Analyzer interface consolidates all results.

Latency and Throughput Computation

The ADF Event API flow depends on the host clock. The flow is subject to latency that the operating system introduces between the request to read the performance counters and the actual read operation. This latency can cause results to vary across runs, especially when cycle counts depend on host-side wait instructions. The placement of API calls in the code also significantly influences the measurements.
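The arithmetic behind a throughput figure, and a simple way to quantify run-to-run variability, can be sketched as follows. This assumes the profiling call returns a cycle count and that the frequency of the clock those cycles are counted against is known. The function names and the 1250 MHz value in the usage note are illustrative assumptions, not values taken from this guide.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Throughput in MB/s (1 MB = 1e6 bytes) from a byte count and a cycle count.
// Assumes the cycles were counted at clock_mhz; bytes per microsecond
// is numerically equal to MB/s.
double throughput_mbps(long long bytes, long long cycles, double clock_mhz) {
    assert(cycles > 0 && clock_mhz > 0.0);
    const double elapsed_us = static_cast<double>(cycles) / clock_mhz;
    return static_cast<double>(bytes) / elapsed_us;
}

// Relative spread of cycle counts across repeated runs: (max - min) / min.
// A large value suggests host-side effects are influencing the measurement.
double relative_spread(const std::vector<long long>& cycle_counts) {
    assert(!cycle_counts.empty());
    const auto mm = std::minmax_element(cycle_counts.begin(), cycle_counts.end());
    assert(*mm.first > 0);
    return static_cast<double>(*mm.second - *mm.first) /
           static_cast<double>(*mm.first);
}
```

For example, 4096 bytes transferred in 1024 cycles at an assumed 1250 MHz clock works out to 5000 MB/s. Repeating the measurement several times and checking the spread gives a rough indication of how much host-side latency affects the result.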

The XRT flow, on the other hand, focuses on the running and stalled states of the system. This flow provides consistent system-level metrics. It can include idle times in transfer periods and lacks the granular control of the ADF Event API.

Flexibility

The ADF Event API flow embeds all trace and profile parameters within the host code, allowing you to enable or disable tracing and profiling at runtime through parameters. The API also supports customizable output formats and automated result storage, which facilitate performance tracking and decision-making.
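As one illustration of a custom output format, the sketch below formats a measurement as a CSV row that host code could append to a results file for tracking across runs. The `Measurement` struct, its field names, and the CSV layout are assumptions made for this example and are not defined by the ADF API.

```cpp
#include <sstream>
#include <string>

// Hypothetical record for one measurement; field names are illustrative.
struct Measurement {
    std::string label;   // for example, the port or metric being measured
    long long bytes;     // bytes transferred during the measurement
    long long cycles;    // cycle count returned by the profiling call
};

// Header line matching the row layout produced by to_csv_row().
std::string csv_header() { return "label,bytes,cycles"; }

// Format one measurement as a CSV row (no trailing newline).
// Labels are assumed not to contain commas or quotes.
std::string to_csv_row(const Measurement& m) {
    std::ostringstream out;
    out << m.label << ',' << m.bytes << ',' << m.cycles;
    return out.str();
}
```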

The XRT flow offers complete independence from the application code. The flow can manage profiling and tracing for the PS, PL, and AI Engine from a single configuration file. It supports all features available in the ADF Event API flow and is also compatible with new XRT features such as hardware context for multi-partition systems.

Limitations

Profiling using event::start_to_bytes_transferred
In the Event API flow, if more bytes pass through the port than the number specified, the counters reset and generate incorrect results. Also, the number of bytes must be a multiple of 4 (32 bits).
Profiling using interface_tile_latency
In the hardware emulation flow, when the design uses the XRT profiling flow, the latency results are only accurate if data paths remain within the AI Engine array. If data traverses the PL or DDR, results can be inaccurate. This is due to the lack of a cycle-accurate DDR model in the hardware emulation flow.
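A host-side guard for the two event::start_to_bytes_transferred constraints above might look like the following sketch. The helper name, the enum, and the idea of checking the byte count before starting the measurement are assumptions for illustration. The tools do not provide this function.

```cpp
#include <cstdint>

// Result of checking a requested byte count against the documented limits.
enum class ByteCountCheck { Ok, NotMultipleOfFour, ExceededByTransfer };

// requested_bytes: the byte count passed to the profiling call.
// planned_bytes:   the total bytes the application expects to send
//                  through the port while the measurement is active.
ByteCountCheck check_profile_byte_count(std::uint64_t requested_bytes,
                                        std::uint64_t planned_bytes) {
    // The byte count must be a multiple of 4 (32 bits).
    if (requested_bytes % 4 != 0) return ByteCountCheck::NotMultipleOfFour;
    // More traffic than specified resets the counters and corrupts results.
    if (planned_bytes > requested_bytes) return ByteCountCheck::ExceededByTransfer;
    return ByteCountCheck::Ok;
}
```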

Recommended Use Cases

ADF Event API
Ideal for AI Engine kernel profiling and scenarios requiring custom output formats or immediate access to results after execution.
XRT Flow
Use this for full-system profiling and tracing (PS, PL, AI Engine), especially when you need centralized configuration and minimal code changes.

Both flows have distinct advantages. Choose the flow based on your preferences (such as comfort with C++ and integrated tracing, or preference for decoupled code and debugging). Also consider the specific profiling or tracing requirements of the project.

Table 1. Summary of Differences
Feature | ADF Event API Flow | XRT Flow
Code Modification | Requires host code modification | No code changes; configuration using the xrt.ini file
Instrumentation and Control | Dynamic control via input parameters | Centralized control for PS, PL, and AI Engine
Integration | Integrated into application logic | Decoupled from application code
Measurement Precision | Dependent on host clock and OS latency; can vary across runs if the number of cycles involves wait instructions | Considers only Running and Stalled states; excludes Idle time
Use Cases | Ability to obtain quick results without using Vitis Analyzer; ability to generate results in custom output formats | System-wide profiling and tracing