ADF Event API Profiling Flow
The ADF Event API flow requires modifying the host code, as shown in the previous sections. You instrument the host application at the points where you want to collect specific metrics, compile the updated code, deploy the new executable to the board, and execute it on hardware.
This approach lets you enable or disable measurement collection dynamically using input parameters, so you can gather different metrics or run without measurement overhead as needed. Once deployed, you run the application with different parameters to obtain the desired performance data.
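The runtime switch described above can be sketched as a small argument parser in the host code. This is a minimal illustration, not the method used in the previous sections: the `--profile=` option, the `ProfileMode` enum, and the function names are hypothetical, and the actual ADF profiling calls are left out because they need the AI Engine toolchain to compile.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical measurement selector; these names are illustrative
// and are not part of the ADF API.
enum class ProfileMode { None, Throughput, Latency };

// Map an argument such as "--profile=throughput" to a mode.
// Missing or unknown values fall back to None, so the application
// runs without any measurement overhead by default.
ProfileMode parse_profile_mode(const std::vector<std::string>& args) {
    const std::string prefix = "--profile=";
    for (const std::string& arg : args) {
        if (arg.compare(0, prefix.size(), prefix) != 0) continue;
        const std::string value = arg.substr(prefix.size());
        if (value == "throughput") return ProfileMode::Throughput;
        if (value == "latency") return ProfileMode::Latency;
    }
    return ProfileMode::None;
}

// Usage check: the same executable selects a different measurement per run.
void demo_parse_profile_mode() {
    assert(parse_profile_mode({"./host.exe", "a.xclbin"}) == ProfileMode::None);
    assert(parse_profile_mode({"./host.exe", "--profile=throughput"}) ==
           ProfileMode::Throughput);
}
```

In the host application, the returned mode would guard the profiling calls, so a run without the option skips them entirely.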
XRT Profiling Flow
The xrt.ini configuration file manages the XRT flow and specifies the metrics to collect. Because xrt.ini is a plain text file, you can edit it with any text editor.
Using this method, you can enable or disable profiling and tracing features across different parts of the design. This allows you to control profiling and tracing for the AI Engine array, PL kernels, and the host processor. The Vitis Analyzer interface consolidates all results.
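As a minimal sketch, an xrt.ini that turns on AI Engine profiling and tracing together with host and PL trace might look like the following. The option names and values are assumptions based on recent XRT releases and can differ between versions, so check them against the XRT documentation for your release.

```ini
[Debug]
; AI Engine profiling (performance counters) and event trace
aie_profile = true
aie_trace = true
; Host (PS) API trace and PL device trace
host_trace = true
device_trace = fine

[AIE_profile_settings]
; Counter sampling interval, in microseconds
interval_us = 1000
; Metric set applied to all AI Engine tiles
tile_based_aie_metrics = all:heat_map
```

Setting an option to false, or removing it, disables that part of the collection without touching the application code.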
Latency and Throughput Computation
The ADF Event API flow depends on the host clock. It is subject to the latency that the operating system introduces between the request to read the performance counters and the actual read operation. This latency can cause results to vary across runs, especially when cycle counts depend on host-side wait instructions. The placement of API calls in the code also significantly influences the measurements.
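Once a cycle count has been read back, throughput follows from the usual relation: elapsed time is cycles divided by clock frequency, and throughput is bytes divided by elapsed time. The helper below is a sketch under assumptions: the function name is hypothetical, and the clock frequency must be the one that drives the counter on your platform.

```cpp
#include <cassert>
#include <cstdint>

// Convert a measured cycle count into throughput in bytes per second.
//   bytes    : number of bytes transferred during the measurement
//   cycles   : cycle count read back from the profiling counter (must be > 0)
//   clock_hz : frequency of the clock driving the counter, in Hz (assumed known)
double throughput_bytes_per_sec(std::uint64_t bytes, std::uint64_t cycles,
                                double clock_hz) {
    assert(cycles > 0 && clock_hz > 0.0);
    const double seconds = static_cast<double>(cycles) / clock_hz;
    return static_cast<double>(bytes) / seconds;
}
```

For example, 4096 bytes in 1024 cycles at 1 GHz works out to about 4 GB/s. Because the cycle count can vary from run to run, it is worth repeating the measurement rather than relying on a single value.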
The XRT flow, on the other hand, focuses on the running and stalled states of the system. This flow provides consistent system-level metrics. It can include idle times in transfer periods and lacks the granular control of the ADF Event API.
Flexibility
The ADF Event API flow embeds all trace and profile parameters within the host code, allowing you to enable or disable tracing and profiling at runtime through parameters. The API also supports customizable output formats and automated result storage, which facilitate performance tracking and decision-making.
The XRT flow offers complete independence from the application code. The flow can manage profiling and tracing for the PS, PL, and AI Engine from a single configuration file. It supports all features available in the ADF Event API flow, plus compatibility with new XRT features such as hardware context for multi-partition systems.
Limitations
- Profiling using event::start_to_bytes_transferred: In the Event API flow, if more bytes pass through the port than specified, the counters reset and generate incorrect results. Also, the number of bytes must be a multiple of 4 (32 bits).
- Profiling using interface_tile_latency: In the hardware emulation flow, when the design uses the XRT profiling flow, the latency results are only accurate if data paths remain within the AI Engine array. If data traverses the PL or DDR, results can be inaccurate. This is due to the lack of a cycle-accurate DDR model in the hardware emulation flow.
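The two byte-count constraints in the first limitation can be checked in the host code before profiling starts. The helper below is a sketch only: its name and parameters are hypothetical and it is not part of the ADF API.

```cpp
#include <cassert>
#include <cstdint>

// Check a byte count before passing it to a bytes-transferred profiling call.
//   requested_bytes     : byte count given to the profiling call
//   expected_port_bytes : bytes the application expects to move through the port
// Returns true only if the count is a nonzero multiple of 4 (32 bits) and the
// port traffic does not exceed it, which would reset the counters.
bool is_valid_profile_byte_count(std::uint64_t requested_bytes,
                                 std::uint64_t expected_port_bytes) {
    const bool word_aligned = (requested_bytes % 4) == 0;
    const bool covers_traffic = expected_port_bytes <= requested_bytes;
    return requested_bytes > 0 && word_aligned && covers_traffic;
}

// Usage check with representative values.
void demo_profile_byte_count() {
    assert(is_valid_profile_byte_count(1024, 1024));   // aligned, traffic fits
    assert(!is_valid_profile_byte_count(1022, 1022));  // not a multiple of 4
    assert(!is_valid_profile_byte_count(1024, 2048));  // traffic exceeds the count
}
```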
Recommended Use Cases
- ADF Event API: Ideal for AI Engine kernel profiling and scenarios requiring custom output formats or immediate access to results after execution.
- XRT Flow: Use this for full-system profiling and tracing (PS, PL, AI Engine), especially when you need centralized configuration and minimal code changes.
Both flows have distinct advantages. Choose the flow based on your preferences (such as comfort with C++ and integrated tracing, or preference for decoupled code and debugging). Also consider the specific profiling or tracing requirements of the project.
| Feature | ADF Event API Flow | XRT Flow |
|---|---|---|
| Code Modification | Requires host code modification | No code changes, configuration using xrt.ini file |
| Instrumentation and Control | Dynamic control via input parameters | Centralized control for PS, PL, and AI Engine |
| Integration | Integrated into application logic | Decoupled from application code |
| Measurement Precision | Dependent on host clock and OS latency; can vary across runs if the cycle count involves wait instructions | Considers only running and stalled states; excludes idle time |
| Use Cases | AI Engine kernel profiling | System-wide profiling and tracing |