Enabling Trace in Your Application - 2025.1 English - UG1701

Embedded Design Development Using Vitis User Guide (UG1701)

Document ID
UG1701
Release Date
2025-07-16
Version
2025.1 English

To capture event trace data during the execution of your application, you must instrument the application for this task. This requires enabling additional logic, which consumes additional device resources, to track host and kernel execution steps and capture event data. The process involves: optionally modifying your host application to capture custom data; modifying your kernel object (XO) during compilation and the xclbin during linking to capture different types of profile data from device-side activity; and configuring the Xilinx Runtime (XRT), as described in the xrt.ini File, or using the xsdb command line, to capture data during the application runtime.
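The build side of this flow can be sketched as follows. This is a minimal example, not a complete command line: the kernel name, file names, and platform are placeholders, and the exact options depend on your design.

```shell
# Compile the kernel with debug info (kernel and file names are placeholders)
v++ -c -g -k my_kernel --platform <platform> -o my_kernel.xo my_kernel.cpp

# Link with profiling instrumentation added to the device binary
v++ -l -g --profile.data all:all:all --platform <platform> \
    -o my_app.xclbin my_kernel.xo
```

At runtime, an xrt.ini file in the working directory (or settings applied through xsdb) then controls which of the instrumented data is actually collected.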

In traditional hardware event trace, the trace information is initially stored in the DDR memory available on the Versal device, and is offloaded to the SD card after the application run completes. This imposes limitations on the amount of trace information that can be stored and analyzed.

The high-speed debug port (HSDP) provides debug and trace capability for the programmable logic (PL), processing system (PS), and AI Engines through a dedicated Aurora interface and a high-speed debug cable such as SmartLynq+. HSDP leverages the high-speed gigabit transceivers to make debug less intrusive to the system configuration. AI Engine trace offload via HSDP can use the larger DDR memory in the SmartLynq+ module, which supports analyzing large quantities of trace information for complex designs. In addition, the SmartLynq+ module offers high-bandwidth connectivity to offload trace information via HSDP, which is faster than a standard JTAG connection. Although HSDP bandwidth is lower than direct DDR storage, it allows much larger trace data sets to be stored and analyzed. More details are available in Event Trace Offload using High Speed Debug Port in the AI Engine Tools and Flows User Guide (UG1076).

Tip: While capturing profile data is a critical part of the profiling and optimization process for building your application, it consumes additional resources and impacts performance. Be sure to remove these elements from your final production build.

There are many different types of profiling for your application, depending on which elements your system includes and what type of data you want to capture. The following table shows some of the levels of profiling that can be enabled, and indicates which are complementary and which are not.

Table 1. Event Trace For Host Application, PL Kernels, and AI Engine Graphs
Host Application XRT Native API
Enabled with the native_xrt_trace option in the xrt.ini file. Generates a profile summary and trace events for the XRT API, as described in Writing the Software Application in Data Center Acceleration using Vitis (UG1700).

Host Application User-Event Profiling
Requires additional code in the host application, as described in Custom Profiling of the Host Application. Generates user range data and user events for the host application.
Tip: Can be used to capture event data for user-managed kernels, as described in Working with User-Managed Kernels in Data Center Acceleration using Vitis (UG1700).

AI Engine Graph and Kernels
Enabled with the aie_trace options in the xrt.ini file. Generates the default.aierun_summary report containing the trace reports. The aierun_summary file can be found in the aiesimulator_output folder of the AI Engine graph build directory. Refer to the AI Engine Simulation-Based Profiling chapter in the AI Engine Tools and Flows User Guide (UG1076) for more information.
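The xrt.ini entries referenced in the table can be sketched as follows. This is an assumption-laden example: the keys shown reflect common XRT usage, and you should confirm the exact option names and values against the xrt.ini documentation for your XRT release.

```ini
[Debug]
; Trace XRT native API calls made by the host application
native_xrt_trace = true
; Capture AI Engine event trace (related AIE trace settings are
; described in UG1076)
aie_trace = true
```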

The device binary (xclbin) file is configured to capture only limited device-side profiling data by default. However, using the --profile option during the Vitis compiler linking process instruments the device binary by adding AXI performance monitors and memory monitors to the system. This option has multiple instrumentation variants: --profile.data, --profile.stall, and --profile.exec, as described in the --profile Options.

As an example, add --profile.data to the v++ linking command line:
v++ -g -l --profile.data all:all:all ...
Tip: Be sure to also use the v++ -g option when compiling your kernel code for debugging with software or hardware emulation.

After your application is enabled for profiling during the v++ compile and link process, data gathering during application runtime must also be enabled in XRT by editing the xrt.ini file as discussed above.
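For example, to collect device-side trace at runtime, the xrt.ini file placed in the application's run directory might contain the following. This is a sketch: device_trace commonly accepts values such as coarse or fine, but verify the supported values for your XRT version.

```ini
[Debug]
; Collect device-side trace from the monitors added by --profile
; (fine = detailed trace; coarse = lower-overhead trace)
device_trace = fine
```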

To enable the profiling of Kernel Internals data, you must also add the debug_mode tag in the [Emulation] section of the xrt.ini:

[Emulation]
debug_mode=batch

If you are collecting a large amount of trace data, you can increase the amount of memory available for capturing data by specifying the --profile.trace_memory option during v++ linking and adding the trace_buffer_size keyword in the xrt.ini file.

--profile.trace_memory
Indicates the type of memory to use for capturing trace data.
trace_buffer_size
Specifies the amount of memory to use for capturing the trace data during the application runtime.
Tip: When --profile.trace_memory is not specified but device_trace is enabled in the xrt.ini file, the profile data is captured to the default platform memory with 1 MB allocated for the trace buffer size.
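Combining the two, a sketch of the link command and the matching runtime setting might look like the following. The memory type (DDR) and buffer size (100M) are example values only; choose them based on the memory resources available in your platform.

```shell
# Link-time: reserve DDR as the trace capture memory (example value)
v++ -l --profile.data all:all:all --profile.trace_memory DDR ...
```

```ini
[Debug]
device_trace = fine
; Runtime: request a larger trace buffer than the 1 MB default (example value)
trace_buffer_size = 100M
```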

Finally, as discussed in Continuous Trace Capture, you can enable continuous trace capture to continuously offload device trace data while the application is running, so that in the event of an application or system crash, some trace data is still available to help debug the application.
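In the xrt.ini file, continuous capture is typically enabled alongside device trace. The following is a hedged sketch: continuous_trace is a documented XRT option, but confirm the key names and the offload-interval setting against the xrt.ini documentation for your release before relying on them.

```ini
[Debug]
device_trace = fine
; Periodically offload the trace buffer while the application runs,
; instead of only at the end of the run
continuous_trace = true
```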