To capture event trace data during the execution of your application, you must instrument the application for this task. Instrumentation enables additional logic and consumes additional device resources to track the host and kernel execution steps and to capture event data. The process involves optionally modifying your host application to capture custom data, and modifying your kernel XO during compilation and the xclbin during linking to capture different types of profile data from device-side activity. You must also configure the Xilinx Runtime (XRT), as described in the xrt.ini File, to capture data during application runtime.
There are many different types of profiling for your applications, depending on which elements your system includes and what type of data you want to capture. The following table shows some of the levels of profiling that can be enabled, and discusses which are complementary and which are not.
| Profile/Trace | Description | Comments |
|---|---|---|
| Host Application XRT Native API | Specified by the use of the native_xrt_trace option in the xrt.ini file. | Generates profile summary and trace events for the XRT API as described in Writing the Software Application in the Data Center Acceleration using Vitis (UG1700). |
| Host Application User-Event Profiling | Requires additional code in the host application as described in Custom Profiling of the Host Application. | Generates user range data and user events for the host application. Tip: Can be used to capture event data for user-managed kernels as described in Working with User-Managed Kernels in the Data Center Acceleration using Vitis (UG1700). |
| Device Side Profiling | Enabled by the use of --profile options during v++ compilation and linking, as described in --profile Options, and the use of device_trace in the xrt.ini file. | Enables capturing data traffic between the host and kernel, kernel stalls, and the execution times of kernels and compute units (CUs), in addition to monitoring activity in AMD Versal™ AI Engines. |
| AI Engine Graph and Kernels | Specified by the use of the aie_profile option in the xrt.ini file. These options can be specified together or separately. | Generates the default.aierun_summary report containing the Profile. The aierun_summary can be found in the aiesimulator_output folder of the AI Engine graph build directory. Refer to the AI Engine Simulation-Based Profiling chapter in the AI Engine Tools and Flows User Guide (UG1076) for more information. |
| Power Profile | Specified by the use of the power_profile option in the xrt.ini file. | Generates the power_profile_<device>.csv report. Note: This feature is not supported on embedded platforms or AWS. |
| Vitis AI Profiling | Specified by the use of the vitis_ai_profile option in the xrt.ini file. | Enables counter profiling of DPUs to generate the xrt.run_summary for viewing in Vitis analyzer. |
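
As a minimal sketch of how the xrt.ini entries named in the table fit together, the following shell snippet writes an xrt.ini that turns on XRT native API trace and device trace. It assumes that native_xrt_trace accepts the value true and that both keys belong in the [Debug] section, as device_trace does in the xrt.ini example later in this section. The file is written to the current directory for illustration only.

```shell
# Sketch only: write an xrt.ini that enables two of the profiling levels
# listed in the table. Key values are assumptions; check the xrt.ini File
# reference for the values supported by your XRT release.
cat > xrt.ini <<'EOF'
[Debug]
native_xrt_trace=true
device_trace=fine
EOF

# Show the generated configuration.
cat xrt.ini
```

Each profiling level is an independent key, so a line can be removed or added without affecting the others.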
The device binary (xclbin) file is configured for capturing limited device-side profiling data by default. However, using the --profile option during the Vitis compiler linking process instruments the device binary by adding AXI Performance Monitors and Memory Monitors to the system. This option has multiple instrumentation options: --profile.data, --profile.stall, and --profile.exec, as described in the --profile Options.
For example, add --profile.data to the v++ linking command line:
v++ -g -l --profile.data all:all:all ...
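
The three instrumentation options can also be combined in one link step. The following dry-run sketch only assembles and prints such a command; it does not invoke v++. The all:all arguments for --profile.stall and --profile.exec are assumptions based on the kernel:CU form of the --profile.data argument, so confirm the exact syntax in the --profile Options. Any remaining link arguments (platform, target, input and output files) are passed through from the script's own arguments.

```shell
# Dry run: build the v++ link command as a string and print it.
link_cmd="v++ -g -l"
link_cmd="$link_cmd --profile.data all:all:all"  # data traffic: all kernels, CUs, interfaces
link_cmd="$link_cmd --profile.stall all:all"     # stall monitoring (assumed kernel:CU form)
link_cmd="$link_cmd --profile.exec all:all"      # execution times (assumed kernel:CU form)

# Append whatever other link arguments the caller supplied, if any.
if [ "$#" -gt 0 ]; then
  link_cmd="$link_cmd $*"
fi

echo "$link_cmd"
```

Printing the command first makes it easy to review how much instrumentation is being requested, since each monitor consumes additional device resources.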
Use the v++ -g option when compiling your kernel code for debugging with software or hardware emulation.
In addition to instrumenting the device binary during the v++ compile and link process, data gathering during application runtime must also be enabled in XRT by editing the xrt.ini file as discussed above. For example, the following xrt.ini file enables power profiling, and event and stall trace capture, when the application is run:
[Debug]
power_profile=true
device_trace=fine
stall_trace=all
To enable the profiling of Kernel Internals data, you must also add the debug_mode tag in the [Emulation] section of the xrt.ini file:
[Emulation]
debug_mode=batch
If you are collecting a large amount of trace data, you can increase the amount of available memory for capturing data by specifying the --profile.trace_memory option during v++ linking, and adding the trace_buffer_size keyword in the xrt.ini file.
- --profile.trace_memory: Indicates what type of memory to use for capturing trace data.
- trace_buffer_size: Specifies the amount of memory to use for capturing the trace data during the application runtime.
If --profile.trace_memory is not specified but device_trace is enabled in the xrt.ini file, the profile data is captured to the default platform memory with 1 MB allocated for the trace buffer size.
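
The two settings work as a pair: one at link time, one at runtime. The sketch below prints a link command that selects a trace memory and writes a matching xrt.ini. The memory name DDR[1] and the size 100M are illustrative assumptions, not recommendations; valid memory resources depend on your platform, and the accepted size format should be checked in the xrt.ini File reference.

```shell
# Link time (printed only, not executed): choose where trace data is stored.
# DDR[1] is a hypothetical memory resource; use one your platform provides.
link_cmd="v++ -g -l --profile.data all:all:all --profile.trace_memory DDR[1]"
echo "$link_cmd"

# Runtime: enable device trace and request a larger trace buffer.
# 100M is an assumed value chosen for illustration.
cat > xrt.ini <<'EOF'
[Debug]
device_trace=fine
trace_buffer_size=100M
EOF

cat xrt.ini
```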