The Vitis tool supports recording continuous trace data while the application is running. A long-running application can generate a large amount of trace data, and the captured trace can end up incomplete, especially when the memory resource used for trace data is not large enough. With continuous trace, the trace can be analyzed while the application is still running, or after the application has crashed before completion.
With the ability to continuously capture trace data, the Timeline Trace reports can be dynamically updated in Vitis Analyzer while your application is running. Once these reports are loaded in Vitis Analyzer, a hyperlink indicates that the current report is being modified on the disk. To load new data, use the Reload or Auto-Reload options on the banner to view the updated report as your application runs and trace data is generated.
Continuous trace is not enabled by default. Because the memory resources of an FPGA are limited, a circular buffer can be used to store the data when the application generates large trace data. The circular buffer is written, offloaded to the host, and then reused. Enabling a circular buffer with continuous trace reduces the memory needed for trace, which saves available resources on the device. However, an application run with continuous trace and a circular buffer may result in multiple device trace files.
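To make the write, offload, and reuse cycle concrete, here is a toy Python model. It is only an illustration: the class and function names are hypothetical, and the choice to drop data when the buffer fills before an offload is an assumption made for the sketch, not a statement about how the device behaves.

```python
class ReusableTraceBuffer:
    """Toy model of a fixed-size trace buffer that is reused after each offload."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0        # bytes written since the last offload
        self.offloaded = 0   # total bytes copied to the host
        self.dropped = 0     # bytes lost because the buffer was full (assumed behavior)

    def write(self, nbytes):
        # Accept only as much data as there is free space in the buffer.
        accepted = min(nbytes, self.capacity - self.used)
        self.used += accepted
        self.dropped += nbytes - accepted

    def offload(self):
        # Copy the buffered data to the host and free the buffer for reuse.
        self.offloaded += self.used
        self.used = 0


def simulate(capacity_bytes, bytes_per_tick, ticks, offload_every):
    """Write bytes_per_tick each tick and offload every offload_every ticks."""
    buf = ReusableTraceBuffer(capacity_bytes)
    for tick in range(1, ticks + 1):
        buf.write(bytes_per_tick)
        if tick % offload_every == 0:
            buf.offload()
    buf.offload()  # final flush at the end of the run
    return buf


if __name__ == "__main__":
    for interval in (10, 20):
        result = simulate(1024, 100, 100, interval)
        print(interval, result.offloaded, result.dropped)
```

In this model a small buffer captures the whole run as long as it is offloaded before it fills. Offloading half as often with the same buffer size loses data.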
Here are some scenarios where it is recommended to use the memory resource as a circular buffer.
The circular buffer implementation is automatically turned on when continuous trace is enabled in the xrt.ini. The flow requires the following settings for enabling continuous trace.
- In the xrt.ini file, continuous_trace is set to TRUE
- v++ linking option --profile.trace_memory is set to DDR or HBM
You can optionally set:
- The size of the trace buffer using trace_buffer_size in the xrt.ini file. This defaults to 1 MB.
- The interval at which the trace buffer is offloaded from the device using trace_buffer_offload_interval_ms in the xrt.ini file. The default is 10 ms.
- The interval at which files are dumped by setting trace_file_dump_interval_s. The default is 3 seconds.
trace_buffer_offload_interval_ms to 0 ms.
With continuous_trace enabled, trace_buffer_size as 8k, and the default trace_buffer_offload_interval_ms of 10 ms, the trace data rate is 819200 bytes/s, which is less than the default of 100 MB/s. In this scenario, the circular buffer is NOT enabled by default and an XRT warning is reported:
[XRT] WARNING: Unable to use circular buffer for continuous trace offload. Please increase trace buffer size and/or reduce continuous trace interval. Minimum required offload rate (bytes per second) : 104857600 Requested offload rate : 819200
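The arithmetic behind that warning can be checked with a short Python sketch. The helper names are hypothetical, the size suffixes are assumed to be powers of 1024 (which matches 8k giving 819200 bytes/s), and the use of a greater-than-or-equal comparison against the minimum rate is an assumption, not something the XRT message states.

```python
# Minimum rate quoted in the XRT warning: 100 MB/s expressed in bytes per second.
MIN_OFFLOAD_RATE = 100 * 1024 * 1024  # 104857600

_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '8k' or '20M' to bytes (suffixes assumed to be powers of 1024)."""
    text = text.strip().upper()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def requested_offload_rate(trace_buffer_size, offload_interval_ms):
    """Bytes per second if one full buffer is offloaded every interval."""
    if offload_interval_ms <= 0:
        raise ValueError("offload interval must be positive for this estimate")
    return parse_size(trace_buffer_size) * 1000 // offload_interval_ms


def meets_minimum_rate(trace_buffer_size, offload_interval_ms):
    """Assumed check: the requested rate must reach the minimum rate."""
    return requested_offload_rate(trace_buffer_size, offload_interval_ms) >= MIN_OFFLOAD_RATE


if __name__ == "__main__":
    for size in ("8k", "1M", "20M"):
        rate = requested_offload_rate(size, 10)
        print(size, rate, meets_minimum_rate(size, 10))
```

With an 8k buffer and a 10 ms interval the estimate is 819200 bytes/s, the value reported in the warning. The default 1 MB buffer at 10 ms gives exactly 104857600 bytes/s.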
The following example xrt.ini file enables continuous trace:
[Debug]
opencl_trace=true
device_trace=fine
stall_trace=all
continuous_trace=true
# The following are optional and needed only in rare circumstances
trace_buffer_size=20M
trace_buffer_offload_interval_ms=10
trace_file_dump_interval_s=2
The following are the results of these settings:
- opencl_trace: Enables the generation of host-related OpenCL API trace. An opencl_trace.csv file is created.
- device_trace: Enables the collection of kernel activity to be added to the profile summary and trace. device_trace_0.csv files are created, with 0 being the device number.
- stall_trace: Enables the hardware generation of stalls into compute units.
- continuous_trace: Enables the continuous dumping of files for trace and the continuous reading of device data into the host.
- trace_buffer_size: Specifies the amount of memory to consume for trace data capture.
- trace_buffer_offload_interval_ms: Controls the reading of device data from the device to the host in milliseconds.
- trace_file_dump_interval_s: Controls the time between dumping of trace files in seconds.
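If the xrt.ini file is generated by a script, the settings described above can be assembled with Python's standard configparser module. This is a minimal sketch: the function name and its parameters are hypothetical, and only the [Debug] keys named in this section are written.

```python
import configparser
import io


def build_xrt_ini(continuous=True, buffer_size=None,
                  offload_interval_ms=None, dump_interval_s=None):
    """Return xrt.ini text with a [Debug] section for trace collection."""
    debug = {
        "opencl_trace": "true",
        "device_trace": "fine",
        "stall_trace": "all",
        "continuous_trace": "true" if continuous else "false",
    }
    # Optional settings are written only when the caller supplies them.
    if buffer_size is not None:
        debug["trace_buffer_size"] = str(buffer_size)
    if offload_interval_ms is not None:
        debug["trace_buffer_offload_interval_ms"] = str(offload_interval_ms)
    if dump_interval_s is not None:
        debug["trace_file_dump_interval_s"] = str(dump_interval_s)

    cfg = configparser.ConfigParser()
    cfg["Debug"] = debug
    out = io.StringIO()
    # Write key=value without spaces, matching the layout used in this section.
    cfg.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    print(build_xrt_ini(buffer_size="20M", offload_interval_ms=10, dump_interval_s=2))
```

Calling build_xrt_ini with buffer_size="20M", offload_interval_ms=10, and dump_interval_s=2 reproduces the settings in the example file. The returned text can then be written to xrt.ini in the application's run directory.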
As a result, there are several CSV files generated in addition to the xrt.run_summary as part of the application run using the above xrt.ini file. Vitis Analyzer only needs the generated run_summary file and will use the relevant CSV files to display the profile summary and timeline trace.
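After a run, it can be useful to confirm which of these files were actually produced, for example after a crash. The sketch below simply lists the run summary and CSV files in a directory. The function name is hypothetical, and it assumes the files are written directly into the run directory.

```python
from pathlib import Path


def collect_trace_outputs(run_dir):
    """List run summary and CSV trace files found directly in run_dir."""
    run_dir = Path(run_dir)
    return {
        "run_summary": sorted(p.name for p in run_dir.glob("*.run_summary")),
        "csv": sorted(p.name for p in run_dir.glob("*.csv")),
    }


if __name__ == "__main__":
    found = collect_trace_outputs(".")
    print("run summary:", found["run_summary"])
    print("CSV files:", found["csv"])
```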
Here are the recommendations on setting up an application for trace data dumping:
- By default the memory used for trace capture is the first memory resource on the platform, which can be determined using the platforminfo Utility. In most platforms this is either DDR or HBM. The amount of memory reserved for trace data is determined by the trace_buffer_size switch in the xrt.ini file, which defaults to 1 MB.
  Note: You can also specify the use of FIFO and the size to allocate using the --profile.trace_memory option.
- If still unable to dump maximum trace, disable stall trace by setting stall_trace=off or stall_trace=on with data_transfer_trace=coarse.
- If the application requires a larger trace buffer, enable the circular buffer by setting continuous_trace=true with default settings of trace_buffer_offload_interval_ms=10 and trace_file_dump_interval_s=5. Ideally, the continuous trace feature should be used for the following cases:
  - Long-running design with minimal trace generated
  - Debugging application crashes where some .csv files might still be available for debugging
- If the application run is still unable to dump the maximum trace, the trace_buffer_size can be increased further.
- If the application still creates so much trace data that the host cannot keep up, use a smaller trace_file_dump_interval_s, which creates multiple files, each corresponding to the interval provided.
- Lastly, continuous trace can generate several trace files as part of the application run in addition to the xrt.run_summary file. Vitis Analyzer only needs the generated run_summary file and can pick up the relevant CSV files to display the profile summary and timeline trace.