If you work from the command line, you must create the xrt.ini file manually and save it in the same directory as the host executable.
The runtime library checks if xrt.ini exists in the same directory as the host executable and automatically
reads the file to configure the runtime. You can also specify the location of an xrt.ini file at runtime by setting the XRT_INI_PATH
environment variable to point to the file, for example:
export XRT_INI_PATH=/path/to/xrt.ini
Runtime Initialization File Format
The xrt.ini file is a simple text file with groups of keys and their values. Any line beginning with a semicolon (;) or a hash (#) is a comment. The group names, keys, and key values are all case sensitive.
The following is an example xrt.ini file that enables the timeline trace feature, and directs the runtime log messages to the Console view.
#Start of Debug group
[Debug]
native_xrt_trace = true
device_trace = fine
#Start of Runtime group
[Runtime]
runtime_log = console
There are five groups of initialization keys:
- Runtime
- Debug
- AIE_profile_settings
- AIE_trace_settings
- Emulation
The following tables list all supported keys for each group, the supported values for each key, and a short description of the purpose of the key.
Runtime Group
The Runtime group of switches lets you configure elements of the runtime operation as described below.
Key | Valid Values | Description |
---|---|---|
api_checks | [true\|false] | Enables or disables OpenCL API checks. |
cpu_affinity | {N,N,...} | Pins all runtime threads to specified CPUs. |
exclusive_cu_context | [true\|false] | This allows the host application to direct OpenCL to acquire exclusive CU access, so that low-level AXI read/write (xclRegRead and xclRegWrite) can be used for regular kernels. |
runtime_log | [null \| console \| syslog \| <filename>] | Specifies where the runtime logs are printed. |
verbosity | [0 \| 1 \| 2 \| 3 \| 4 \| 5 \| 6 \| 7] | Verbosity of the log messages. The default value is 4. |
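As an illustration of the Runtime keys above, the following sketch enables API checks and sends more detailed log messages to a file. The log file name `xrt_run.log` and the chosen verbosity level are assumptions made for this example, not recommended settings.

```ini
; Sketch of a [Runtime] group using keys from the table above.
; The file name and verbosity level are hypothetical choices.
[Runtime]
api_checks = true
; Any file name can be given here instead of null, console, or syslog.
runtime_log = xrt_run.log
; Valid range is 0 to 7; the default is 4.
verbosity = 5
```

Because group names, keys, and values are case sensitive, `[runtime]` or `Verbosity` would not match the documented names.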
Debug Group
The Debug group of switches defines key options for enabling profiling of the application at runtime, or for tracing data transfers and execution. These switches apply to both AI Engine and PL kernels in the Vitis acceleration flow, and let you configure aspects of the runtime to control the frequency of data capture, the events to capture, and the amount of memory to reserve or use for recording trace and profile data.
Key | Valid Values | Description |
---|---|---|
aie_profile | [true\|false] | Enables the runtime configuration and polling of AI Engine hardware performance counters. Available on VCK190 hardware and hardware emulation runs. |
aie_trace | [true\|false] | Enables the runtime configuration and collection of AI Engine event trace. Available on VCK190 hardware runs only. |
aie_status | [true\|false] | Enables the polling of AI Engine status information. Available on VCK190 hardware and hardware emulation runs. |
aie_status_interval_us | integer (default=1000us) | Controls the interval at which AI Engine status information is captured. Specified in microseconds. |
app_debug | [true\|false] | Enables the OpenCL application debug for the host code when debugging with GDB. |
continuous_trace | [true\|false] | Enables the continuous dumping of files for trace and the continuous reading of device data into the host. Note: This switch only has an effect if device_trace is enabled. |
device_counters | [true\|false] | Enables device counter offload only, without enabling trace functionality. |
device_trace | [off\|fine\|coarse\|accel] | Enables the collection of data from monitors inserted on the PL to add to summary and trace. |
host_trace | [true\|false] | Enables trace of host code based on the first protocol encountered. Tip: If your host application uses both OpenCL and XRT native API, you should manually specify both opencl_trace and native_xrt_trace to capture all events. |
lop_trace | [true\|false] | Enables generation of lower overhead OpenCL API host trace. Should not be used with other OpenCL options. |
native_xrt_trace | [true\|false] | Enables generation of the Native C/C++ API trace. This also generates the tables for "Host Data Transfer from/to Global memory" in the Profile Summary. |
opencl_trace | [true\|false] | Enables generation of OpenCL API host trace. |
pl_deadlock_detection | [true\|false] | Enables deadlock detection for PL kernels. |
power_profile | [true\|false] | Enables the polling of power data during the execution of the application. Note: This feature is not supported on embedded platforms or AWS. |
power_profile_interval_ms | <int> (default=20) | Controls the interval of reading the power counters in milliseconds. The default interval is 20 ms. Note: This switch only has an effect if power_profile = true. |
profile_api | [true\|false] | Enables access to HAL API directly from the host application to read counters on device profiling monitors during execution. |
stall_trace | [off\|all\|dataflow\|memory\|pipe] | Specifies the type of device-side stalls to capture and report in the timeline trace. The default is off. Note: This switch only has an effect if device_trace is enabled. |
trace_buffer_offload_interval_ms | <int> | Controls the interval, in milliseconds (ms), at which device data is read from the device to the host. The default is 10 ms. Note: This switch only has an effect if device_trace is enabled. |
trace_buffer_size | <string> | If the .xclbin was created with memory offload of trace specified, as described in --profile Options, this switch determines the size of the buffer to allocate in memory to capture trace data. The default is 1M. Note: This switch only has an effect if device_trace is enabled. |
trace_file_dump_interval_s | <int> | Controls the time between dumping of trace files in seconds (s). The default is 5s. Note: This switch only has an effect if device_trace is enabled. |
vitis_ai_profile | [true\|false] | Generates the profile summary and other files from the Vitis AI application layer. |
xocl_debug | [true\|false] | Generates the xocl.log file when enabled. When any trace options are also enabled, the debug log is added to the xrt.run_summary to view in Vitis Analyzer. |
xrt_trace | [true\|false] | Enables generation of low-level HW shim function trace during HW runs. This will be disabled when used with native_xrt_trace. |
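Several Debug switches only take effect when device_trace is enabled, so they are typically set together. The following is a minimal sketch of such a combination; the specific values (for example the `2M` buffer size, which assumes the same size notation as the documented `1M` default) are illustrative assumptions rather than tuned settings.

```ini
; Sketch of a [Debug] group that combines dependent switches.
; All values are hypothetical choices for illustration.
[Debug]
; device_trace must be enabled for the stall, buffer, and dump switches below.
device_trace = fine
stall_trace = all
continuous_trace = true
trace_buffer_offload_interval_ms = 10
trace_file_dump_interval_s = 5
; Only used if the .xclbin was built with memory offload of trace.
trace_buffer_size = 2M
; power_profile is not supported on embedded platforms or AWS.
power_profile = true
; Only has an effect because power_profile = true.
power_profile_interval_ms = 50
```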
AIE_profile_settings Group
The options specified in this group are applied only if aie_profile=true under the [Debug] group.
Key | Valid Values | Description |
---|---|---|
graph_based_aie_metrics | <graph name\|all>:<kernel name\|all>:<off\|heat_map\|stalls\|execution\|floating_point\|write_bandwidths\|read_bandwidths\|aie_trace> | Specify the metric sets reported by the AI Engine module of AI Engine tiles on a graph-by-graph basis. Important: Currently, only all is supported for kernel specification. Controls the configuration of the statistics read from the AIE core performance counters for the entire AI Engine graph application. heat_map: profile active/stall cycles and vector instruction usage; stalls: profile the different types of stalls (that is, memory, stream, lock, and cascade); execution: profile the AI Engine instructions; floating_point: profile floating point exceptions; write_bandwidths: profile the write bandwidth of streams and cascades; read_bandwidths: profile the read bandwidths of streams and cascades; aie_trace: profile amount and stalls of event trace from core and memory modules. |
graph_based_aie_memory_metrics | <graph name\|all>:<kernel name\|all>:<off\|conflicts\|dma_locks\|dma_stalls_s2mm\|dma_stalls_mm2s\|write_bandwidths\|read_bandwidths> | Specify the metric sets reported by the memory module of AI Engine tiles on a graph-by-graph basis. Important: Currently, only all is supported for kernel specification. Controls the configuration of statistics read from the AI Engine memory performance counters for the entire AI Engine graph application. conflicts: profile the DMA memory conflicts; dma_locks: profile DMA locks and stalls on lock acquire; dma_stalls_s2mm: profile stalls on DMA S2MM channels; dma_stalls_mm2s: profile stalls on DMA MM2S channels; write_bandwidths: profile bandwidths of DMA S2MM channels; read_bandwidths: profile bandwidths of DMA MM2S channels. |
tile_based_aie_metrics | <{<column>,<row>}\|all>:<off\|heat_map\|stalls\|execution\|floating_point\|write_bandwidths\|read_bandwidths\|aie_trace> ; {<mincolumn>,<minrow>}:{<maxcolumn>,<maxrow>}:<off\|heat_map\|stalls\|execution\|floating_point\|write_bandwidths\|read_bandwidths\|aie_trace> | Specify the metric sets reported by the AI Engine module of AI Engine tiles on a tile-by-tile basis. This can be used in conjunction with graph-by-graph selection and will take priority on the specified tiles. Refer to descriptions from |
tile_based_aie_memory_metrics | <{<column>,<row>}\|all>:<off\|conflicts\|dma_locks\|dma_stalls_s2mm\|dma_stalls_mm2s\|write_bandwidths\|read_bandwidths> ; {<mincolumn>,<minrow>}:{<maxcolumn>,<maxrow>}:<off\|conflicts\|dma_locks\|dma_stalls_s2mm\|dma_stalls_mm2s\|write_bandwidths\|read_bandwidths> | Specify the metric sets reported by the memory module of AI Engine tiles on a tile-by-tile basis. This can be used in conjunction with graph-by-graph selection and will take priority on the specified tiles. Refer to descriptions from |
tile_based_interface_tile_metrics | <column\|all>:<off\|input_bandwidths\|output_bandwidths\|packets>[:<channel>] ; <mincolumn>:<maxcolumn>:<off\|input_bandwidths\|output_bandwidths\|packets>[:<channel>] | Specify the metric sets reported by the AI Engine interface tiles on a tile-by-tile basis. Note: Interface tiles are separate from the AI Engine tiles and have different metric sets. |
interval_us | <int> | Controls the interval of reading the AI Engine counter values in microseconds (µs). The default interval is 1000 µs. Note: This switch only has an effect if aie_profile = true. |
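To show how the metric specification strings fit together, the following sketch enables AI Engine profiling and selects metric sets for all graphs, then overrides one tile. The tile coordinates `{1,2}` are hypothetical, and the literal brace syntax is an assumption read from the Valid Values column above.

```ini
; Sketch only: metric choices and tile coordinates are hypothetical.
[Debug]
; Required for the [AIE_profile_settings] group to be applied.
aie_profile = true

[AIE_profile_settings]
interval_us = 1000
; <graph name|all>:<kernel name|all>:<metric set>; only "all" is supported for kernels.
graph_based_aie_metrics = all:all:heat_map
graph_based_aie_memory_metrics = all:all:conflicts
; Tile-based selection takes priority on the specified tiles.
tile_based_aie_metrics = {1,2}:stalls
; Interface tiles use their own metric sets.
tile_based_interface_tile_metrics = all:input_bandwidths
```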
AIE_trace_settings Group
The options specified in this group are applied only if aie_trace=true under the [Debug] group.
Key | Valid Values | Description |
---|---|---|
buffer_size | <string> (default=8M) | Controls the total size of the buffers allocated for AI Engine event trace. This size is partitioned evenly across the different trace streams coming out of the AI Engine. The default is 8M. Note: This switch only has an effect if aie_trace = true. |
buffer_offload_interval_us | integer (default=10ms) | Interval, in milliseconds, between reading of PLIO mode AI Engine trace from device to Host memory. |
periodic_offload | true/false (default=true) | Enables continuous offload of PLIO mode AI Engine trace. Generated AI Engine trace output files (one per stream) are appended with new trace data. |
file_dump_interval_s | integer (default=5s) | Interval, in seconds, between writing (appending) of raw AI Engine trace data to output files. |
graph_based_aie_tile_metrics | string("") <graph name\|all>:<kernel name\|all>:<off\|functions\|functions_partial_stalls\|functions_all_stalls> | Specify the metric sets reported by the AI Engine module of AI Engine tiles on a graph-by-graph basis. Important: Currently, only all is supported for kernel specification. |
tile_based_aie_tile_metrics | string("") <{<column>,<row>}\|all>:<off\|functions\|functions_partial_stalls\|functions_all_stalls>[:<memory_stalls\|stream_stalls\|cascasde_stalls\|lock_stalls>] {<mincolumn>,<minrow>}:{<maxcolumn>,<maxrow>}:<off\|functions\|functions_partial_stalls\|functions_all_stalls> | Specify the metric sets reported by the AI Engine module of AI Engine tiles on a tile-by-tile basis. Important: Currently, only all is supported for kernel specification. |
reuse_buffer | true/false (default=false) | |
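A configuration for AI Engine event trace might look like the following sketch. The `16M` buffer size is an assumption that follows the notation of the documented `8M` default, and the metric set is an arbitrary pick from the listed values.

```ini
; Sketch only: buffer size and metric set are hypothetical choices.
[Debug]
; Required for the [AIE_trace_settings] group to be applied.
aie_trace = true

[AIE_trace_settings]
; Total size, split evenly across the trace streams.
buffer_size = 16M
periodic_offload = true
file_dump_interval_s = 5
; <graph name|all>:<kernel name|all>:<metric set>; only "all" is supported for kernels.
graph_based_aie_tile_metrics = all:all:functions
```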
Emulation Group
The Emulation group of switches applies to the emulation environments and the AMD Vivado™ simulator.
Key | Valid Values | Description |
---|---|---|
aliveness_message_interval | Any integer | Specifies the interval, in seconds, at which aliveness messages are printed. The default is 300. |
debug_mode | [off\|batch\|gui] | Specifies how the waveform is saved and displayed during emulation. Note: The kernel needs to be compiled with debug enabled (v++ -g) for the waveform to be saved and displayed in the simulator GUI. |
kernel-dbg | [true\|false] | Enables kernel debug functionality during software emulation as described in Command Line Debug Flow. |
print_infos_in_console | [true\|false] | Controls the printing of emulation info messages to the user's console. Emulation info messages are always logged into the emulation_debug.log file. |
print_warnings_in_console | [true\|false] | Controls the printing of emulation warning messages to the user's console. Emulation warning messages are always logged into the emulation_debug.log file. |
print_errors_in_console | [true\|false] | Controls the printing of emulation error messages to the user's console. Emulation error messages are always logged into the emulation_debug.log file. |
user_pre_sim_script | Path to Tcl file | For the first run, run simulation in GUI mode and add the signals that you want to observe. Copy the commands from the Tcl console and save them into a Tcl script. For the next run, pass the Tcl script in batch mode. |
user_post_sim_script | Path to Tcl file | Any post-simulation operations can be specified in a Tcl script and passed to this switch. All the commands provided in the Tcl script are executed after simulation is completed. |
xtlm_aximm_log | [true\|false] | Enables the XTLM AXI4 Memory Map transaction logging at runtime. You can see all the transactions in the xsc_report.log file. |
xtlm_axis_log | [true\|false] | Enables the XTLM AXI4-Stream transaction logging at runtime. You can see all the transactions in the xsc_report.log file. |
timeout_scale | na/ms/sec/min | Timeout support for the clPollStream API in emulation. Provides a scale for the timeout specified in the clPollStream API. The timeout in the code is specified in ms, and might not work for emulation. Therefore, use timeout_scale to map ms to another scale if needed for emulation. Important: Timeout is not enabled in emulation by default. Use this option to enable the clPollStream timeout. |
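As a rough sketch of how the Emulation switches combine, the following group saves the waveform in batch mode, reduces console output to warnings and errors, and turns on AXI4 memory-mapped transaction logging. The values are illustrative assumptions for a hardware emulation run.

```ini
; Sketch only: values are hypothetical choices for an emulation run.
[Emulation]
; The kernel must be compiled with v++ -g for the waveform to be saved.
debug_mode = batch
; Info messages are still written to emulation_debug.log.
print_infos_in_console = false
print_warnings_in_console = true
print_errors_in_console = true
; Print an aliveness message every 60 seconds instead of the default 300.
aliveness_message_interval = 60
; Transactions are reported in xsc_report.log.
xtlm_aximm_log = true
```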