The XRT flow is as follows:
- Burn the generated sd_card.img to the physical SD card.
- Create the xrt.ini file in the sd_card folder as described in this section to enable the XRT flow. An example xrt.ini file is shown below.
# Main switch to turn on aie trace
[Debug]
aie_trace = true
# Continuous trace knobs
[AIE_trace_settings]
reuse_buffer = true
periodic_offload = true
# Time to wait between trace reads
buffer_offload_interval_us = 100
# Total amount of device memory shared between trace streams
buffer_size = 16M
# granularity
graph_based_aie_tile_metrics = all:all:functions
- Run the design on hardware to trace hardware events.
- Copy the captured trace data from the sd_card folder to your design, at the same level as the design Work directory. The trace data is generated in the same location as the host application on the SD card. The files are xrt.run_summary, aie_event_runtime_config.json, and aie_trace_N.txt.
- Use the Vitis IDE to import and analyze the data with this command:
vitis -a xrt.run_summary
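The copy step in the flow above can be scripted on the host. The following is a minimal sketch, not part of the tool flow itself: the function name `collect_trace_files` and the directory arguments are hypothetical, and it assumes the SD card contents are reachable as an ordinary folder. Only the three file names come from the text above.

```python
import shutil
import tempfile
from pathlib import Path

# File names taken from the flow description; aie_trace_N.txt is matched with a glob.
TRACE_PATTERNS = ("xrt.run_summary", "aie_event_runtime_config.json", "aie_trace_*.txt")


def collect_trace_files(sd_card_dir, dest_dir):
    """Copy the trace outputs from sd_card_dir into dest_dir.

    Returns the names of the files that were copied, in pattern order.
    """
    src, dest = Path(sd_card_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in TRACE_PATTERNS:
        for path in sorted(src.glob(pattern)):
            shutil.copy2(path, dest / path.name)
            copied.append(path.name)
    return copied


if __name__ == "__main__":
    # Self-contained demo: temporary folders stand in for the SD card and the design.
    with tempfile.TemporaryDirectory() as tmp:
        sd_card = Path(tmp) / "sd_card"
        sd_card.mkdir()
        for name in ("xrt.run_summary", "aie_event_runtime_config.json", "aie_trace_0.txt"):
            (sd_card / name).write_text("placeholder")
        print(collect_trace_files(sd_card, Path(tmp) / "design"))
```

Pass the folder that sits at the same level as the design Work directory as `dest_dir`, then run `vitis -a xrt.run_summary` from that folder.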
[Debug]
aie_trace = true
# Section for AIE trace settings
[AIE_trace_settings]
# Size of AIE trace buffer in DDR (Format: <Integer>[K|k|M|m|G|g]; Default: 1M)
buffer_size = 100M
# Graph/Kernel name
graph_based_aie_tile_metrics = <graph name|all>:<kernel_name|all>:<off|functions|partial_stalls|all_stalls>
# AI Engine Tiles
# Single or all tiles
tile_based_aie_tile_metrics = <{<column>,<row>}|all>:<off|functions|partial_stalls|all_stalls>
# Range of tiles
tile_based_aie_tile_metrics = {<mincolumn>,<minrow>}:{<maxcolumn>,<maxrow>}:<off|functions|partial_stalls|all_stalls>
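The three value shapes in the template (all tiles, a single tile, and a range of tiles) are easy to mistype by hand. As an illustration only, the sketch below builds those strings from Python values. The helper names `tile_metric` and `graph_metric` are hypothetical and are not part of XRT or Vitis. The metric names are the four listed in the template.

```python
VALID_METRICS = ("off", "functions", "partial_stalls", "all_stalls")


def _check_metric(metric):
    if metric not in VALID_METRICS:
        raise ValueError(f"unknown metric {metric!r}; expected one of {VALID_METRICS}")


def tile_metric(metric, tile=None, max_tile=None):
    """Build one tile_based_aie_tile_metrics entry.

    tile=None           -> all tiles            e.g. all:functions
    tile=(col, row)     -> a single tile        e.g. {4,1}:functions
    tile and max_tile   -> a range of tiles     e.g. {4,1}:{6,2}:functions
    """
    _check_metric(metric)
    if tile is None:
        if max_tile is not None:
            raise ValueError("max_tile requires tile")
        return f"all:{metric}"
    low = "{%d,%d}" % tuple(tile)
    if max_tile is None:
        return f"{low}:{metric}"
    high = "{%d,%d}" % tuple(max_tile)
    return f"{low}:{high}:{metric}"


def graph_metric(metric, graph="all", kernel="all"):
    """Build one graph_based_aie_tile_metrics entry: <graph>:<kernel>:<metric>."""
    _check_metric(metric)
    return f"{graph}:{kernel}:{metric}"


if __name__ == "__main__":
    # Several entries on one line are separated with "; " in this document's examples.
    entries = [tile_metric("all_stalls", (4, 1), (6, 2)), tile_metric("functions", (4, 1))]
    print("tile_based_aie_tile_metrics = " + "; ".join(entries))
    print("graph_based_aie_tile_metrics = " + graph_metric("functions", graph="chain_0"))
```

Rejecting unknown metric names up front catches spelling slips before the file reaches the board.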
Option | Description |
---|---|
aie_trace = true | Enables AI Engine event trace during application execution. |
buffer_size = 100M | Sets the size of the event trace buffer in DDR memory. |
reuse_buffer = true | Enables reuse of the trace buffer in DDR memory. When this option is enabled, the DDR trace buffer is treated as a circular buffer, and trace data is continuously offloaded from XRT. This option applies only to event trace data captured using PLIO trace. It cannot be used with GMIO trace. The default is false. |
periodic_offload = true / false | Enables offloading of trace data from DDR to XRT at regular intervals while the application is running. If this option is set to false, the trace data is offloaded only at the end of the run. The default is true. Periodic offload preserves trace data even if the application crashes: in that case, you can examine the trace information captured up to the last periodic offload. This option can be used with the [...] This option, when used in combination with the [...] |
offload_interval_us = 10 | Specifies how often, in microseconds, trace event data is read from the trace buffers in DDR memory into the buffers in XRT. This option is effective only when periodic_offload = true. The default is 100. |
file_dump_interval_s = 3 | Specifies how often, in seconds, trace event data is read from the trace buffer in XRT and appended to the event trace files on the SD card. This option is effective only when periodic_offload = true. The default is 5. |
start_type = time \| iteration \| kernel_event0 | To make effective use of the finite trace buffer available in DDR, this option delays the start of the trace based on time, iteration, or a user-defined event. To use start_type = kernel_event0, you must add the event0() intrinsic to the kernel code, which generates core event 0 for profiling. Combine start_type = time with the start_time option and start_type = iteration with the start_iteration option, as explained below. For more information on delayed event trace, refer to the topic Using the Delayed Event Trace. |
start_time = <1000\|1s\|1ms\|1us\|1ns> | Specifies the start delay of the event trace, either in AI Engine clock cycles or as a time in s, ms, us, or ns. If no unit is specified, the value is in AI Engine clock cycles. Use this option only in combination with start_type = time. |
start_iteration = <int> | Starts the event trace based on the graph iteration count. To use this type of trace start, you must recompile the AI Engine design with the graph-iterator-event option. Use this option only in combination with start_type = iteration. If no value is specified, the default iteration is 1. |
graph_based_aie_tile_metrics = <graph name\|all>:<kernel name\|all>:<off\|functions\|partial_stalls\|all_stalls> | Configures the AI Engine event trace metric to be applied to all kernels in all graphs or in a specific graph. The metric is applied to the tile, even if multiple kernels are running on the tile. |
tile_based_aie_tile_metrics = <{<column>,<row>}\|all>:<off\|functions\|partial_stalls\|all_stalls> | Configures the AI Engine event trace metric to be applied to a single tile or to all tiles. |
tile_based_aie_tile_metrics = {<mincolumn>,<minrow>}:{<maxcolumn>,<maxrow>}:<off\|functions\|partial_stalls\|all_stalls> | Configures the AI Engine event trace metric to be applied to all tiles in a range. |
graph_based_interface_tile_metrics = <graph name\|all>:<port name\|all>:<off\|input_ports\|output_ports\|input_ports_stalls\|output_ports_stalls\|input_ports_details\|output_ports_details> | Configures the event trace of the interface tile metric applied to all or specific ports in all graphs or in a specific graph. |
tile_based_interface_tile_metrics = <column\|all>:<off\|input_ports\|output_ports\|input_ports_stalls\|output_ports_stalls\|input_ports_details\|output_ports_details>[:<channel 1>][:<channel 2>] | Configures the event trace of the interface tile applied to a single tile or to all tiles. |
tile_based_interface_tile_metrics = <mincolumn>:<maxcolumn>:<off\|input_ports\|output_ports\|input_ports_stalls\|output_ports_stalls\|input_ports_details\|output_ports_details>[:<channel 1>][:<channel 2>] | Configures the event trace of the interface tile applied to all tiles in a range. |
graph_based_memory_tile_metrics = <graph name\|all>:<buffer name\|all>:<off\|input_channels\|input_channels_details\|output_channels\|output_channels_details\|memory_stats\|mem_trace>[:<channel>] | Configures the event trace of the Memory Tile metric applied to a set of graphs and buffers. Note: This option is applicable to AI Engine-ML devices only. |
tile_based_memory_tile_metrics = <column\|all>:<off\|channels\|input_channels_stalls\|output_channels_stalls>[:<channel 1>][:<channel 2>] | Configures the event trace of the Memory Tile metric for a single tile or for all tiles. Note: This option is applicable to AI Engine-ML devices only. |
tile_based_memory_tile_metrics = <mincolumn>:<maxcolumn>:<off\|input_channels\|input_channels_stalls\|output_channels\|output_channels_stalls>[:<channel 1>][:<channel 2>] | Configures the event trace of the Memory Tile metric for all tiles in a range. Note: This option is applicable to AI Engine-ML devices only. |
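The start_type, start_time, and start_iteration rows describe options that only make sense in certain combinations. The sketch below is one hypothetical way to check a settings dictionary before writing it to xrt.ini. It is not how XRT parses these values: the function names and the error messages are assumptions made for illustration, and only the value formats and pairing rules are taken from the table.

```python
import re

# <integer> optionally followed by s, ms, us, or ns; no unit means AI Engine clock cycles.
_START_TIME = re.compile(r"^(\d+)(s|ms|us|ns)?$")


def parse_start_time(value):
    """Split a start_time value into (number, unit); unit is 'cycles' when omitted."""
    match = _START_TIME.match(value.strip())
    if match is None:
        raise ValueError(f"malformed start_time: {value!r}")
    number, unit = match.groups()
    return int(number), unit or "cycles"


def check_delay_options(settings):
    """Return a list of problems found in the delayed-trace options (empty if none)."""
    problems = []
    start_type = settings.get("start_type")
    if start_type not in (None, "time", "iteration", "kernel_event0"):
        problems.append(f"unknown start_type: {start_type}")
    if "start_time" in settings:
        if start_type != "time":
            problems.append("start_time should only be used with start_type = time")
        else:
            parse_start_time(settings["start_time"])  # raises ValueError if malformed
    elif start_type == "time":
        problems.append("start_type = time should be combined with start_time")
    if "start_iteration" in settings and start_type != "iteration":
        problems.append("start_iteration should only be used with start_type = iteration")
    return problems


if __name__ == "__main__":
    print(parse_start_time("1ms"))
    print(check_delay_options({"start_type": "kernel_event0", "start_time": "1ms"}))
```

A missing start_iteration is not reported, because the table states that the iteration defaults to 1.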
For example,
# Example 1 : trace function events for all tiles used in the AI Engine array
[AIE_trace_settings]
tile_based_aie_tile_metrics = all:functions
# Example 2 : trace function events in all the kernels in all graphs (Similar to the example above)
[AIE_trace_settings]
graph_based_aie_tile_metrics = all:all:functions
# Example 3 : trace all function stalls events on all used tiles
[AIE_trace_settings]
tile_based_aie_tile_metrics = all:all_stalls
# Example 4 : trace function events within the bounding box of tiles or
# on specific tiles
[AIE_trace_settings]
tile_based_aie_tile_metrics = {4,1}:{6,2}:all_stalls; {4,1}:functions
# Example 5 : trace events specified by graph name
[AIE_trace_settings]
graph_based_aie_tile_metrics = chain_0:all:functions; chain_1:all:all_stalls; chain_2:all:partial_stalls
# Example 6 : trace all_stalls events in all kernels in all graphs
# and trace function events occurring in all kernels in the graph chain_0
[AIE_trace_settings]
graph_based_aie_tile_metrics = all:all:all_stalls; chain_0:all:functions
# Example 7 : Trace function events for all kernels in all graphs and override
# tile 4,1 with all_stalls
[AIE_trace_settings]
graph_based_aie_tile_metrics = all:all:functions
tile_based_aie_tile_metrics = {4,1}:all_stalls
# Example 8 : Trace all graphs and all kernels function events and turn off
# trace in tile 4,1
[AIE_trace_settings]
graph_based_aie_tile_metrics = all:all:functions
tile_based_aie_tile_metrics = {4,1}:off
# Example 9 : Trace multi-level hierarchical graph
[AIE_trace_settings]
graph_based_aie_tile_metrics = MyGraph.sub_graph1.sub_graph2:all:functions
# Examples of delayed event trace
# Example 10 : Trace function events in all the kernels in all graphs,
# delayed by 1ms.
[AIE_trace_settings]
graph_based_aie_tile_metrics = all:all:functions
start_type = time
start_time = 1ms
# Example 11 : Trace function events for all tiles used in the AI Engine
# array, delayed by 10 graph iterations.
[AIE_trace_settings]
tile_based_aie_tile_metrics = all:functions
start_type = iteration
start_iteration = 10
# Example 12 : Trace user defined event in the kernel 'add' in graph 'MyGraph'.
[AIE_trace_settings]
graph_based_aie_tile_metrics = MyGraph:add:functions
start_type = kernel_event0
Note: Metrics functions_all_stalls and functions_partial_stalls are renamed to all_stalls and partial_stalls respectively for runtime configuration.
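When several trace configurations are tried in turn, the xrt.ini file can be generated rather than edited by hand. The following sketch uses Python's standard configparser to write the two sections used throughout this section. The function name `build_xrt_ini` is hypothetical, and the output layout (blank line between sections, `key = value` spacing) is configparser's default, which is assumed here to be acceptable to XRT.

```python
import configparser
import io


def build_xrt_ini(trace_settings):
    """Return xrt.ini text with [Debug] aie_trace = true and the given trace settings."""
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep option names exactly as written
    config["Debug"] = {"aie_trace": "true"}
    config["AIE_trace_settings"] = dict(trace_settings)
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Settings from Example 10: function events in all kernels, delayed by 1ms.
    text = build_xrt_ini(
        {
            "graph_based_aie_tile_metrics": "all:all:functions",
            "start_type": "time",
            "start_time": "1ms",
        }
    )
    print(text)
```

Write the returned text to a file named xrt.ini in the sd_card folder so that it sits alongside the host application.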