XRT Flow - 2023.1 English

AI Engine Tools and Flows User Guide (UG1076)

Document ID: UG1076
Release Date: 2023-06-23
Version: 2023.1 English

The XRT flow is as follows:

  1. Burn the generated sd_card.img to the physical SD card.
  2. Create the xrt.ini file in the sd_card folder, as described in this section, to enable the XRT flow.

    An example xrt.ini file is shown below.

    
    # Main switch to turn on aie trace
    [Debug]
    aie_trace = true
    # Continuous trace knobs
    [AIE_trace_settings]
    reuse_buffer = true
    periodic_offload = true
    # Time to wait between trace reads
    buffer_offload_interval_us = 100
    # Total amount of device memory shared between trace streams
    buffer_size = 16M
    # Trace granularity, in the form graph:kernel:metric set
    graph_based_aie_tile_metrics = all:all:functions
  3. Run the design on hardware to trace hardware events.
  4. Copy the captured trace data from the sd_card folder to your design, at the same level as the design Work directory. The trace data is generated in the same location as the host application on the SD card. The files are xrt.run_summary, aie_event_runtime_config.json, and aie_trace_N.txt.
  5. Import and analyze the data in Vitis Analyzer with the following command.
    vitis_analyzer xrt.run_summary
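
Before launching Vitis Analyzer, it can help to confirm that every trace output named in step 4 was actually copied. The following is a minimal Python sketch and is not part of the AMD tools. The helper name find_trace_outputs and its directory argument are hypothetical; only the file names come from the steps above.

```python
from pathlib import Path


def find_trace_outputs(run_dir):
    """Locate the trace outputs listed in step 4 inside run_dir.

    Returns (summary_path, config_path, trace_paths) and raises
    FileNotFoundError naming whatever is missing.
    """
    run_dir = Path(run_dir)
    summary = run_dir / "xrt.run_summary"
    config = run_dir / "aie_event_runtime_config.json"

    missing = [p.name for p in (summary, config) if not p.is_file()]

    # aie_trace_N.txt: one or more files, N being a number chosen by the runtime.
    traces = sorted(run_dir.glob("aie_trace_*.txt"))
    if not traces:
        missing.append("aie_trace_N.txt")

    if missing:
        raise FileNotFoundError("missing trace outputs: " + ", ".join(missing))
    return summary, config, traces
```

If the call succeeds, the returned summary path is the file to pass to vitis_analyzer.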

The xrt.ini options for AI Engine trace use the following syntax.

[Debug]
aie_trace = true 

# Section for AIE trace settings
[AIE_trace_settings]

# Size of AIE trace buffer in DDR (Format: <Integer>[K|k|M|m|G|g]; Default: 1M)
buffer_size = 100M 

# Graph/Kernel name
graph_based_aie_tile_metrics = <graph name|all>:<kernel name|all>:<off|functions|functions_partial_stalls|functions_all_stalls>

# AI Engine Tiles
# Single or all tiles
tile_based_aie_tile_metrics = <{<column>,<row>}|all>:<off|functions|functions_partial_stalls|functions_all_stalls>

# Range of tiles
tile_based_aie_tile_metrics = {<mincolumn>,<minrow>}:{<maxcolumn>,<maxrow>}:<off|functions|functions_partial_stalls|functions_all_stalls>
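
An xrt.ini file that follows this syntax can also be written from a script, for example when sweeping trace settings across several runs. The sketch below uses Python's standard configparser module. The helper build_xrt_ini is hypothetical, and the option values are borrowed from the examples in this section; comments are not emitted.

```python
import configparser
import io


def build_xrt_ini(trace_settings):
    """Render xrt.ini text that enables AIE trace with the given settings.

    trace_settings maps option names in [AIE_trace_settings] to string values.
    """
    # Only '=' is treated as a delimiter because metric values contain ':'.
    cfg = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    cfg.optionxform = str  # keep option names exactly as written
    cfg["Debug"] = {"aie_trace": "true"}
    cfg["AIE_trace_settings"] = dict(trace_settings)

    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()


ini_text = build_xrt_ini({
    "buffer_size": "100M",
    "graph_based_aie_tile_metrics": "all:all:functions",
})
```

Writing ini_text to a file named xrt.ini next to the host application gives the same two sections as the hand-written examples.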
Table 1. XRT Trace Options
Option
  Description

aie_trace = true
  Enables AI Engine event trace during application execution.

buffer_size = 100M
  Sets the size of the event trace buffer in DDR memory.

reuse_buffer = true
  Enables reuse of the trace buffer in DDR memory. When this option is enabled, the DDR trace buffer is treated as a circular buffer, and trace data is continuously offloaded to XRT. This option applies only to event trace data captured using PLIO trace. Default is false.

start_type = time | iteration | kernel_event0
  Delays the start of event trace based on time, graph iteration, or a user-defined event, so that the finite trace buffer available in DDR is used effectively. To use start_type = kernel_event0, you must add the event0() intrinsic to the kernel code, which generates core event 0 for profiling. Combine start_type = time with the start_time option, and start_type = iteration with the start_iteration option, as explained below. For more information on delayed event trace, refer to the topic Using the Delayed Event Trace.

start_time = <1000|1s|1ms|1us|1ns>
  Specifies the start delay of event trace, either in AI Engine clock cycles or as a time in s, ms, us, or ns. If no unit is specified, the value is taken to be in AI Engine clock cycles. Use this option only in combination with start_type = time.

start_iteration = <int>
  Starts the event trace based on the graph iteration count. To use this type of event trace start, you must recompile the AI Engine design with the graph-iterator-event option. Use this option only in combination with start_type = iteration. If no value is specified, the default iteration is 1.

periodic_offload = true | false
  Enables offloading of trace data from DDR to XRT at regular intervals while the application is running. Default is true. When combined with the reuse_buffer option, this option enables continuous offloading of trace data without running out of trace buffer memory.

buffer_offload_interval_us = 10
  Specifies the interval, in microseconds, at which trace event data is read from the trace buffers in DDR memory into the buffers in XRT. This option is effective only when periodic_offload = true, and applies only to event trace data captured using PLIO trace. Default is 100.

file_dump_interval_s = 3
  Specifies the interval, in seconds, at which trace event data is read from the trace buffer in XRT and appended to the event trace files on the SD card. This option is effective only when periodic_offload = true. Default is 5.

graph_based_aie_tile_metrics = <graph name|all>:<kernel name|all>:<off|functions|functions_partial_stalls|functions_all_stalls>
  Configures the AI Engine event trace metric applied to all kernels, or to a specific kernel, in all graphs or in a specific graph. The metric is applied to the whole tile, even if multiple kernels run on that tile.

tile_based_aie_tile_metrics = <{<column>,<row>}|all>:<off|functions|functions_partial_stalls|functions_all_stalls>
  Configures the AI Engine event trace metric applied to a single tile or to all tiles.

tile_based_aie_tile_metrics = {<mincolumn>,<minrow>}:{<maxcolumn>,<maxrow>}:<off|functions|functions_partial_stalls|functions_all_stalls>
  Configures the AI Engine event trace metric applied to all tiles in a range.
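
To see how the three tile_based_aie_tile_metrics forms combine, the sketch below expands one setting into a per-tile metric map. It is an illustration in Python, not XRT's own parser. The function resolve_tile_metrics and its used_tiles argument are hypothetical, and the precedence (all first, then ranges, then single tiles) is an assumption based on the overrides shown in Examples 4, 7, and 8 of this section. Graph-based settings are not modeled, because mapping kernels to tiles requires compiler metadata.

```python
import re

METRICS = {"off", "functions", "functions_partial_stalls", "functions_all_stalls"}
TILE = r"\{\s*(\d+)\s*,\s*(\d+)\s*\}"  # matches {<column>,<row>}


def resolve_tile_metrics(value, used_tiles):
    """Map (column, row) tiles to a metric set for one tile-based setting.

    Assumed precedence: 'all' entries, then ranges, then single tiles.
    """
    all_entries, ranges, singles = [], [], []
    for entry in (e.strip() for e in value.split(";")):
        if not entry:
            continue
        # The metric set is always the last ':'-separated field.
        target, _, metric = entry.rpartition(":")
        target, metric = target.strip(), metric.strip()
        if metric not in METRICS:
            raise ValueError(f"unknown metric set: {metric!r}")
        if target == "all":
            all_entries.append(metric)
        elif m := re.fullmatch(TILE + r"\s*:\s*" + TILE, target):
            ranges.append((tuple(int(g) for g in m.groups()), metric))
        elif m := re.fullmatch(TILE, target):
            singles.append(((int(m[1]), int(m[2])), metric))
        else:
            raise ValueError(f"unrecognized tile specification: {target!r}")

    result = {}
    for metric in all_entries:
        for tile in used_tiles:
            result[tile] = metric
    for (c0, r0, c1, r1), metric in ranges:
        for col, row in used_tiles:
            if c0 <= col <= c1 and r0 <= row <= r1:
                result[(col, row)] = metric
    for tile, metric in singles:
        result[tile] = metric
    return result
```

Under these assumptions, the value `{4,1}:{6,2}:functions_all_stalls; {4,1}:functions` assigns functions_all_stalls to every used tile inside the bounding box except tile 4,1, which gets functions.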
For example,
# Example 1 : trace function events for all tiles used in the AI Engine array 
[AIE_trace_settings]
tile_based_aie_tile_metrics = all:functions 

# Example 2 : trace function events in all the kernels in all graphs (Similar to the example above)
[AIE_trace_settings]
graph_based_aie_tile_metrics = all:all:functions 

# Example 3 : trace functions_all_stalls events on all used tiles
[AIE_trace_settings]
tile_based_aie_tile_metrics = all:functions_all_stalls 

# Example 4 : trace function events within the bounding box of tiles or on specific tiles
[AIE_trace_settings]
tile_based_aie_tile_metrics = {4,1}:{6,2}:functions_all_stalls; {4,1}:functions 

# Example 5 : trace events specified by graph name
[AIE_trace_settings]
graph_based_aie_tile_metrics = chain_0:all:functions; chain_1:all:functions_all_stalls; chain_2:all:functions_partial_stalls

# Example 6 : trace functions_all_stalls events for all kernels in all graphs, and trace function events occurring in all kernels in the graph chain_0
[AIE_trace_settings]
graph_based_aie_tile_metrics = all:all:functions_all_stalls; chain_0:all:functions 

# Example 7 : Trace function events for all kernels in all graphs, and override tile 4,1 with functions_all_stalls
[AIE_trace_settings]
graph_based_aie_tile_metrics = all:all:functions
tile_based_aie_tile_metrics = {4,1}:functions_all_stalls 

# Example 8 : Trace function events for all kernels in all graphs, and turn off trace on tile 4,1
[AIE_trace_settings]
graph_based_aie_tile_metrics = all:all:functions
tile_based_aie_tile_metrics = {4,1}:off

# Examples of delayed event trace

# Example 9 : Trace function events in all the kernels in all graphs, delayed by 1ms.
[AIE_trace_settings]
graph_based_aie_tile_metrics = all:all:functions
start_type = time
start_time = 1ms

# Example 10 : Trace function events for all tiles used in the AI Engine array, delayed by 10 graph iterations. 
[AIE_trace_settings]
tile_based_aie_tile_metrics = all:functions
start_type = iteration
start_iteration = 10

# Example 11 : Trace function events in the kernel 'add' in graph 'MyGraph', starting at the user-defined event generated by event0().
[AIE_trace_settings]
graph_based_aie_tile_metrics = MyGraph:add:functions
start_type = kernel_event0
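
The start_time format used in Example 9 can be illustrated with a small conversion routine. The Python sketch below follows the option description (a bare integer means AI Engine clock cycles; otherwise the suffix is s, ms, us, or ns) and is not how XRT itself performs the conversion. The function start_time_to_cycles and its aie_clock_hz parameter are hypothetical, and the clock frequency must come from your own platform.

```python
import re

# Seconds represented by each unit suffix accepted by start_time.
_UNIT_SECONDS = {"s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}


def start_time_to_cycles(value, aie_clock_hz):
    """Convert a start_time string to a delay in AI Engine clock cycles.

    aie_clock_hz is the AI Engine clock frequency in Hz (an assumed input).
    """
    m = re.fullmatch(r"\s*(\d+)\s*(s|ms|us|ns)?\s*", value)
    if not m:
        raise ValueError(f"invalid start_time: {value!r}")
    number, unit = int(m[1]), m[2]
    if unit is None:
        return number  # no unit: the value is already in clock cycles
    return round(number * _UNIT_SECONDS[unit] * aie_clock_hz)
```

For instance, with an assumed 1 GHz AI Engine clock, start_time = 1ms corresponds to a delay of 1,000,000 cycles, while start_time = 1000 is taken as 1000 cycles regardless of the clock.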