AI Engine Event Trace and Analysis - 2024.1 English

Vitis Tutorials: AI Engine

Document ID
XD100
Release Date
2024-10-30
Version
2024.1 English

This stage helps you identify the AI Engine kernel or graph construct that causes a performance drop, a stall, or a deadlock by:

  • Running and analyzing runtime trace data using the AI Engine Event trace flow.

  • Profiling Intra-kernel performance.

  • Using the AMD Vitis™ IDE debugger to debug kernel source code.

Build the Design for Event Trace Analysis: Explains the event trace compilation options and their significance, and walks through the steps to generate a hardware image.
      - Prepare for the Hardware Run
Event Trace Analysis - XRT Flow: Explains how to perform AI Engine event trace and analysis by setting up the configuration file, `xrt.ini`, and running the hardware design to generate the trace data using the XRT flow.
      - Launch the Vitis Analyzer to Examine the Event Trace Files
      - Details of the Event Trace Data
Event Trace Analysis - XSDB Flow: Explains how to use the XSDB-based flow to perform event trace analysis on an AI Engine design.
Event Trace Analysis - HSDP: Explains how to use the XSDB-based flow with HSDP to perform event trace analysis on an AI Engine design.
Event Trace Considerations: Describes how to choose an event trace flow, how many trace streams to use, and the limitations of event trace.
      - Event Trace Choice Considerations
      - Number of Event Trace Streams Methodology
      - Event Trace Limitations
Debug the Host/Kernel Source Code Using the Vitis IDE: Explains how to set up the target connection for hardware in the Vitis IDE and debug the host code and kernel source code in the Vitis IDE debugger.

Event Trace Analysis Features

This tutorial targets the event trace feature running on the hardware board, which lets you understand how the design executes on hardware. With support from the Vitis Analyzer, you can view the function calls, stalls (both execution and memory), and the flow of execution in the AI Engine. This information helps you improve the overall design performance. The steps in this tutorial introduce the event trace compilation options, run the design on hardware to generate event trace data with the XSDB and XRT flows, collect the generated event trace data, and launch the Vitis Analyzer to review the design execution on hardware.

Before starting this tutorial:

  • You are expected to have cloned the git repository and to have the design files ready to build.

  • You are expected to have set the environment variables as described in Introduction.

Build the Design

To run the event trace on hardware, you must compile the AI Engine graph with --event-trace and other appropriate flags. The flags are categorized by the way the trace data is captured.

  • With runtime as the argument, the AI Engine graph is compiled so that it is set up for event trace, and you specify the type of profile data to capture at runtime.

  • Alternatively, specify one of functions, functions_partial_stalls, or functions_all_stalls as the type of profile data at compile time. In this case, you must recompile the design to capture a different type of data.

For more information on different event trace options for AI Engine compilation, refer to Event Trace Options in AI Engine Tools and Flows User Guide (UG1076).

This tutorial uses the --event-trace=runtime, --event-trace-port=plio, --num-trace-streams=8, and --xlopt=0 options.

  • --event-trace=runtime option enables runtime event trace configuration.

  • --event-trace-port=plio option sets the AI Engine event tracing port to be plio. Default is gmio.

  • --num-trace-streams=8 option sets the number of trace streams to be 8 to collect the generated event trace data.

  • --xlopt=0 option disables the aiecompiler optimization for debug purposes.

Building the design with the --event-trace=runtime option enables runtime event configuration at compile time. You only need to build the design once, and different event trace levels can then be generated at runtime through the XSDB or XRT flow.

  1. Navigate to the cmd_src/ directory, and open the Makefile.

  2. Search for AIE_INCLUDE_FLAGS, and add the --event-trace=runtime --event-trace-port=plio --num-trace-streams=8 --xlopt=0 options at the end. These flags are passed to the aiecompiler command during compilation.

  3. Run make all TARGET=hw.

  4. Make sure the package step is completed by checking for the sd_card.img file inside the sw/ directory.
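As an illustration of the option set described above, the following minimal Python sketch assembles the event trace flags into the string that would be appended to AIE_INCLUDE_FLAGS. The helper name `event_trace_flags` is hypothetical and not part of any AMD tool; the option names and allowed values are the ones listed in this tutorial.

```python
# Event trace types named in this tutorial; "runtime" defers the choice to run time.
EVENT_TRACE_TYPES = (
    "runtime",
    "functions",
    "functions_partial_stalls",
    "functions_all_stalls",
)
# Trace ports named in this tutorial (gmio is the default).
EVENT_TRACE_PORTS = ("plio", "gmio")


def event_trace_flags(trace_type="runtime", port="plio", num_streams=8, xlopt=0):
    """Return the event trace compile flags as a list of strings.

    Hypothetical helper for illustration only; it just formats the
    options used in this tutorial and checks them against the values
    the tutorial mentions.
    """
    if trace_type not in EVENT_TRACE_TYPES:
        raise ValueError(f"unknown event trace type: {trace_type}")
    if port not in EVENT_TRACE_PORTS:
        raise ValueError(f"unknown event trace port: {port}")
    if num_streams < 1:
        raise ValueError("at least one trace stream is required")
    return [
        f"--event-trace={trace_type}",
        f"--event-trace-port={port}",
        f"--num-trace-streams={num_streams}",
        f"--xlopt={xlopt}",
    ]


if __name__ == "__main__":
    # Prints the flag string used in step 2 of this tutorial.
    print(" ".join(event_trace_flags()))
```

Called with its defaults, the helper reproduces the flag string from step 2; passing a different trace type such as functions_all_stalls models the compile-time alternative.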

Prepare for the Hardware Run

After the design is built, you are ready to run on the hardware board.

  • Flash the SD card with the built sd_card.img.

  • Plug the flashed SD card into the SD card slot of the VCK190 board.

  • Connect the USB Type-C cable between the board and a computer that supports the serial port connection.

  • Set the serial port configuration with Speed=115200, Data=8 bit, Parity=none, Stop bits=1 bit, and flow control=none.

  • Power up the VCK190 board to see boot messages from the serial connection.

XRT Flow

  1. In the hardware Linux console, create the xrt.ini file on the SD card using the following lines:

    #xrt.ini
    [Debug]
       aie_trace = true
    
    [AIE_trace_settings]
       reuse_buffer = true
       periodic_offload = true
       buffer_offload_interval_us = 100
       buffer_size = 16M
       graph_based_aie_tile_metrics = all:all:all_stalls
    

    More details about these settings are explained in XRT Trace Options in the AI Engine Tools and Flows User Guide (UG1076).

  2. Run the application.

    cd /run/media/mmcblk0p1
    ./ps_app.exe a.xclbin
    
  3. After a successful run, the files created on the SD card are:

    • aie_trace_N.txt

    • aie_event_runtime_config.json

    • xrt.run_summary

  4. Copy these files back to the workspace, at the same level as the design’s Work/ directory.
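If you prefer to generate the xrt.ini file from a script rather than typing it on the board, the following minimal Python sketch writes the same sections and keys shown in step 1 using the standard configparser module. The function name `build_xrt_ini` and its parameters are hypothetical; only the section names, keys, and values come from this tutorial.

```python
import configparser
import io


def build_xrt_ini(metrics="all:all:all_stalls", buffer_size="16M",
                  offload_interval_us=100):
    """Return xrt.ini text with the trace settings used in this tutorial.

    Hypothetical helper for illustration; the keys mirror the xrt.ini
    listing in step 1 of the XRT flow.
    """
    # Restrict delimiters to "=" so that values containing ":" are kept intact.
    parser = configparser.ConfigParser(delimiters=("=",))
    parser["Debug"] = {"aie_trace": "true"}
    parser["AIE_trace_settings"] = {
        "reuse_buffer": "true",
        "periodic_offload": "true",
        "buffer_offload_interval_us": str(offload_interval_us),
        "buffer_size": buffer_size,
        "graph_based_aie_tile_metrics": metrics,
    }
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Write the file into the current directory; copy it to the SD card afterwards.
    with open("xrt.ini", "w") as handle:
        handle.write(build_xrt_ini())
```

Keeping the metric string as a parameter makes it easy to switch the trace level between runs without rebuilding the design, which is the point of compiling with --event-trace=runtime.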

Launch the Vitis Analyzer to Examine the Event Trace Files

  1. Open the Vitis Analyzer using the vitis_analyzer xrt.run_summary command.

  2. The first time you run the Vitis Analyzer on the design, you must set the design’s compile summary file.

  3. Select Trace from the left pane of the Vitis Analyzer. Initially, details of the events are not shown.

  4. Zoom in to see the detailed information for each state of the AI Engine tiles.

Details of the Event Trace Data

  1. Select the Graph view to examine the design. Select p_d to identify the tile as (25,0).

  2. Adjust the trace view to the correct size with the zoom in or zoom out icons, and move the marker to the end of peak_detect or the beginning of _main. This is considered the beginning of an iteration. A period of lock stall indicates that data is being sent from the PL to the AI Engine tile.

  3. Observe the end of the peak_detect kernel corresponding to core(25,0) and the start of core(24,0) and core(25,1). In the graph view, you can see that the peak_detect kernel sends data to both the upscale and data_shuffle kernels. The same behavior can be observed in the trace view.

  4. You can calculate the execution time of one iteration as follows. Place markers at the start and end of the iteration; the difference between the two marker times, (1) - (2), is 262.2 ns, which is approximately 329 cycles. This matches the Function time in the profile data from both the AI Engine simulation and hardware emulation.
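The conversion from marker time to cycles in step 4 can be sketched as follows. This is a minimal illustration, and the 1.25 GHz AI Engine clock is an assumption that this tutorial does not state; substitute the clock frequency of your own device.

```python
def ns_to_cycles(duration_ns, clock_ghz):
    """Convert a duration in nanoseconds to clock cycles.

    A clock of N GHz completes N cycles per nanosecond, so the cycle
    count is simply the product of the two values.
    """
    return duration_ns * clock_ghz


# Assumption: AI Engine clock of 1.25 GHz (not stated in this tutorial).
ASSUMED_AIE_CLOCK_GHZ = 1.25

# 262.2 ns is the iteration time measured between the two markers.
cycles = ns_to_cycles(262.2, ASSUMED_AIE_CLOCK_GHZ)
print(round(cycles), "cycles")
```

With the assumed clock, the result is about 328 cycles, close to the approximately 329 cycles quoted above; the exact figure depends on the actual clock frequency and on marker placement.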

XSDB Flow

  1. Program the device using the sd_card.img image, and remove any xrt.ini files from the SD card so that they do not interfere with the XSDB commands.

  2. Target connection setup: Launch the hardware server on the computer that has a JTAG connection to the VCK190 board.

  3. Go to the directory that contains the AI Engine compile Work/ directory, and launch XSDB.

  4. Issue the following commands from the XSDB prompt:

    xsdb
    %xsdb connect -url TCP:${COMPUTER NAME/IP}:3121
    %xsdb ta
    %xsdb ta 1
    %xsdb source $::env(XILINX_VITIS)/scripts/vitis/util/aie_trace.tcl
    %xsdb aietrace start -graphs mygraph -link-summary ./tutorial.xsa.link_summary -base-address 0x900000000 -depth 0x800000 -graph-based-aie-tile-metrics "all:all:all_stalls"
    
    • -base-address 0x900000000 is the base address for the trace data. It must not collide with the addresses used by your design.

    • -depth is the size of the event trace file. Adjust it according to your design size and the amount of event trace data.

  5. After the above aietrace start command is run, switch to the hardware Linux console, and run the application.

    cd /run/media/mmcblk0p1
    ./host.exe a.xclbin
    
  6. After the design run completes on the hardware, stop the trace using aietrace stop. The generated events and run_summary files are then ready to be collected and examined.

  7. Inspect the generated events and run_summary files (aie_trace_N.txt and aie_trace_profile.run_summary) in the local workspace where XSDB was launched.

  8. Open the aie_trace_profile.run_summary file in the Vitis Analyzer, and observe the results as explained in the XRT flow.
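The -base-address and -depth arguments in step 4 define a memory range that must not collide with the design. A minimal Python sketch of that check is shown below. The region names and addresses in `design_regions` are hypothetical placeholders, and treating -depth as a size in bytes is an assumption; take the real values from your own design's memory map.

```python
def ranges_overlap(base_a, size_a, base_b, size_b):
    """Return True if the half-open ranges [base, base + size) intersect."""
    return base_a < base_b + size_b and base_b < base_a + size_a


def trace_buffer_collisions(base_address, depth, design_regions):
    """Return the names of design regions that overlap the trace buffer."""
    return [
        name
        for name, (base, size) in design_regions.items()
        if ranges_overlap(base_address, depth, base, size)
    ]


# Hypothetical design memory map: name -> (base address, size in bytes).
design_regions = {
    "input_buffer": (0x8_0000_0000, 0x1000_0000),
    "output_buffer": (0x8_8000_0000, 0x1000_0000),
}

# Values taken from the aietrace start command in step 4.
collisions = trace_buffer_collisions(0x9_0000_0000, 0x80_0000, design_regions)
print(collisions)
```

An empty list means that none of the listed regions intersects the trace buffer; any returned name identifies a region that requires a different -base-address.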

Event Trace Considerations

Event Trace Choice Considerations

Based on the design, select GMIO if the design has limited PL resources left for event trace generation.

| Flow | Baremetal | PetaLinux | Bandwidth | PL Resources Used |
|---|---|---|---|---|
| PLIO/XSDB | O | O | pl clock-rate * trace-plio-width | Yes |
| PLIO/XRT | | O | pl clock-rate * trace-plio-width | Yes |
| GMIO/XSDB | O | O | | No |
| GMIO/XRT | | O | | No |
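The PLIO bandwidth expression in the table, pl clock-rate * trace-plio-width, can be worked through with a short Python sketch. The 300 MHz PL clock and 64-bit PLIO width used here are illustrative assumptions, not values from this tutorial, and the width is assumed to be expressed in bits.

```python
def plio_trace_bandwidth_bits_per_s(pl_clock_hz, plio_width_bits):
    """Return the PLIO trace bandwidth in bits per second.

    Implements the table's expression: pl clock-rate * trace-plio-width.
    """
    return pl_clock_hz * plio_width_bits


# Illustrative assumptions: 300 MHz PL clock, 64-bit trace PLIO.
bandwidth_bits = plio_trace_bandwidth_bits_per_s(300_000_000, 64)

# Convert to gigabytes per second for readability.
print(bandwidth_bits / 8 / 1e9, "GB/s")
```

Under these assumed values the expression gives 19.2 Gb/s, or 2.4 GB/s. GMIO rows have no corresponding expression in the table, so the sketch applies to the PLIO flows only.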

Number of Event Trace Streams Methodology

| Number of Cores | Recommended Number of Streams |
|---|---|
| Less than 10 | 1 |
| Between 10 and 20 | 2 |
| Between 20 and 40 | 4 |
| Between 40 and 80 | 8 |
| Larger than 80 | 16 |
| Intense debug | 16 |

AMD recommends no more than 16 streams due to resource constraints.
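The stream recommendation above can be expressed as a small lookup. In this minimal Python sketch, the function name `recommended_trace_streams` is hypothetical. The ranges in the table share their boundary values (20, 40, and 80), so the sketch assumes that a boundary value belongs to the lower range; that choice is an assumption, not something the table specifies.

```python
def recommended_trace_streams(num_cores, intense_debug=False):
    """Return the recommended number of event trace streams.

    Follows the table in this section. Boundary values (20, 40, 80)
    are assumed to fall in the lower range.
    """
    if num_cores < 0:
        raise ValueError("number of cores cannot be negative")
    if intense_debug:
        return 16  # "Intense debug" row
    if num_cores < 10:
        return 1
    if num_cores <= 20:
        return 2
    if num_cores <= 40:
        return 4
    if num_cores <= 80:
        return 8
    return 16  # never more than 16 streams


if __name__ == "__main__":
    for cores in (5, 15, 30, 60, 100):
        print(cores, "cores ->", recommended_trace_streams(cores), "streams")
```

The returned value is the number you would pass to --num-trace-streams when compiling the graph.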

Event Trace Limitations

  1. Due to limited resources, overruns can appear in the event trace. Follow Number of Event Trace Streams Methodology to configure the number of trace streams and minimize overruns.

  2. You must compile the design with the --broadcast-enable-core option. This eliminates time sync issues where the start time of each tile is off by ~100 ns or more.

  3. Run-forever applications are supported by the XSDB flow only.

Event Trace Analysis Using HSDP

In the traditional hardware event trace, the trace information is initially stored in the DDR memory available on the Versal device and is offloaded to the SD card after the application run completes. This limits the amount of trace information that can be stored and analyzed. AI Engine trace offload via the High Speed Debug Port (HSDP) uses the larger DDR memory in the SmartLynq+ module and supports analyzing large quantities of trace information for complex designs.

1. To run the event trace on hardware, you must compile the AI Engine graph with the `--event-trace-port=plio` option. This sets the event tracing port to PLIO.
   Note: If the event tracing port is set to GMIO, the AI Engine trace cannot be offloaded via HSDP.

  Add `--aie.event-trace-port=plio` to AIE_FLAGS in the Makefile.

2. After the AI Engine graph and the C/C++ kernels are compiled, and any RTL kernels are packaged, the Vitis v++ --link command links them with the target platform to build the platform file (XSA). To offload the AI Engine trace via HSDP, you must add the `--profile.aie_trace_offload=HSDP` option to the v++ --link command. Add the following lines to the system.cfg file:
```
[profile]
aie_trace_offload=HSDP
```
With this, a new HSDP IP is instantiated for AI Engine trace offload, and all the PLIO event trace streams are connected to the HSDP IP. You can see System_DPA instantiated in VitisRegion.