This stage helps you determine which AI Engine kernel or graph construct is causing a performance drop, a stall, or a deadlock by:

- Running and analyzing runtime trace data using the AI Engine event trace flow.
- Profiling intra-kernel performance.
- Using the AMD Vitis™ IDE debugger to debug kernel source code.
Section | Description |
---|---|
Build the Design for Event Trace Analysis | Explains the event trace compilation options and their significance, and walks through the steps to generate a hardware image. Includes: Prepare for the Hardware Run. |
Event Trace Analysis - XRT Flow | Explains how to perform AI Engine event trace and analysis by setting up the configuration file, `xrt.ini`, and running the hardware design to generate the trace data using the XRT flow. Includes: Launch the Vitis Analyzer to Examine the Event Trace Files, Details of the Event Trace Data. |
Event Trace Analysis - XSDB Flow | Explains how to use the XSDB-based flow to perform event trace analysis on an AI Engine design. |
Event Trace Analysis - HSDP | Explains how to use the XSDB-based flow with HSDP to perform event trace analysis on an AI Engine design. |
Event Trace Considerations | Covers the following topics: Event Trace Choice Considerations, Number of Event Trace Streams Methodology, Event Trace Limitations. |
Debug the Host/Kernel Source Code Using the Vitis IDE | Explains how to set up the target connection for hardware in the Vitis IDE and debug the host code and kernel source code in the Vitis IDE debugger. |
Event Trace Analysis Features
This tutorial targets the event trace feature running on the hardware board, which lets you understand how the design executes on hardware. With support from the Vitis Analyzer, you can view the function calls, stalls (both execution and memory), and the flow of execution in the AI Engine. This information helps you improve the overall design performance. The steps in this tutorial introduce the event trace compilation options, run the design on hardware to generate event trace data with the XSDB and XRT flows, collect the generated event trace data, and launch the Vitis Analyzer to review the design execution on hardware.

Before starting this tutorial:

- Clone the git repository and make sure the design files are ready to build.
- Run the steps to set the environment variables as described in Introduction.
Build the Design
To run the event trace on hardware, you must compile the AI Engine graph with `--event-trace` and other appropriate flags. The flags are categorized based on how the trace data is captured.

- Using `runtime` as the argument, you compile the AI Engine graph so that it is set up for event trace, and you specify the type of profile data to capture at runtime.
- Alternatively, you specify one of `functions`, `functions_partial_stalls`, or `functions_all_stalls` as the type of profile data at compile time. In this case, you must recompile the design to capture a different type of data.
For more information on different event trace options for AI Engine compilation, refer to Event Trace Options in AI Engine Tools and Flows User Guide (UG1076).
This tutorial uses the `--event-trace=runtime`, `--event-trace-port=plio`, `--num-trace-streams=8`, and `--xlopt=0` options.

- The `--event-trace=runtime` option enables runtime event trace configuration.
- The `--event-trace-port=plio` option sets the AI Engine event tracing port to `plio`. The default is `gmio`.
- The `--num-trace-streams=8` option sets the number of trace streams to 8 to collect the generated event trace data.
- The `--xlopt=0` option disables the aiecompiler optimization for debug purposes.
Building the design with the `--event-trace=runtime` option enables runtime event configuration at compile time. You only need to build the design once, and different event trace levels can then be generated at runtime via the XSDB or XRT flow.

1. Navigate to the `cmd_src/` directory, and open the `Makefile`.
2. Search for `AIE_INCLUDE_FLAGS`, and add the `--event-trace=runtime --event-trace-port=plio --num-trace-streams=8 --xlopt=0` options at the end. These flags are passed to the `aiecompiler` command during compilation.
3. Run `make all TARGET=hw`.
4. Make sure the package step completed by checking for `sd_card.img` inside the `sw/` directory.
Prepare for the Hardware Run
After the design is built, you are ready to run on the hardware board.
1. Flash the SD card with the built `sd_card.img`.
2. Plug the flashed SD card into the SD card slot of the VCK190 board.
3. Connect the USB Type-C cable between the board and a computer that supports the serial port connection.
4. Set the serial port configuration to Speed=115200, Data=8 bit, Parity=none, Stop bits=1 bit, and Flow control=none.
5. Power up the VCK190 board to see boot messages from the serial connection.
XRT Flow
1. In the hardware Linux console, create the `xrt.ini` file on the SD card using the following lines:

       #xrt.ini
       [Debug]
       aie_trace = true
       [AIE_trace_settings]
       reuse_buffer = true
       periodic_offload = true
       buffer_offload_interval_us = 100
       buffer_size = 16M
       graph_based_aie_tile_metrics = all:all:all_stalls

   More details about these settings are explained in XRT Trace Options in the AI Engine Tools and Flows User Guide (UG1076).
2. Run the application.

       cd /run/media/mmcblk0p1
       ./ps_app.exe a.xclbin

3. After a successful run, the files created on the SD card are:
   - `aie_trace_N.txt`
   - `aie_event_runtime_config.json`
   - `xrt.run_summary`
4. Copy these files back to the workspace at the same level as the design's `Work/` directory.
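If you would rather generate `xrt.ini` on the host and copy it to the SD card, the settings listed above can be written out with a short script. This is a minimal sketch, assuming Python is available on the host. The function name `build_xrt_ini` is hypothetical, and the keys and values are copied from the listing above.

```python
import configparser
import io


def build_xrt_ini() -> str:
    """Return xrt.ini text containing the trace settings used in this tutorial."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep section keys exactly as written
    config["Debug"] = {"aie_trace": "true"}
    config["AIE_trace_settings"] = {
        "reuse_buffer": "true",
        "periodic_offload": "true",
        "buffer_offload_interval_us": "100",
        "buffer_size": "16M",
        "graph_based_aie_tile_metrics": "all:all:all_stalls",
    }
    buffer = io.StringIO()
    config.write(buffer)  # writes "key = value" lines under each section
    return buffer.getvalue()


if __name__ == "__main__":
    # Print the file content; redirect it to xrt.ini if it looks right.
    print(build_xrt_ini())
```

Redirecting the output to a file named `xrt.ini` and placing it next to the application on the SD card should give the same configuration as typing the lines by hand.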
Launch the Vitis Analyzer to Examine the Event Trace Files
1. Open the Vitis Analyzer using the `vitis_analyzer xrt.run_summary` command.
2. Set the design's compile summary file when you run the Vitis Analyzer for the first time on the design.
3. Select Trace from the left pane of the Vitis Analyzer. Initially, details of the events are not shown.
4. Zoom in to see the detailed information for each state of the AI Engine tiles.
Details of the Event Trace Data
1. Select the `Graph` view to examine the design. Select `p_d` to identify the tile as (25,0).
2. Adjust the trace view to the correct size with the zoom in or zoom out icons, and move the marker to the end of `peak_detect` or the beginning of `_main`. This is considered the beginning of an iteration. A period of lock stall indicates that data is being sent from the PL to the AI Engine tile.
3. Observe the end of the `peak_detect` kernel corresponding to core(25,0) and the start of core(24,0) and core(25,1). In the graph view, you can see that the `peak_detect` kernel sends data to both the `upscale` and `data_shuffle` kernels. The same behavior can be observed in the trace view.
4. Calculate the execution time of one iteration as follows. Place the markers at the start and end of the iteration; (1) - (2) gives 262.2 ns, which is approximately 329 cycles. This matches the `Function time` in the profile data from both the AI Engine simulation and hardware emulation.
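The conversion between the marker difference and a cycle count can be sketched as below. The 1.25 GHz AI Engine clock is an assumption (this section does not state the clock), and both function names are hypothetical helpers rather than part of any AMD tool.

```python
def ns_to_cycles(duration_ns: float, clock_ghz: float) -> float:
    """Convert a duration in nanoseconds to clock cycles (1 GHz = 1 cycle per ns)."""
    return duration_ns * clock_ghz


def implied_clock_ghz(duration_ns: float, cycles: float) -> float:
    """Back out the clock frequency implied by a duration and a cycle count."""
    return cycles / duration_ns


if __name__ == "__main__":
    # 262.2 ns is the marker difference reported in the step above.
    # 1.25 GHz is an ASSUMED AI Engine clock; check your device settings.
    print(ns_to_cycles(262.2, 1.25))
    # Clock implied by the tutorial's own numbers (262.2 ns, about 329 cycles).
    print(implied_clock_ghz(262.2, 329))
```

Under that assumed clock the result is within a couple of cycles of the figure quoted above. A difference of that size is what you would expect from where the markers are placed.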
XSDB Flow
1. Program the device using the sd_card image, and remove any `xrt.ini` files from the SD card to avoid misbehavior with the XSDB commands.
2. Target connection setup: launch the hardware server from the computer that has a JTAG connection to the VCK190 board.
3. Go to the directory that contains the AI Engine compile `Work/` directory, and launch XSDB.
4. From the XSDB prompt, issue the following commands:

       xsdb
       %xsdb connect -url TCP:${COMPUTER NAME/IP}:3121
       %xsdb ta
       %xsdb ta 1
       %xsdb source $::env(XILINX_VITIS)/scripts/vitis/util/aie_trace.tcl
       %xsdb aietrace start -graphs mygraph -link-summary ./tutorial.xsa.link_summary -base-address 0x900000000 -depth 0x800000 -graph-based-aie-tile-metrics "all:all:all_stalls"

   - `-base-address 0x900000000` is the address of the trace data. It must not collide with memory used by your design.
   - `-depth` is the size of the event trace file (`0x800000` in the command above). Adjust it according to your design size and the amount of event trace data.
5. After the `aietrace start` command is run, switch to the hardware Linux console, and run the application.

       cd /run/media/mmcblk0p1
       ./host.exe a.xclbin

6. After the design run completes on the hardware, stop the trace using `aietrace stop`. The generated events and run_summary files are then ready to be collected and examined.
7. Inspect the generated events and run_summary files (`aie_trace_N.txt` and `aie_trace_profile.run_summary`) in the local workspace where XSDB was launched.
8. Open the `aie_trace_profile.run_summary` file in the Vitis Analyzer, and observe the results as explained in the XRT flow.
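When adjusting `-depth`, it helps to see the hexadecimal value as a size in bytes. The helper below is a minimal sketch of that conversion; `describe_size` is a hypothetical name, not an XSDB command.

```python
def describe_size(hex_value: str) -> str:
    """Render a hexadecimal size string as bytes and MiB."""
    size_bytes = int(hex_value, 16)  # base 16 accepts the "0x" prefix
    size_mib = size_bytes / (1024 ** 2)
    return f"{hex_value} = {size_bytes} bytes = {size_mib:g} MiB"


if __name__ == "__main__":
    # Depth value used in the aietrace start command above.
    print(describe_size("0x800000"))
    # One extra hexadecimal zero multiplies the size by 16.
    print(describe_size("0x8000000"))
```

A single extra zero in the hexadecimal value changes the buffer from 8 MiB to 128 MiB, so check the value against the memory that is free at the chosen base address.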
Event Trace Considerations
Event Trace Choice Considerations
Based on the design, select GMIO if the design has limited PL resources left for event trace generation.
Port/Flow | Baremetal | PetaLinux | Bandwidth | PL Resources Used |
---|---|---|---|---|
PLIO/XSDB | O | O | pl clock-rate * trace-plio-width | Yes |
PLIO/XRT | | O | pl clock-rate * trace-plio-width | Yes |
GMIO/XSDB | O | O | | No |
GMIO/XRT | | O | | No |
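The PLIO bandwidth expression in the table is a plain product, which can be worked through as follows. The 312.5 MHz PL clock and 64-bit trace PLIO width are illustrative assumptions, not values taken from this design, and the function name is hypothetical.

```python
def plio_trace_bandwidth_bits_per_s(pl_clock_hz: float, plio_width_bits: int) -> float:
    """Bandwidth per the table: pl clock-rate * trace-plio-width."""
    return pl_clock_hz * plio_width_bits


if __name__ == "__main__":
    # ASSUMED example values; substitute the PL clock and PLIO width of your design.
    bits_per_s = plio_trace_bandwidth_bits_per_s(312.5e6, 64)
    print(f"{bits_per_s / 1e9:g} Gb/s, {bits_per_s / 8 / 1e9:g} GB/s")
```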
Number of Event Trace Streams Methodology
Number of Cores | Recommended Number of Streams |
---|---|
Less than 10 | 1 |
Between 10 and 20 | 2 |
Between 20 and 40 | 4 |
Between 40 and 80 | 8 |
Larger than 80 | 16 |
Intense debug | 16 |
AMD recommends no more than 16 streams due to resource constraints |
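The table above can be expressed as a small lookup, for example to pick a `--num-trace-streams` value in a build script. This is a sketch only: the function name is hypothetical, and because the table's ranges share their endpoints (10, 20, 40, 80), the boundary handling here is an assumption.

```python
def recommended_trace_streams(num_cores: int, intense_debug: bool = False) -> int:
    """Return the recommended number of trace streams for a core count.

    Boundary handling is an ASSUMPTION: a count of exactly 20, 40, or 80
    is placed in the lower bucket, and exactly 10 in the 10-to-20 bucket.
    """
    if num_cores < 0:
        raise ValueError("num_cores must be non-negative")
    if intense_debug:
        return 16  # "Intense debug" row of the table
    if num_cores < 10:
        return 1
    if num_cores <= 20:
        return 2
    if num_cores <= 40:
        return 4
    if num_cores <= 80:
        return 8
    return 16  # the table recommends no more than 16 streams


if __name__ == "__main__":
    for cores in (4, 16, 32, 64, 128):
        print(cores, recommended_trace_streams(cores))
```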
Event Trace Limitations
- Due to limited resources, overruns can be seen in the event trace. Follow Number of Event Trace Streams Methodology to configure the number of trace streams and minimize overruns.
- The design must be compiled with the `--broadcast-enable-core` option. This eliminates time sync issues where the start time of each tile is off by ~100 ns or more.
- Run-forever applications are supported by the XSDB flow only.
Event Trace Analysis Using HSDP
In the traditional hardware event trace, the trace information is initially stored in the DDR memory available on the Versal device, and is offloaded to the SD card after the application run completes. This limits the amount of trace information that can be stored and analyzed. AI Engine trace offload via HSDP (High Speed Debug Port) uses the larger DDR memory in the SmartLynq+ module and supports analyzing large quantities of trace information for complex designs.
1. To run the event trace on hardware, you must compile the AI Engine graph with the `--event-trace-port=plio` option. This sets the event tracing port to PLIO.

   Note: If the event tracing port is set to GMIO, the AI Engine trace cannot be offloaded via HSDP.

   Add `--aie.event-trace-port=plio` to `AIE_FLAGS` in the Makefile.
2. After the AI Engine graph and the C/C++ kernels are compiled, and any RTL kernels are packaged, the Vitis `v++ --link` command links them with the target platform to build the platform file (XSA). To offload the AI Engine trace via HSDP, you must add the `--profile.aie_trace_offload=HSDP` option to the `v++ --link` command. Add the following lines to the `system.cfg` file:
```
[profile]
aie_trace_offload=HSDP
```
With this, a new HSDP IP gets instantiated for AI Engine trace offload and all the PLIO event trace streams are connected to the HSDP IP. You can see System_DPA getting instantiated in VitisRegion.