In this stage, you can profile the AI Engine Core, Interface, and Memory modules in the XRT or XSDB flows. Profiling is a non-intrusive feature that can be enabled at runtime using the `xrt.ini` file or by running scripts in XSDB. The feature uses performance counters available in the AI Engine array to gather profile data. The amount and type of data gathered is limited by the number of performance counters available.
Features
Hardware Profiling Feature - XRT Flow
- Explains how to set up the configuration file `xrt.ini` and run the hardware design to generate profile data using the XRT flow.
- Open multiple profile runs in the AMD Vitis™ Analyzer: this exercise helps you understand how to open different profile summaries (two different runs) in a single Vitis Analyzer view.
- Profiling Data Explanation: explains how to analyze the AI Engine core, memory, and interface profiling data, and discusses what action to take based on the stall time and DMA lock time.
Hardware Profiling Feature - XSDB Flow
- Explains how to use the XSDB-based flow to profile on both baremetal and Linux operating systems.
Generating the Hardware Image
It is expected that you have already generated the hardware image in Stage 1. If not, follow steps 1-3 from Stage 1: Running the Design on Hardware.
Hardware Profiling Features
In this tutorial, you will learn how to use hardware profiling features to inspect the design. Two flows, XRT and XSDB, are supported for profiling the AI Engine design. Collecting profiling data requires no changes to the design source code and no special build options.
XRT Flow
Once the board is powered up and you see the Linux console after PetaLinux boots, create an `xrt.ini` file on the SD card with the following lines:
    [Debug]
    aie_profile = true
    [AIE_profile_settings]
    interval_us = 1000
    graph_based_aie_metrics = all:all:heat_map
    tile_based_aie_memory_metrics = all:conflicts
    tile_based_interface_tile_metrics = all:mm2s_throughputs:0
For more information on these profile settings, refer to the AI Engine Tools and Flows User Guide (UG1076).
Save the `xrt.ini` file. It is safest to power-cycle the device whenever you add or edit the `xrt.ini` file, to avoid abnormal results. Then run the application:
    cd /run/media/mmcblk0p1
    ./host.exe a.xclbin
Observe the files generated on the SD card, and copy them back to the local workspace where the design exists:
- aie_profile_edge_*.csv
- summary.csv
- xrt.run_summary
Open the `xrt.run_summary` file using `vitis_analyzer xrt.run_summary`. This opens the Vitis Analyzer window. Click Set Compile Directory under AI Engine Compile Directory in the Summary view, and point to the `Work/graph.aiecompile_summary` file. Then click Profile_Summary and navigate to AI Engine & Memory.
Observe the following metrics for Tile (24,0):
AIE TILE HEAT MAP: ACTIVE TIME = 0.003 ms
AIE TILE HEAT MAP: STALL TIME = 0.002 ms
AIE TILE HEAT MAP: VECTOR INSTRS = 112
AIE TILE HEAT MAP: ACTIVE UTILIZATION TIME = 0.003 ms
Similarly, for AIE MEMORY CONFLICTS and other tiles, you can hover your mouse over a parameter (say, ACTIVE TIME(MS)) to get more details.
Click Interface Channels.
In the `xrt.ini` file, the interface tile metric you requested is `mm2s_throughputs`. Observe the PEAK THROUGHPUT(MB/S) and AVERAGE THROUGHPUT(MB/S) values.
Close the Vitis Analyzer window.
Open Multiple Profile Runs in the Vitis Analyzer
In this section, use a different set of metrics in the `xrt.ini` file and generate another `xrt.run_summary` file. Then open the second run summary on top of the first.
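Because the two runs differ only in the metric sets they request, the `xrt.ini` contents can be generated rather than edited by hand. The following is a minimal sketch, assuming the file layout used in this tutorial; the helper name `build_xrt_ini` and its parameters are hypothetical and not part of XRT.

```python
import configparser
import io


def build_xrt_ini(graph_metric, memory_metric, interface_metric, interval_us=1000):
    """Return xrt.ini text that enables AI Engine profiling.

    Hypothetical helper: the section and key names are copied from this
    tutorial; only the three metric-set names vary between runs.
    """
    config = configparser.ConfigParser()
    config.optionxform = str  # keep key names exactly as written
    config["Debug"] = {"aie_profile": "true"}
    config["AIE_profile_settings"] = {
        "interval_us": str(interval_us),
        "graph_based_aie_metrics": f"all:all:{graph_metric}",
        "tile_based_aie_memory_metrics": f"all:{memory_metric}",
        "tile_based_interface_tile_metrics": f"all:{interface_metric}:0",
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Metric sets used for the first and second runs in this tutorial.
first_run = build_xrt_ini("heat_map", "conflicts", "mm2s_throughputs")
second_run = build_xrt_ini("execution", "dma_locks", "s2mm_throughputs")
print(second_run)
```

The returned text can be written to `xrt.ini` on the SD card before each run. Keeping the metric names as arguments makes it easy to see exactly what changed between two profile runs.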
Save the profile data obtained from the first run (steps 1-3 of the XRT flow) to a directory, `profile_0`, in your local workspace.
Edit the `xrt.ini` file on the SD card with the following content, and reboot the board:
    [Debug]
    aie_profile = true
    [AIE_profile_settings]
    interval_us = 1000
    graph_based_aie_metrics = all:all:execution
    tile_based_aie_memory_metrics = all:dma_locks
    tile_based_interface_tile_metrics = all:s2mm_throughputs:0
Move the profile data obtained from this run to a directory, `profile_2`, in your local workspace.
Open the first profile run summary file in the Vitis Analyzer as explained in step 4.
Add the second profile run summary file to the existing `run_summary` file using the + option. Observe the combined metrics in the Vitis Analyzer: the first run with the `heat_map`, `conflicts`, and `mm2s_throughputs` metrics, and the second run with the `execution`, `dma_locks`, and `s2mm_throughputs` metrics.
Click % to toggle between the absolute and percentage values of the collected design metrics.
Click a column header to sort the rows by that column. Click once for ascending order, twice for descending order, and three times to disable sorting.
Profiling Data Explanation
An easy way to find the definition of a profile data category is to move the cursor over the column title. For example, Active Time (ms) is the “Amount of time (in ms) AI Engine was active when it was enabled.”
AI Engine Core Profiling Data
Active Time (ms) = Stall Time (ms) + Active Utilization Time (ms).
Take tile(25,0) as an example. It is active for 0.003 ms, of which 0.000 ms is stalled (that is, it is not stalled) and 0.003 ms is spent actively executing instructions. Of that 0.003 ms, 0.001 ms is spent executing vector instructions and 0.002 ms on other instructions, such as load/store instructions.
There are a total of 672 Vector instructions, 264 Load Instructions, and 282 Store Instructions during the Active Utilization Time (ms).
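The relationship between these columns can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name and dictionary keys are hypothetical, and the inputs are the tile(25,0) values quoted above.

```python
def split_active_time(active_ms, stall_ms, vector_ms):
    """Break Active Time into its components.

    Uses Active Time = Stall Time + Active Utilization Time, then splits
    the utilization time into vector and non-vector instruction time.
    Values are rounded to 3 decimals, the precision shown in the report.
    """
    utilization_ms = round(active_ms - stall_ms, 3)
    other_ms = round(utilization_ms - vector_ms, 3)
    return {
        "stall_ms": stall_ms,
        "utilization_ms": utilization_ms,
        "vector_ms": vector_ms,
        "other_ms": other_ms,
    }


# tile(25,0): active 0.003 ms, stalled 0.000 ms, vector instructions 0.001 ms
breakdown = split_active_time(active_ms=0.003, stall_ms=0.000, vector_ms=0.001)
print(breakdown)
```

For tile(25,0) this reproduces the 0.003 ms of utilization and the 0.002 ms spent on non-vector instructions described above.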
AI Engine Memory Profiling Data
The Memory Conflict Time (ms) indicates the time lost to memory access conflicts during AI Engine execution. AMD recommends rerunning the aiesimulator with the `--enable-memory-check` option to check the design for memory access conflicts.
The Cumulative Memory Errors Time (ms) indicates the time taken due to ECC errors in any of the data memory banks as well as the MM2S and S2MM DMAs.
Interface Profiling Data
From Profile_summary -> Interface Channels, notice the interface tile throughput values for the input and output channel.
Profiling Data Analysis
From the AI Engine core profiling data, tile(25,0) has a much larger number of Store Instructions. This is an indication to check the tile source code and see whether the number of Store Instructions can be reduced to improve performance.
From the AI Engine Memory profiling data, there are no Memory Conflict Time values, i.e., there are no memory violations in the source code. If there are any, it is suggested to run the AIE simulator, check for memory access violations, and clear those violations.
From the AI Engine Memory profiling data, tile(24,0) has a longer Cumulative DMA Lock Stalls Time. This suggests checking the input/output PLIO to see whether the PLIO frequency and PLIO width are set appropriately. AMD suggests using the integrated logic analyzer (ILA) to check the PLIO input/output states during runtime.
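The three checks above can be expressed as a simple triage over per-tile metrics. This is only a sketch under stated assumptions: the dictionary keys, the `factor` threshold, and the sample values (apart from the 282 store instructions of tile(25,0)) are illustrative and do not come from the generated CSV files.

```python
from statistics import mean


def analysis_hints(tiles, factor=1.5):
    """Map each tile to follow-up checks suggested by its metrics.

    `tiles` maps a (column, row) tuple to a dict of metric values.
    Key names and the `factor` threshold are illustrative assumptions.
    """
    avg_stores = mean(m["store_instrs"] for m in tiles.values())
    avg_lock_ms = mean(m["dma_lock_stall_ms"] for m in tiles.values())
    hints = {}
    for tile, m in tiles.items():
        notes = []
        if avg_stores > 0 and m["store_instrs"] > factor * avg_stores:
            notes.append("review tile source code to reduce store instructions")
        if m["memory_conflict_ms"] > 0:
            notes.append("rerun the AIE simulator and check memory access violations")
        if avg_lock_ms > 0 and m["dma_lock_stall_ms"] > factor * avg_lock_ms:
            notes.append("check PLIO frequency and width")
        hints[tile] = notes
    return hints


# Illustrative values only, not measurements from the design.
sample = {
    (24, 0): {"store_instrs": 90, "memory_conflict_ms": 0.0, "dma_lock_stall_ms": 0.004},
    (25, 0): {"store_instrs": 282, "memory_conflict_ms": 0.0, "dma_lock_stall_ms": 0.001},
    (26, 0): {"store_instrs": 80, "memory_conflict_ms": 0.0, "dma_lock_stall_ms": 0.001},
}
hints = analysis_hints(sample)
for tile, notes in hints.items():
    print(tile, notes)
```

Comparing each tile against the average across tiles is one possible heuristic; in practice, the appropriate threshold depends on the design.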
XSDB Flow
It is also possible to profile the AI Engine using XSDB both on Linux and baremetal operating systems.
Program the device using the sd_card image, and remove any `xrt.ini` files from the SD card to avoid conflicts with the XSDB commands.
Complete the target connection setup: launch hw_server on the computer that has the JTAG connection to the VCK190 board.
Go to the directory where the AI Engine compile `Work/` directory is present and launch XSDB.
Issue the following commands from the XSDB prompt:
    xsdb
    %xsdb connect -url TCP:${COMPUTER NAME/IP}:3121
    %xsdb ta
    %xsdb ta 1
    %xsdb source $::env(XILINX_VITIS)/scripts/vitis/util/aie_profile.tcl
    %xsdb aieprofile start -graphs mygraph -work-dir ./Work -graph-based-aie-metrics "mygraph:all:heat_map" -tile-based-aie-memory-metrics "{25,0}:dma_locks" -tile-based-interface-tile-metrics "all:s2mm_throughputs" -interval 20 -samples 100
After the `aieprofile` command is run, wait until Count: 10, Count: 20, … is displayed in the XSDB console. This indicates that XSDB is ready to collect design profiling data.
Switch to the Linux console of the hardware, and run the application:
    cd /run/media/mmcblk0p1
    ./host.exe a.xclbin
Inspect the generated files, `aie_profile.csv`, `summary.csv`, and `aie_trace_profile.run_summary`, in the local workspace where XSDB was launched.
Open the `aie_trace_profile.run_summary` file in the Vitis Analyzer to view the profile results. The profile data analysis explained in the XRT flow also applies to the XSDB-generated profile summary.
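The `aieprofile start` command used in this flow carries several options, and a small helper can make it easier to vary the metric sets between runs. The following sketch only assembles the command string for pasting at the XSDB prompt; the function and its parameter names are hypothetical, and the option names are copied from the command shown in this section.

```python
def aieprofile_start_command(graph, work_dir, core_metric, memory_tiles,
                             memory_metric, interface_metric, interval, samples):
    """Assemble the XSDB `aieprofile start` command as a single string.

    Hypothetical helper: it does not talk to XSDB. `memory_tiles` is the
    tile selector as typed in XSDB, for example "{25,0}" or "all".
    """
    parts = [
        "aieprofile start",
        f"-graphs {graph}",
        f"-work-dir {work_dir}",
        f'-graph-based-aie-metrics "{graph}:all:{core_metric}"',
        f'-tile-based-aie-memory-metrics "{memory_tiles}:{memory_metric}"',
        f'-tile-based-interface-tile-metrics "all:{interface_metric}"',
        f"-interval {interval}",
        f"-samples {samples}",
    ]
    return " ".join(parts)


# Reproduces the command used in this tutorial.
command = aieprofile_start_command(
    graph="mygraph",
    work_dir="./Work",
    core_metric="heat_map",
    memory_tiles="{25,0}",
    memory_metric="dma_locks",
    interface_metric="s2mm_throughputs",
    interval=20,
    samples=100,
)
print(command)
```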
Support
GitHub issues will be used for tracking requests and bugs. For questions, go to support.xilinx.com.
Copyright © 2020–2024 Advanced Micro Devices, Inc