Hardware-Emulation Debug Walkthrough
Introduction
To simulate the entire system for a specific board and platform, including the AI Engine graph, the programmable logic (PL), and the XRT-based host application that controls both, you must use the hardware emulation flow. This flow includes the SystemC model of the AI Engine; transaction-level SystemC models for the NoC and double-data rate (DDR) memory; the PL kernels (RTL); and the processing system (PS), which runs on the Quick Emulator (QEMU). Analyzing the emulation data helps you gauge the efficiency of the kernels, see the stall and active times associated with each AI Engine, and pinpoint any AI Engine kernel whose performance might not be optimal.
The following are some of the features of the hardware emulation that are covered in this section of the tutorial:
Features
Build for Hardware Emulation Using the Vitis IDE | Explains how to create a system project, build it for hardware emulation, and run it. |
Debug PL Kernels Using the Vivado Logic Simulator | Explains how to use the AMD Vivado™ XSIM to debug the PL kernels. |
Performance of the AI Engine Using the Hardware Emulation Results | This section profiles the system for hardware emulation and compares the throughput of the AI Engine design in hardware emulation with the throughput in the AI Engine simulation. |
Command Line Project Source Code Debug with the Vitis Unified IDE | This section helps you debug your command line project by using the features of the AMD Vitis™ IDE debugger without porting your system design to the IDE. |
Section 1
Build for Hardware Emulation Using the Vitis IDE
Before starting this section, you should have created an AI Engine application in the Vitis IDE and run the AIE simulation (see Build and Simulate in the Vitis IDE).
Create a system project manually using the steps mentioned in Port a Command Line Project to the Vitis IDE System Project and download the Vitis IDE exported project (Download Vitis IDE project).
Besides following the link above to create a system project, observe the following points to avoid unnecessary issues during emulation:
While creating a HW-link project, the Vitis IDE tool, by default, creates a binary_container_1-link.cfg file under the {$PROJECT}/system_project/hw_link/ directory that contains the connectivity as follows:
    [connectivity]
    nk=mm2s:1:mm2s_1
    nk=s2mm:2:s2mm_1.s2mm_2
    sc=mm2s_1.s:ai_engine_0.inx
    sc=ai_engine_0.data_shuffle:s2mm_1.s
    sc=ai_engine_0.upscale_out:s2mm_2.s
If you are porting a command line project to the Vitis IDE environment, make sure to replace the above connectivity statements that start with nk in your system.cfg file, and add the file as a source to your HW-Link project.
Because the AI Engine graph is loaded by the host PS application, you can defer running the graph until after it has been loaded using the xrt::graph API. By default, the AMD platform management controller (PMC) loads and runs the graph. The v++ --package.defer_aie_run option instead defers the graph run until after the graph has been loaded using the xrt::graph API.
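To see what the connectivity entries encode, here is a minimal Python sketch that parses the listing shown earlier. It assumes, based only on how the entries appear in that listing, that an nk value has the shape kernel:count:cu_names with compute unit names separated by dots, and that an sc value has the shape source:destination. The name parse_connectivity is hypothetical and is not part of any Vitis tool.

```python
# Connectivity text copied from the binary_container_1-link.cfg listing above.
CFG_TEXT = """\
[connectivity]
nk=mm2s:1:mm2s_1
nk=s2mm:2:s2mm_1.s2mm_2
sc=mm2s_1.s:ai_engine_0.inx
sc=ai_engine_0.data_shuffle:s2mm_1.s
sc=ai_engine_0.upscale_out:s2mm_2.s
"""


def parse_connectivity(text):
    """Return (kernels, streams) from the [connectivity] section.

    kernels maps a kernel name to its list of compute unit names.
    streams is a list of (source, destination) endpoint pairs.
    The value shapes are assumptions inferred from the listing.
    """
    kernels = {}
    streams = []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "connectivity" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key == "nk":
            kernel, count, names = value.split(":")
            cu_names = names.split(".")
            # The declared count should match the number of listed names.
            if len(cu_names) != int(count):
                raise ValueError(f"count mismatch for kernel {kernel}")
            kernels[kernel] = cu_names
        elif key == "sc":
            source, destination = value.split(":")
            streams.append((source, destination))
    return kernels, streams


kernels, streams = parse_connectivity(CFG_TEXT)
for kernel, cu_names in kernels.items():
    print(kernel, "->", cu_names)
for source, destination in streams:
    print(source, "=>", destination)
```

Running it on the listing yields one mm2s compute unit, two s2mm compute units, and three stream connections, which matches the compute units that appear later in the waveform viewer.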
Steps to build the system project: Go to the Flow Navigator -> [system_project] component (Section: HARDWARE EMULATION):
a. Select Build Binary Container from LINK-binary_container_1. Select the check box to build the components added in the binary container.
b. Select Build Package from PACKAGE.
After packaging, everything is set to run emulation. In the Flow Navigator -> [system_project] component, select Start Emulator -> Show Waveform -> Start.
Select Run for Time (10us) in the XSIM GUI taskbar, and observe the Linux bootup in the Vitis IDE TASK: EMULATION FOR SYSTEM_PROJECT.
You can stop emulation by selecting Stop Emulator in the Flow Navigator -> [system_project] component.
Section 2
Debug PL Kernels Using the Vivado Logic Simulator
This section walks you through debugging PL kernels in the Vivado logic simulator.
In the Vitis IDE, launch the hardware emulation using Vitis -> Start/Stop Emulator.
Enable the Show Waveform option, and select Start.
This invokes the Vivado XSIM in standalone mode. In parallel, you can observe the messages in the Vitis IDE Emulation Console.
Click the Run button in the Vivado XSIM GUI taskbar, and observe the Linux bootup in the Vitis IDE Emulation Console.
Observe the data appearing in the XSIM waveform while, in parallel, the Emulation Console messages are updated in the Vitis IDE GUI.
After processing all the data, you can see the following messages in the Vivado XSIM Tcl Console:
    Info: (I804) /IEEE_Std_1666/deprecated: the notify() function is deprecated use sc_event::notify()
    // Interrupt Monitor : interrupt for ap_done detected @ "117153000"
    // Interrupt Monitor : interrupt for ap_ready detected @ "117153000"
    // Interrupt Monitor : interrupt for ap_done detected @ "118292000"
    // Interrupt Monitor : interrupt for ap_ready detected @ "118292000"
    // Interrupt Monitor : interrupt for ap_done detected @ "118478000"
    // Interrupt Monitor : interrupt for ap_ready detected @ "118478000"
    $stop called at time : 157304 ns
    run: Time (s): cpu = 00:00:36 ; elapsed = 00:03:57 . Memory (MB): peak = 13910.660 ; gain = 135.137 ; free physical = 23585 ; free virtual = 54027
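If you save the Tcl Console output to a file, the interrupt messages can be extracted with a short script. The following Python sketch pulls the ap_done and ap_ready timestamps out of the messages shown above. The function name interrupt_times is hypothetical, and the script makes no claim about the time unit of the quoted values. One plausible sanity check, which is an assumption rather than something the log states, is that the number of ap_done events equals the number of PL compute units in the design.

```python
import re

# Interrupt messages copied from the XSIM Tcl Console output above.
XSIM_LOG = '''
// Interrupt Monitor : interrupt for ap_done detected @ "117153000"
// Interrupt Monitor : interrupt for ap_ready detected @ "117153000"
// Interrupt Monitor : interrupt for ap_done detected @ "118292000"
// Interrupt Monitor : interrupt for ap_ready detected @ "118292000"
// Interrupt Monitor : interrupt for ap_done detected @ "118478000"
// Interrupt Monitor : interrupt for ap_ready detected @ "118478000"
'''

PATTERN = re.compile(r'interrupt for (ap_done|ap_ready) detected @ "(\d+)"')


def interrupt_times(log_text):
    """Collect the quoted timestamps per interrupt signal, in log order."""
    times = {"ap_done": [], "ap_ready": []}
    for signal, stamp in PATTERN.findall(log_text):
        times[signal].append(int(stamp))
    return times


times = interrupt_times(XSIM_LOG)
print(len(times["ap_done"]), "ap_done interrupts at", times["ap_done"])
```

Because the pattern does not depend on line breaks, it also works when the console output has been copied as a single line.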
You can also notice the following messages in the Vitis IDE DEBUG CONSOLE.
    XAIEFAL: INFO: Resource group Avail is created.
    XAIEFAL: INFO: Resource group Static is created.
    XAIEFAL: INFO: Resource group Generic is created.
    Input memory virtual addr 0x0xffff7fb56000x
    Output memory virtual addr 0x0xffff7fb55000x
    Output memory virtual addr 0x0xffff7fb54000x
    run mm2s
    run s2mm
    graph run
    graph end
    After MM2S wait
    After S2MM_1 wait
    After S2MM_2 wait
    TEST PASSED
Now observe the waveform in the Vivado XSIM GUI. The system contains one mm2s compute unit and two s2mm compute units. You can notice them in the waveform viewer as follows:
You can form a group of signals by right-clicking anywhere in the Name column and selecting New Group. Add all the MM2S and S2MM-related signals to this group by dragging them into it.
Zoom into the waveform window to locate the transactions clearly.
The m_axi_gmem is the transaction-level signal that indicates the read transaction in mm2s and the write transaction in s2mm.
The TDATA in mm2s shows the data that is being read into the AI Engine module. To correlate with the number of iterations (seven) that you specified in the graph, observe the TREADY signal, which goes high when the AI Engine module is ready to read, and the TVALID signal, which goes high for all the read transactions.
Similarly, in s2mm_1 you can notice that TVALID is high, indicating valid data, and that TLAST goes high at the end of every iteration and goes low at the start of the next iteration.
This way you should be able to identify whether data is being sent to and received from the AI Engine module correctly.
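The signal checks described above can be expressed as a small counting rule: on an AXI4-Stream interface a transfer happens on a cycle where TVALID and TREADY are both high, and a TLAST on such a cycle closes one packet. The following Python sketch applies that rule to a synthetic trace. The trace generator, its four transfers per iteration, and the inserted stall cycles are all hypothetical values chosen for illustration and are not taken from the tutorial design; only the count of seven iterations comes from the text.

```python
def count_stream_activity(samples):
    """Count transfers and completed packets in a sampled stream.

    samples is an iterable of (tvalid, tready, tlast) tuples,
    one tuple per clock cycle.
    """
    beats = 0
    packets = 0
    for tvalid, tready, tlast in samples:
        if tvalid and tready:  # handshake: data moves on this cycle
            beats += 1
            if tlast:  # last transfer of a packet (one iteration here)
                packets += 1
    return beats, packets


def make_hypothetical_trace(iterations=7, beats_per_iteration=4):
    """Build a made-up trace: each iteration ends with TLAST, then stalls."""
    samples = []
    for _ in range(iterations):
        for beat in range(beats_per_iteration):
            is_last = beat == beats_per_iteration - 1
            samples.append((1, 1, int(is_last)))
        # Stall cycle: valid data offered but receiver not ready, no transfer.
        samples.append((1, 0, 0))
    return samples


beats, packets = count_stream_activity(make_hypothetical_trace())
print("transfers:", beats, "packets:", packets)
```

With real data you would export the sampled signal values from the waveform and check that the packet count equals the number of graph iterations.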
Section 3
Performance of the AI Engine Using the Hardware Emulation Results
This section walks you through profiling the AI Engine as part of running the hardware emulation and calculating the throughput of the design with the system considered as a whole, i.e., the MM2S module transfers data to the AI Engine, and the AI Engine computes the output and transfers the data to the S2MM modules. Note that in this case the PS controls both the PL and the AI Engine. You then compare this throughput with that of the AI Engine as a standalone module (aiesimulation results).
In the Vitis IDE, go to Flow Navigator -> [system_project] component -> Vitis -> Start Emulation.
Add the -aie-sim-options {PROJECT_PATH}/aie_component/build/hw/aiesimulator_output/aiesim_options.txt in the Emulator Arguments option, and click Start.
Now the hardware emulation launches and starts the QEMU emulation environment. The Emulation console shows a transcript of the QEMU launch and Linux boot process.
Once the boot completes in the Vitis IDE, run the application using Run on the system project.
This runs the application and shows TEST PASSED in the output console.
Double-click the System_Project -> HARDWARE EMULATION -> Reports -> Summary file. This opens the summary file {PROJECT_PATH}/system_project/build/hw_emu/system_project_hw_emu/xrt.run_summary in the Vitis Analyzer.
As you can observe, it carries forward the aiesimulator options specified in aiesimulator_output/aiesim_options and provides the results.
Calculating the Kernel Latency
From the Profile information in the Vitis Analyzer, analyze the function time of the kernels as explained in Section 9 in the AIE simulation.
For example, compare the function time of the data_shuffle kernel with the standalone AIE simulation result, and calculate the kernel latency.
From the trace information, you can calculate the kernel latency as follows:
Click Trace in the AI Engine simulation run summary, and navigate to any function to calculate its latency. For example, consider the data_shuffle function.
You can notice the function data_shuffle ran for seven iterations. Zoom into the period of one iteration (between two main() function calls as follows), add a marker, and drag it to the end of the kernel function as follows:
Notice the difference of 263.2 ns as highlighted above. This is the time the kernel took to complete one iteration.
If you click the AI Engine Simulation Summary, you can notice that the AI Engine Frequency is 1250 MHz, i.e., one cycle = 0.8 ns. The data_shuffle function took 263.2 ns for one iteration, i.e., 263.2 / 0.8 = 329 cycles.
Compare this with the latency you got during the aiesimulation where the AI Engine is a standalone module; see Section 9 in AIE Simulation.
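The conversion from a measured marker difference to clock cycles can be reproduced with a few lines of Python. This is a minimal sketch using the two values quoted above (263.2 ns and 1250 MHz). The function name ns_to_cycles is hypothetical.

```python
def ns_to_cycles(duration_ns, frequency_mhz):
    """Convert a duration in nanoseconds to clock cycles.

    One cycle lasts 1000 / frequency_mhz nanoseconds, so 1250 MHz
    gives a period of 0.8 ns.
    """
    period_ns = 1000.0 / frequency_mhz
    return duration_ns / period_ns


# Values from the trace walkthrough: one data_shuffle iteration at 1250 MHz.
cycles = ns_to_cycles(263.2, 1250.0)
print(f"about {round(cycles)} cycles per iteration")
```

The same helper can be applied to the latency measured in the standalone AIE simulation so that both numbers are compared in cycles rather than in nanoseconds.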
Calculating the Graph Throughput Using the Graph Output
Ensure that Enable Trace is checked in the Run settings (Flow Navigator -> aie_component). If it is not checked, check it and select Run.
To open the trace from the Vitis IDE, go to Flow Navigator -> select the AI Engine component -> select AIE Simulator/Hardware -> go to Reports -> select Trace.
From the trace information in the run_summary in the Vitis Analyzer, navigate to the output port for which you want to calculate the throughput (the Upscale kernel in this case). Add a marker at the start of the first output sample as highlighted below. Then click the Go to last time icon, and observe that the cursor moves to the end of the last iteration. Now, click the previous transition icon to go to the start of the last iteration. Add one more marker at the end, and observe the time difference as 2254.4 ns.
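Once you have the time window between the two markers, throughput is the number of bytes produced on the output port in that window divided by the window length. The following Python sketch shows that arithmetic only. The 2254.4 ns window is the value measured above, while the byte count is a placeholder that you must replace with the amount of data your graph actually writes to the port within the marked window; it is not a value from this design. The function name throughput_mb_per_s is hypothetical.

```python
def throughput_mb_per_s(total_bytes, window_ns):
    """Return throughput in MB/s (1 MB = 10**6 bytes).

    One byte per nanosecond equals 10**9 bytes per second, i.e. 1000 MB/s.
    """
    if window_ns <= 0:
        raise ValueError("window_ns must be positive")
    return total_bytes / window_ns * 1000.0


WINDOW_NS = 2254.4  # marker difference observed in the trace
PLACEHOLDER_TOTAL_BYTES = 1024  # hypothetical; substitute your own byte count

result = throughput_mb_per_s(PLACEHOLDER_TOTAL_BYTES, WINDOW_NS)
print(f"{result:.1f} MB/s for the placeholder byte count")
```

Applying the same function to the window measured in the standalone AIE simulation gives two numbers in the same unit that can be compared directly.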