After packaging, everything is set to run emulation. Since you ran aiesimulator with profiling enabled, you can carry the same options over to hardware emulation. Passing aiesim_options.txt to launch_hw_emu.sh applies the options used in aiesimulator to hardware emulation. To do this, add -aie-sim-options ../aiesimulator_output/aiesim_options.txt to the launch_hw_emu.sh command line.
Since profiling is deprecated in the hardware emulation flow, comment out the line AIE_PROFILE=All in aiesimulator_output/aiesim_options.txt before launching.
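The edit to the options file can also be scripted. The following is a minimal sketch, not part of the tutorial's Makefile: comment_out_aie_profile is a hypothetical helper name, and it assumes the options file accepts '#' as a comment prefix (if it does not, delete the line by hand instead). A .bak copy of the original file is kept next to it.

```shell
#!/bin/sh
# Sketch: comment out the AIE_PROFILE=All line in an aiesim options file.
# Assumption: '#' at the start of a line is treated as a comment.
comment_out_aie_profile() {
    options_file="$1"
    # -i.bak edits in place and keeps a backup with a .bak suffix.
    # '&' re-inserts the matched line, so the line is only prefixed with '#'.
    sed -i.bak 's/^[[:space:]]*AIE_PROFILE=All[[:space:]]*$/#&/' "$options_file"
}
```

From the project directory, it would be called as comment_out_aie_profile aiesimulator_output/aiesim_options.txt. Lines that are already commented out are left unchanged.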
To run emulation use the following command:
make run_emu TARGET=hw_emu
or
cd ./sw
./launch_hw_emu.sh -aie-sim-options ../aiesimulator_output/aiesim_options.txt -add-env AIE_COMPILER_WORKDIR=../Work
When launched, use the Linux prompt presented to run the design. Note that the emulation process is slow, so do not press any keys in the terminal while it boots, or you might interrupt the emulated Versal boot (just as on a real hardware board).
Execute the following command when the emulated Linux prompt appears:
cd /run/media/*1
export XILINX_XRT=/usr
dmesg -n 4 && echo "Hide DRM messages..."
This will set up the design to run emulation and remove any unnecessary DRM messaging.
Run the design using the following command:
./host.exe a.xclbin
Note: The design runs with VCD dumping enabled, which extends emulation time. It may seem as if it is hung, but it is not.
You should see output displaying TEST PASSED. When this is shown, press Ctrl+A followed by x to end the QEMU instance.
To view the profiling results and trace in Vitis Analyzer, run the command:
vitis_analyzer -a sw/sim/behav_waveform/xsim/default.aierun_summary
When you open the run summary, you will notice that it has the same layout as the one from aiesimulator.
Click Trace. This opens the VCD data (as defined in aiesim_options.txt) and gives detailed information about kernels, tiles, and nets within the AI Engine during execution. Here you can see the stalls for each kernel, which helps you identify where they originate.
From the trace information, you can calculate the kernel latency as follows:
Click Trace in the AI Engine simulation run summary, and navigate to any function to calculate its latency. For example, consider the classifier function.
You can see that the classifier function ran for seven iterations. Zoom into the period of one iteration (between two main() function calls), add a marker, and drag it to the end of the kernel function.
Notice the difference of 25.093 us between the two markers. This is the time the kernel took to complete one iteration.
If you click the AI Engine Simulation Summary, you can see that the AI Engine frequency is 1250 MHz, so one cycle = 0.8 ns. The classifier function took 25.093 us for one iteration, i.e., 25.093 us / 0.8 ns ~= 31366 cycles.
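The conversion from a measured time to clock cycles can be sketched in shell. The helper name latency_to_cycles is hypothetical and is not part of any Vitis tool. It multiplies the latency in microseconds by the frequency in MHz, which is equivalent to dividing the latency by the clock period.

```shell
#!/bin/sh
# Sketch: convert a latency measured in the trace view into clock cycles.
# cycles = latency_us * frequency_mhz (us * MHz cancels to a plain count).
latency_to_cycles() {
    latency_us="$1"
    freq_mhz="$2"
    # awk handles the floating-point multiply; the result is rounded
    # to the nearest whole cycle.
    awk -v t="$latency_us" -v f="$freq_mhz" 'BEGIN { printf "%.0f\n", t * f }'
}

# Values taken from the text: 25.093 us per iteration at 1250 MHz.
latency_to_cycles 25.093 1250
```

The same helper can be reused for the latency read from the aiesimulator trace, so that both results are expressed in cycles.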
Compare this with the latency you got during the AI Engine simulation, where the AI Engine is a standalone module.
Explore the two reports and take note of any differences and similarities. This will help you debug and optimize your design.
Close Vitis Analyzer and build for hardware.