Development of a user application and hardware kernels targeting an FPGA requires a phased development approach. Because FPGA, AMD Versal™ adaptive SoC, and AMD Zynq™ UltraScale+™ MPSoC are programmable devices, building the device binary for hardware takes some time. To enable quicker iterations without having to go through the full hardware compilation flow, the AMD Vitis™ tool provides software emulation targets to perform C-based simulation of the design, and hardware emulation targets to perform C-RTL co-simulation of the software application and PL kernels. Compiling for emulation targets is significantly faster than compiling for the actual hardware. Additionally, emulation targets provide full visibility into the application or accelerator, thus making it easier to perform debugging. Once your design passes in emulation, then in the late stages of development you can compile and run the application on the hardware platform.
The Vitis tool provides two emulation targets:
- Software emulation (sw_emu)
- The software emulation build compiles and links quickly, and the host program runs either natively on an x86 processor or in the QEMU emulation environment. The PL kernels are compiled natively and run on the host machine. This build target lets you quickly iterate on both the host code and kernel logic.
- Hardware emulation (hw_emu)
- The host program runs as it does in sw_emu, natively on x86 or in QEMU, but the kernel code is compiled into an RTL behavioral model that runs in the AMD Vivado™ simulator or another supported third-party simulator. This build-and-run loop takes longer but provides a cycle-accurate view of the kernel logic.
Compiling and linking for either of the emulation targets is seamlessly integrated into the Vitis command line and IDE flows. You can compile your host and kernel source code for either emulation target, without making any change to the source code. For your host code, you do not need to compile differently for emulation as the same host executable or PS application ELF binary can be used in emulation. Emulation targets support most of the features including XRT APIs, buffer transfer, platform memory SP tags, kernel-to-kernel connections, etc. The following sections detail the features and requirements of both the software and hardware emulation flows.
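Because the same host executable serves every target, the target is selected when the program runs rather than when it is compiled. The sketch below assumes the XRT convention of setting the `XCL_EMULATION_MODE` environment variable to `sw_emu` or `hw_emu` for emulation runs and leaving it unset for hardware; that variable is not described in this section. The `Target` enum and the helper functions are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// The three build/run targets discussed in this section.
enum class Target { SwEmu, HwEmu, Hw };

// Map a raw XCL_EMULATION_MODE value to a target.
// A null pointer means the variable is unset, which is treated as hardware.
// Any unrecognized value is also treated as hardware: a choice made for
// this sketch, not documented XRT behavior.
Target target_from_mode(const char* mode) {
    if (mode == nullptr) return Target::Hw;
    const std::string m(mode);
    if (m == "sw_emu") return Target::SwEmu;
    if (m == "hw_emu") return Target::HwEmu;
    return Target::Hw;
}

// Human-readable name for logging.
const char* target_name(Target t) {
    switch (t) {
        case Target::SwEmu: return "sw_emu";
        case Target::HwEmu: return "hw_emu";
        case Target::Hw:    return "hw";
    }
    assert(false && "unhandled Target value");
    return "hw";
}

// Read the environment of the running host program.
Target current_target() {
    return target_from_mode(std::getenv("XCL_EMULATION_MODE"));
}
```

A host program could call `current_target()` at startup, for example to log the target or to shrink the data set for a slower hw_emu run, without any change to how it is compiled.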
While running emulation you can specify a number of trace options as described in Enabling Profiling in Your Application to capture design data during runtime. Any reports generated during the run are collected into the xrt.run_summary file. This collection of reports can be viewed by opening the run_summary in Vitis analyzer, and includes a Summary report, System and Platform Diagrams to illustrate the hardware design, Run Guidance offering any suggestions for improving the performance of the system, and a Profile Summary and Timeline Trace when enabled in the xrt.ini file during runtime. Refer to Using the Vitis Analyzer for additional information.
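As a rough illustration of enabling trace through the xrt.ini file, the sketch below generates a minimal file from a test harness. It assumes the `[Debug]` section with the `opencl_trace` and `device_trace` keys (the latter taking `off`, `coarse`, or `fine`); check these against the XRT documentation for your release. `make_xrt_ini` and `write_xrt_ini` are hypothetical helpers, not part of XRT.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: build the text of a minimal xrt.ini.
// Assumption: the [Debug] section accepts opencl_trace (true/false) and
// device_trace (off, coarse, or fine).
std::string make_xrt_ini(bool opencl_trace, const std::string& device_trace) {
    assert(device_trace == "off" || device_trace == "coarse" ||
           device_trace == "fine");
    std::ostringstream ini;
    ini << "[Debug]\n"
        << "opencl_trace=" << (opencl_trace ? "true" : "false") << '\n'
        << "device_trace=" << device_trace << '\n';
    return ini.str();
}

// Hypothetical helper: write the settings to a file.
// Returns false if the file cannot be opened or written.
bool write_xrt_ini(const std::string& path, const std::string& contents) {
    std::ofstream out(path);
    if (!out) return false;
    out << contents;
    out.flush();
    return out.good();
}
```

Calling `write_xrt_ini("xrt.ini", make_xrt_ini(true, "fine"))` before launching the host program would produce a file with both trace options enabled, on the assumption that the runtime picks up xrt.ini from the directory of the host executable.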
Software emulation is an abstract model and does not use any of the PetaLinux drivers, such as Zynq OpenCL (ZOCL), the interrupt controller, or the Device Tree Binary (DTB). Hence, the overhead of creating sd_card.img and booting PetaLinux on a full QEMU machine can be avoided for software emulation. This makes sw_emu faster, because QEMU is slow and requires PetaLinux. With this approach, you are not required to provide the sysroot, rootfs, or SD card image.
Installing the x86 XRT automatically sets the LD_LIBRARY_PATH variable to point to the XRT libraries. To run both the embedded XRT and the x86 XRT in the same setup (terminal), you must specify the arm-gcc and SYSROOT paths for embedded systems.
- Running Emulation Targets
- Data Center vs. Embedded Platforms
- QEMU
- Running Emulation on Data Center Accelerator Cards
- Running Emulation on an Embedded Processor Platform
- Speed and Accuracy of Hardware Emulation
- Working with Simulators in Hardware Emulation
- Working with Functional Model of the HLS Kernel
- Working with SystemC Models
- Debug Techniques in Hardware Emulation
- Working with I/O Traffic Generators
Limitations of the Software Emulation Flow
The following are not supported in sw_emu but are supported by the hw_emu and hw targets:
- hls_stream<ap_uint> and hls_stream<ap_int>
- ap_uint and ap_int are supported as primitive datatypes, but not as an hls_stream datatype.
- Array or Vector of hls_streams
- If the HLS kernel expects to read and write data through an array of hls_stream, sw_emu does not elaborate the array into N streams and process it.
- Runtime Data generation based on memory connectivity
- HBM memory size for each bank is limited to 256 MB; proper Runtime Data (RTD) generation based on memory connectivity is not supported.
- Kernels with RTL
- Only HLS and AIE-1 Kernels are supported.
- AIE-2 Ex buf descriptors
- Not supported.
- PL Controller based designs
- Not supported.
- Presynth PDI is mandatory for sw_emu
- PS QEMU designs will not work if the base platform does not have a Presynth PDI.
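For the array-of-hls_stream limitation listed above, one possible way to keep a kernel usable in sw_emu, assuming the kernel interface can be changed, is to pass each stream as its own named argument instead of an array. This is a sketch of the idea rather than a documented recommendation. The `stream` class below is a stand-in for the HLS stream type so that the example compiles without the HLS headers, and `add_streams` is a hypothetical kernel.

```cpp
#include <cassert>
#include <queue>

// Stand-in for the HLS stream type, provided only so this sketch compiles
// without the HLS headers. It is not the real class.
template <typename T>
class stream {
public:
    void write(const T& v) { q_.push(v); }
    T read() {
        assert(!q_.empty());  // reading an empty stream is a bug in this sketch
        T v = q_.front();
        q_.pop();
        return v;
    }
    bool empty() const { return q_.empty(); }
private:
    std::queue<T> q_;
};

// Interface form that sw_emu does not elaborate, per the limitation above:
//   void add_streams(stream<int> in[2], stream<int>& out, int n);

// Hypothetical alternative: one argument per stream, so no array of streams
// needs to be expanded at the kernel boundary.
void add_streams(stream<int>& in0, stream<int>& in1, stream<int>& out, int n) {
    for (int i = 0; i < n; ++i) {
        out.write(in0.read() + in1.read());
    }
}
```

The trade-off is a longer argument list that must be edited by hand when the number of streams changes, in exchange for an interface that does not depend on array elaboration.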