Hardware emulation uses a mix of SystemC and RTL co-simulation to balance simulation accuracy and speed. The SystemC models comprise purely functional models and performance-approximate models. Hardware emulation does not reproduce hardware behavior with 100% accuracy, so you should expect some differences between running emulation and executing your application on hardware. These can include significant differences in application performance and, occasionally, differences in functionality.
Functional differences between emulation and hardware typically point to a race condition or other unpredictable behavior in your design. As a result, an issue seen in hardware might not always be reproducible in hardware emulation, though most behavior related to interactions between the host and the kernel, or between the kernel and memory, is reproducible. This makes hardware emulation an excellent tool for debugging issues with your kernel before running on hardware.
The following table lists models that are used to mimic the hardware platform and their accuracy levels.
| Hardware Functionality | Description |
|---|---|
| AMD UltraScale™ DDR Memory, SmartConnect | The SystemC models for the DDR memory controller, AXI SmartConnect, and other data path IPs are usually throughput approximate. They typically do not model the exact latency of the hardware IP. The model can be used to gauge a relative performance trend as you modify your application or the kernel. |
| AI Engine | The AI Engine SystemC model is cycle approximate, though it is not intended to be 100% cycle accurate. A common model is used between AI Engine Simulator and hardware emulation, thus enabling a reasonable comparison between the two stages. |
| AMD Versal™ NoC and DDR Models | The Versal NoC and DDR SystemC models are cycle approximate. |
| Arm Processing Subsystem (PS, CIPS) | The Arm PS is modeled using QEMU, which is a purely functional execution model. For more information, see QEMU. |
| User Kernel | Hardware emulation uses RTL for the user kernel. As a result, the kernel behavior by itself is 100% accurate. However, the kernel is surrounded by other approximate models. |
| Other I/O Models | For hardware emulation, there are generic Python or C-based traffic generators that can be interfaced with the emulation process. You can generate abstract traffic at the AXI protocol level that mimics the I/O in your design. Because these models are abstract, issues that are specific to a particular hardware board will not appear in hardware emulation. |
Because hardware emulation uses RTL co-simulation as its execution model, the speed of execution is orders of magnitude slower than real hardware. AMD recommends using small data buffers. For example, if you have a configurable vector addition and in hardware you are performing a 1024 element vadd, in emulation you might restrict it to 16 elements. This will enable you to test your application with the kernel, while still completing execution in reasonable time.
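
The buffer-size recommendation above can be sketched in host code. This is a minimal illustration, not the documented method: it assumes the host application detects hardware emulation by reading the `XCL_EMULATION_MODE` environment variable and checking for the value `hw_emu`. The constants and the helper names `running_hw_emulation`, `vadd_element_count`, and `vadd_reference` are hypothetical, and the sizes are taken from the example in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical sizes, taken from the 1024 vs. 16 element example in the text.
constexpr std::size_t kHardwareElements = 1024;
constexpr std::size_t kEmulationElements = 16;

// Assumption: hardware emulation runs are launched with
// XCL_EMULATION_MODE=hw_emu in the environment.
bool running_hw_emulation() {
    const char* mode = std::getenv("XCL_EMULATION_MODE");
    return mode != nullptr && std::string(mode) == "hw_emu";
}

// Pick a small problem size under emulation, the full size otherwise.
std::size_t vadd_element_count() {
    return running_hw_emulation() ? kEmulationElements : kHardwareElements;
}

// Host-side reference result used to check the kernel output.
std::vector<int> vadd_reference(const std::vector<int>& a,
                                const std::vector<int>& b) {
    assert(a.size() == b.size());
    std::vector<int> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}
```

The host would call `vadd_element_count()` once, allocate its input and output buffers with that size, and pass the same value to the kernel as its element-count argument, so the same binary runs a 16 element vadd in emulation and a 1024 element vadd on hardware.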