Building the System in HW Emulation - 2025.2 English - UG1701

Embedded Design Development Using Vitis User Guide (UG1701)

Document ID: UG1701
Release Date: 2025-11-20
Version: 2025.2 English

To simulate the entire system for a specific board and platform, including the AI Engine graph, the PL logic, and the XRT-based host application that controls them, you must use the Vitis hardware emulation flow. This flow combines the SystemC model of the AI Engine, transaction-level SystemC models for the NoC and DDR memory, RTL simulation of the PL kernels, and the PS running on QEMU.

Building the system involves two steps: building the device binary for the HW Emulation target, which includes the AI Engine graph and the PL kernels, and building the XRT-based PS application. For details on building the system in HW Emulation, see Building and Running the System.

The Vitis tool provides two distinct end-to-end flows for embedded systems: hardware emulation (hw_emu) and hardware (hw). These flows are intentionally separate because they target different execution models, generate different artifacts, and serve different validation goals.

What the Linker and Packager Build

hw_emu:

  • v++ --link -t hw_emu builds a co-simulation model that integrates the following:
    • Processing System (PS) running functionally in QEMU.
    • AI Engine (AIE) modeled in SystemC (cycle-approximate).
    • Programmable Logic (PL) kernels compiled to RTL and simulated in an RTL simulator (for example, XSIM).
    • Transaction-level models for NoC, memory, and other data paths where applicable.
  • The linker still produces an .xclbin that XRT consumes. Packaging (v++ -p -t hw_emu) generates emulation artifacts and scripts (for example, launch_hw_emu.sh). Packaging also generates emulation-oriented boot components (rootfs, device tree) required to drive QEMU and the simulators.

hw:

  • v++ --link -t hw generates a deployable .xclbin that corresponds to a fully implemented PL bitstream, AIE binaries, and associated metadata to run on the device.
  • Packaging produces SD/flash images for the board. The images contain real bitstream/PDI, AIE binaries, and application/software stack for execution on silicon.
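The two flows run the same link and package steps and differ in the target passed with -t. The following is a minimal sketch of that pattern, not a complete build script. The platform file, the kernel and graph file names, and the output names are placeholders chosen for illustration, and the linker output extension in particular can differ between releases. The --package.out_dir option is used here only to keep the two targets' artifacts apart. By default the script prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: PLATFORM and every file name below are placeholders, not values
# taken from this guide. With DRY_RUN=1 (the default) each command is printed
# instead of executed, so the script can be read without the tools installed.

PLATFORM="${PLATFORM:-my_platform.xpfm}"   # hypothetical platform file
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Link step: integrates the PL kernels and the AI Engine graph for one target.
# $1 is the target, either hw_emu or hw.
link_system() {
  run v++ --link -t "$1" --platform "${PLATFORM}" \
      kernel.xo libadf.a -o "system_$1.xclbin"
}

# Package step: produces the run artifacts for the same target
# (emulation scripts for hw_emu, board images for hw).
package_system() {
  run v++ -p -t "$1" --platform "${PLATFORM}" \
      "system_$1.xclbin" libadf.a --package.out_dir "package.$1"
}
```

Calling link_system hw_emu and then package_system hw_emu prints the emulation commands; passing hw instead prints the hardware equivalents. Setting DRY_RUN=0 runs them, which requires the Vitis tools on the PATH.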

Execution Model and Fidelity

hw_emu:

  • Runs in a mixed simulation environment. The PS is functionally accurate (QEMU). The AIE, interconnect, and memory are cycle-approximate SystemC models. The user PL kernels execute as RTL in the simulator (RTL-accurate within the simulated region).
  • This flow is not intended to be cycle-accurate at the full-system level. Latency, bandwidth, and contention behavior can differ from hardware.

hw:

  • Runs on real hardware with actual clocks, I/O, DDR/LPDDR behavior, NoC routing, board peripherals, and software stack.
  • Required to validate timing closure, I/O margins, power/thermal behavior, and full-chip interactions.
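The difference in execution model also shows up when the host application is started. In hw_emu, the generated launch_hw_emu.sh script is run first to bring up QEMU and the simulators, and XRT inside the emulated PS is told to use emulation through the XCL_EMULATION_MODE environment variable. On hardware that variable must not be set. The sketch below only prints the commands for each case; host.exe and system.xclbin are placeholder names.

```shell
#!/bin/sh
# Sketch only: host.exe and system.xclbin are placeholder names.

# Print the shell commands that start the host application for a target.
# XCL_EMULATION_MODE is the XRT environment variable that selects emulation;
# it must be unset when running on real hardware.
host_launch_steps() {
  case "$1" in
    hw_emu)
      echo "export XCL_EMULATION_MODE=hw_emu"
      echo "./host.exe system.xclbin"
      ;;
    hw)
      echo "unset XCL_EMULATION_MODE"
      echo "./host.exe system.xclbin"
      ;;
    *)
      echo "unknown target: $1" >&2
      return 1
      ;;
  esac
}
```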

Debug, Visibility, and Analysis

hw_emu:

  • Emphasizes visibility and bring-up, with RTL waveforms, SystemC/TLM traces, AIE graph inspection, and rich XRT profiling/tracing.
  • Supports system-level checks such as deadlock detection workflows and controlled traffic generation.
  • Memory monitoring: supports AXI Interface Monitors (AIMs) on memory ports in counters-only configurations.
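XRT profiling and tracing are typically switched on through an xrt.ini file placed next to the host executable. The sketch below writes a minimal file. The [Debug] key names and values shown are assumptions that are not taken from this guide, so confirm them against the XRT documentation for your release before relying on them.

```shell
#!/bin/sh
# Sketch only: writes a minimal xrt.ini into the given directory
# (default: the current directory). The [Debug] keys below are assumptions;
# check the names and accepted values for your XRT release.
write_xrt_ini() {
  dir="${1:-.}"
  cat > "${dir}/xrt.ini" <<'EOF'
[Debug]
native_xrt_trace=true
device_trace=coarse
EOF
}
```

Running write_xrt_ini in the directory that holds the host executable creates the file there; any existing xrt.ini in that directory is overwritten.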

hw:

  • Debug uses on-chip capabilities (for example, ILA/ChipScope, XVC) and software debuggers. Visibility is more limited than in emulation.
  • Used to confirm real-world performance, bandwidth under contention, and platform behavior that only appears on silicon.

Turnaround and Optimization

hw_emu:

  • Faster build–run cycles because it avoids full synthesis/place/route and board programming.
  • Ideal for functional integration, driver/XRT flow debug, and early performance trend analysis.

hw:

  • Longer compile and deployment cycles due to full Vivado implementation and device programming.
  • Essential for QoR assessment (timing/resource), final performance/power, and board-level validation.

Segmented Configuration and Dynamic Reload

hw_emu:

  • Segmented configuration (multi-PDI overlays and dynamic PL reload) is not supported. Attempts to use segmented configuration in hw_emu result in an error.

hw:

  • Segmented configuration is supported (platform- and version-dependent), enabling dynamic PL reload and multi-PDI use cases. It provides an entry point to DFX-like scenarios without a full Vivado DFX flow and supports isolation/subsystem use cases where PS-to-PL paths remain fixed.
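Because segmented configuration fails in hw_emu, a build script that serves both targets can reject the combination before the tools are started. The following is a minimal sketch of such a guard. The function name and its 0/1 segmented argument are hypothetical conventions for this example and do not correspond to any v++ option.

```shell
#!/bin/sh
# Sketch only: check_segmented and its arguments are hypothetical helpers,
# not part of the Vitis tools.
#   $1 = target (hw_emu or hw)
#   $2 = 1 if the design uses segmented configuration, otherwise 0
check_segmented() {
  if [ "$1" = "hw_emu" ] && [ "$2" = "1" ]; then
    echo "error: segmented configuration is not supported in hw_emu; use -t hw" >&2
    return 1
  fi
  return 0
}
```

A build script might call check_segmented "$TARGET" "$SEGMENTED" || exit 1 before the link step, so that the unsupported combination stops early with a clear message.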

When to Use Which Flow

Choose hw_emu to:

  • Functionally validate PS–AIE–PL integration with high debug visibility.
  • Develop and debug host/driver/AIE/PL interactions and XRT flows.
  • Investigate deadlocks and tune buffering/latencies using approximate system models.
  • Gather early profiling data and performance trends (not final performance).

Choose hw to:

  • Validate timing, resource usage, bandwidth, and performance on the actual device.
  • Exercise board-level I/O and full software stacks under realistic conditions.
  • Use segmented configuration/dynamic PL reload where applicable.
  • Prepare the design for deployment and sign-off.
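The selection criteria above can be encoded as a small lookup so that a script picks the -t value from the validation goal. This is an illustrative sketch only. The goal keywords are invented for this example, and each maps to one of the bullets in the two lists.

```shell
#!/bin/sh
# Sketch only: the goal keywords are hypothetical labels for the criteria
# listed above. Prints the v++ target that matches the goal.
pick_target() {
  case "$1" in
    integration|xrt-debug|deadlock|profiling-trends)
      echo "hw_emu"
      ;;
    timing|board-io|segmented|signoff)
      echo "hw"
      ;;
    *)
      echo "unknown goal: $1" >&2
      return 1
      ;;
  esac
}
```

For example, TARGET="$(pick_target deadlock)" selects hw_emu, while TARGET="$(pick_target signoff)" selects hw.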