During hardware execution, the actual hardware platform is used to execute the kernels, and you can evaluate the performance of the host program and accelerated kernels just by running the application. However, debugging the hardware build requires additional logic to be incorporated into the application. This will impact both the FPGA resources consumed by the kernel and the performance of the kernel running in hardware. The debug configuration of the hardware build includes special ChipScope debug cores, such as Integrated Logic Analyzer (ILA) and Virtual Input/Output (VIO) cores, and AXI performance monitors for debug purposes.
The following figure shows the debug process for the hardware build, including debugging the host code using GDB, and using the Vivado hardware manager, with waveform analysis, kernel activity reports, and memory access analysis to identify and localize hardware issues.
With the system hardware build configured for debugging, the host program running on the CPU and the Vitis accelerated kernels running on the AMD device can be confirmed to be executing correctly on the actual hardware of the target platform. Some of the conditions that can be identified and analyzed include the following:
- System hangs caused by protocol violations:
- These violations can take down the entire system.
- These violations can cause the kernel to get invalid data or to hang.
- It is hard to determine where or when these violations originated.
- To debug this condition, you should use an ILA triggered off of the AXI protocol checker, which needs to be configured on the Vitis target platform.
- Problems with the hardware kernel:
- Problems sometimes caused by the implementation: timing issues, race conditions, and bad design constraints.
- Functional bugs that hardware emulation does not reveal.
- Performance issues:
- For example, the frames per second processing is not what you expect.
- You can examine data beats and pipelining.
- Using an ILA with trigger sequencer, you can examine the burst size, pipelining, and data width to locate the bottleneck.