Application hangs can also be
caused by incomplete DMA transfers initiated from the host code. This does not
necessarily mean that the host code is wrong; it might also be that the kernels have
issued illegal transactions and locked up the AXI bus.
- If the platform has an AXI firewall, such as in the Vitis target platforms, it is likely to trip.
  The driver issues a SIGBUS error, kills the application, and resets the device. You can check this by running the following command:

      xbutil examine -d <bdf> -r firewall

  The following output shows such an error in the firewall status:

      Firewall Last Error Status:
       0: 0x0 (GOOD)
       1: 0x0 (GOOD)
       2: 0x80000 (RECS_WRITE_TO_BVALID_MAX_WAIT).
          Error occurred on Tue 2017-12-19 11:39:13 PST
      Xclbin ID: 0x5a39da87

  Tip: If the firewall has not tripped, the Linux tool dmesg can provide additional insight.
- When you know that the firewall has tripped, it is important to
determine the cause of the DMA timeout. The issue could be an illegal DMA
transfer or kernel misbehavior. However, a side effect of the AXI firewall
tripping is that the health check functionality in the driver resets the board
after killing the application; any information on the device that might help
with debugging the root cause is lost. To debug this issue, disable the health
check thread in the xclmgmt kernel module to capture the error. This uses common Linux kernel tools in the following sequence:
  - sudo modinfo xclmgmt: This command lists the current configuration of the module and indicates if the health_check parameter is ON or OFF. It also returns the path to the xclmgmt module.
  - sudo rmmod xclmgmt: This removes and disables the xclmgmt kernel module.
  - sudo insmod <path to module>/xclmgmt.ko health_check=0: This re-installs the xclmgmt kernel module with the health check disabled.
    Tip: The path to this module is reported in the output of the call to modinfo.
- With the health check disabled, rerun the application. You can use the kernel instrumentation to isolate this issue as previously described.
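
The check-and-reload sequence above can be sketched as a small shell script. This is a minimal sketch under stated assumptions, not part of XRT: the function names firewall_tripped, run_or_echo, and reload_without_health_check and the DRY_RUN variable are hypothetical, and the filter assumes the firewall status is printed one level per line, as in the sample status shown earlier.

```shell
#!/bin/sh
# Hypothetical helpers for the procedure described above (not part of XRT).

# Read firewall status text on stdin and print every level line whose
# status is not GOOD. The exit status is 0 if at least one such line
# was found and 1 otherwise.
# Assumption: each level is printed as "<n>: 0x<hex> (<NAME>)" on its own line.
firewall_tripped() {
    grep -E '^[[:space:]]*[0-9]+:[[:space:]]*0x[0-9a-fA-F]+' | grep -vF '(GOOD)'
}

# Print the command when DRY_RUN is 1 (the default); otherwise execute it.
run_or_echo() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Unload xclmgmt and load it again with the health check disabled.
# $1 is the path to xclmgmt.ko, as reported by modinfo.
reload_without_health_check() {
    ko_path=$1
    run_or_echo sudo rmmod xclmgmt &&
    run_or_echo sudo insmod "$ko_path" health_check=0
}
```

With these functions defined, the output of xbutil examine -d <bdf> -r firewall can be piped into firewall_tripped, and reload_without_health_check can be called with the module path that modinfo reports. Leaving DRY_RUN at its default only prints the rmmod and insmod commands so they can be reviewed first; setting DRY_RUN=0 executes them.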