Application hangs can also be
caused by incomplete DMA transfers initiated from the host code. This does not
necessarily mean that the host code is wrong; it might also be that the kernels have
issued illegal transactions and locked up the AXI bus.
- If the platform has an AXI firewall, such as those in the Vitis target platforms, it is likely to trip.
The driver issues a SIGBUS error, kills the application, and resets the device. You can check this by running the following command:
xbutil examine -d <bdf> -r firewall
The following example shows such an error in the firewall status:
Firewall Last Error Status:
0: 0x0 (GOOD)
1: 0x0 (GOOD)
2: 0x80000 (RECS_WRITE_TO_BVALID_MAX_WAIT). Error occurred on Tue 2017-12-19 11:39:13 PST
Xclbin ID: 0x5a39da87
Tip: If the firewall has not tripped, the Linux tool dmesg can provide additional insight.
- When you know that the firewall has tripped, it is important to
determine the cause of the DMA timeout. The issue could be an illegal DMA
transfer, or kernel misbehavior. However, a side effect of the AXI firewall
tripping is that the health check functionality in the driver resets the board
after killing the application; any information on the device that might help
with debugging the root cause is lost. To debug this issue, disable the health
check thread in the xclmgmt kernel module to capture the error. This uses standard Linux kernel module tools in the following sequence:
  - sudo modinfo xclmgmt
    This command lists the current configuration of the module and indicates whether the health_check parameter is ON or OFF. It also returns the path to the xclmgmt module.
  - sudo rmmod xclmgmt
    This unloads the xclmgmt kernel module.
  - sudo insmod <path to module>/xclmgmt.ko health_check=0
    This reloads the xclmgmt kernel module with the health check disabled.
    Tip: The path to this module is reported in the output of the call to modinfo.
- With the health check disabled, rerun the application. You can use the kernel instrumentation to isolate this issue as previously described.
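
The workflow in the list above (check the firewall status, then reload xclmgmt with the health check disabled) can be sketched as two small shell helpers. This is a minimal sketch, not part of the XRT tooling: the function names firewall_errors and reload_without_health_check and the DRY_RUN switch are hypothetical, and the parser assumes the firewall status has the same "level: code (NAME)" layout as the example shown earlier, which may differ between XRT releases.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the DRY_RUN switch are hypothetical.

# Print every firewall level whose status is not GOOD.
# Reads the firewall status text on stdin; an exit status of 0 means
# at least one non-GOOD entry was found.
firewall_errors() {
    grep -oE '[0-9]+: 0x[0-9a-fA-F]+ \([A-Z0-9_]+\)' | grep -v '(GOOD)'
}

# Reload the xclmgmt module with the health check disabled.
# $1 is the path to xclmgmt.ko as reported by modinfo.
# With DRY_RUN=1 the commands are printed instead of executed.
reload_without_health_check() {
    local module_path="$1"
    local run=""
    [ -n "${DRY_RUN:-}" ] && run="echo"
    $run sudo rmmod xclmgmt &&
    $run sudo insmod "$module_path" health_check=0
}

# Possible usage (replace <bdf> with the device address):
#   xbutil examine -d <bdf> -r firewall | firewall_errors
#   reload_without_health_check "$(modinfo -n xclmgmt)"
```

Running the reload with DRY_RUN=1 first prints the rmmod and insmod commands without touching the loaded module, which is a cheap way to confirm the module path before disabling the health check on a shared machine.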