Host Application Hangs When Accessing Memory - 2023.1 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID: UG1393
Release Date: 2023-07-17
Version: 2023.1 English
Application hangs can also be caused by incomplete DMA transfers initiated from the host code. This does not necessarily mean that the host code is wrong; the kernels might have issued illegal transactions and locked up the AXI bus.
  1. If the platform has an AXI firewall, such as in the Vitis target platforms, it is likely to trip. The driver issues a SIGBUS error, kills the application, and resets the device. You can check this by running the following command:
    xbutil examine -d <bdf> -r firewall

    The following output shows such an error in the firewall status:

    Firewall Last Error Status:
    		0:		0x0	 (GOOD)
    		1:		0x0	 (GOOD)
    		2:		0x80000 (RECS_WRITE_TO_BVALID_MAX_WAIT). 
    				  Error occurred on Tue 2017-12-19 11:39:13 PST
    
    Xclbin ID:	0x5a39da87
    Tip: If the firewall has not tripped, the Linux tool, dmesg, can provide additional insight.
  2. When you know that the firewall has tripped, determine the cause of the DMA timeout, which could be an illegal DMA transfer or kernel misbehavior. However, when the AXI firewall trips, the health check functionality in the driver kills the application and then resets the board, so any information on the device that might help with debugging the root cause is lost. To capture the error, disable the health check thread in the xclmgmt kernel module. This uses the standard Linux kernel module utilities in the following sequence:
    1. sudo modinfo xclmgmt: This command lists the current configuration of the module and indicates if the health_check parameter is ON or OFF. It also returns the path to the xclmgmt module.
    2. sudo rmmod xclmgmt: This unloads the xclmgmt kernel module.
    3. sudo insmod <path to module>/xclmgmt.ko health_check=0: This reloads the xclmgmt kernel module with the health check disabled.
      Tip: The path to this module is reported in the output of the call to modinfo.
  3. With the health check disabled, rerun the application. You can use the kernel instrumentation to isolate this issue as previously described.
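
The procedure above can be sketched as two small shell helpers. This is an illustrative sketch, not part of XRT: the function names firewall_tripped and print_reload_cmds are hypothetical, and the parsing assumes that the firewall status lines have the same "<level>: <hex> (<STATUS>)" layout as the sample output shown in step 1. The second helper only prints the reload commands so that they can be reviewed before being run by hand.

```shell
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical and are not provided by XRT.

# Read firewall status text on stdin and succeed (exit 0) if any firewall
# level reports a status other than GOOD.
# Assumption: status lines look like "<level>: <hex> (<STATUS>)",
# as in the sample "xbutil examine -r firewall" output.
firewall_tripped() {
    # Keep only the per-level status lines, then look for one that is not GOOD.
    grep -E '^[[:space:]]*[0-9]+:[[:space:]]+0x[0-9a-fA-F]+[[:space:]]+\(' \
        | grep -v -q '(GOOD)'
}

# Print (do not run) the commands that reload xclmgmt with the health check
# disabled. $1 is the path to xclmgmt.ko as reported by modinfo; it is left
# to the caller because the path differs between installations.
print_reload_cmds() {
    local ko_path="$1"
    printf 'sudo rmmod xclmgmt\n'
    printf 'sudo insmod %s health_check=0\n' "$ko_path"
}

# Possible usage, with <bdf> replaced by the device address:
#   xbutil examine -d <bdf> -r firewall | firewall_tripped \
#       && print_reload_cmds "$(modinfo -n xclmgmt)"
```

Printing the commands instead of executing them keeps the sketch safe to run on a machine without root access, and leaves the decision to unload the management driver with the operator.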