Kernel Hangs Due to AXI Violations - 2023.1 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID: UG1393
Release Date: 2023-07-17
Version: 2023.1 English
Kernels can hang because of bad AXI transactions between the kernels and the memory controller. To debug these issues, you must instrument the kernels.
  1. The Vitis core development kit provides two options for instrumenting kernels during v++ linking (--link). Both options add hardware to your implementation, so depending on resource utilization, you might need to limit the instrumentation.
    1. Add Lightweight AXI Protocol Checkers (LAPC). These protocol checkers are added using the --debug.protocol option, as explained in --debug Options, with the following syntax:
      --debug.protocol <compute_unit_name>:<interface_name>
      The <interface_name> is optional. If it is not specified, all ports on the CU are analyzed. The --debug.protocol option defines which protocol checkers are inserted. It accepts the special keyword all for <compute_unit_name> and/or <interface_name>.
      Note: Multiple --debug.xxx options can be specified on a single command line or in a configuration file.
    2. Add Performance Monitors (AM, AIM, ASM), which report detailed communication statistics (counters). Although this is most useful for performance analysis, it also provides insight into pending port activity during debugging. The Performance Monitors are added using the --profile option, as described in --profile Options. The basic syntax for the --profile.data option is:
      --profile.data <krnl_name>|all:<cu_name>|all:<intrfc_name>|all:<counters>|all
      Three fields are required to identify the specific interface that the performance monitor attaches to. If resource consumption is not an issue, the keyword all lets you apply monitoring to all existing kernels, compute units, and interfaces with a single option. Otherwise, you can specify <krnl_name>, <cu_name>, and <intrfc_name> explicitly to limit instrumentation.
      The last field, <counters>|all, lets you restrict information gathering to counters only, which is useful for large designs. The default, all, also collects actual trace information.
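      Note: Multiple --profile options can be specified on a single command line or in a configuration file. For example, the following configuration file section adds monitors to two interfaces of cu1 and one interface of cu2:
      [profile]
      data=kernel1:cu1:m_axi_gmem0
      data=kernel1:cu1:m_axi_gmem1
      data=kernel2:cu2:m_axi_gmem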
      
  2. After rebuilding the application, rerun the host application using the xclbin that contains the added AIM and LAPC IP.
  3. When the application hangs, you can use xbutil examine to check for any errors or anomalies.
  4. Check the AIM output:
    • Run the following command several times to check whether any counters are changing. If they are, the kernels are still active.
      xbutil examine -d <bdf> -r debug-ip-status -e aim
      Tip: You can also check the AIM output during GDB debugging with the xstatus aim command extension.
    • If the counters are not changing, outstanding counts greater than zero might indicate that some AXI transactions are hung.
  5. Check the LAPC output:
    • Run the following command to check for any AXI violations.
      xbutil examine -d <bdf> -r debug-ip-status -e lapc
      Tip: You can also check the LAPC output during GDB debugging with the xstatus lapc command extension.
    • Any reported AXI violations indicate issues in the kernel implementation.
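The instrumentation options from step 1 can also be collected in one v++ configuration file. The following Python sketch writes such a file. It assumes that each dotted option maps to a config-file section and key: --debug.protocol becomes protocol= under [debug], and --profile.data becomes data= under [profile], following the [profile] example above. The helper name build_link_config is hypothetical and is not part of the Vitis tools.

```python
# Sketch only: assumes v++ maps a dotted option such as --debug.protocol
# to a config-file section ([debug]) and key (protocol=).
from typing import List


def build_link_config(protocol_specs: List[str], profile_specs: List[str]) -> str:
    """Return v++ config-file text for LAPC and profiling instrumentation.

    protocol_specs: "<compute_unit_name>[:<interface_name>]" strings
    profile_specs:  "<krnl_name>:<cu_name>:<intrfc_name>[:<counters>|all]" strings
    (build_link_config is a hypothetical helper, not part of the Vitis tools.)
    """
    lines: List[str] = []
    if protocol_specs:
        lines.append("[debug]")
        # One protocol= line per --debug.protocol option.
        lines.extend(f"protocol={spec}" for spec in protocol_specs)
    if profile_specs:
        if lines:
            lines.append("")  # blank line separating the two sections
        lines.append("[profile]")
        # One data= line per --profile.data option.
        lines.extend(f"data={spec}" for spec in profile_specs)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_link_config(
        ["cu1:m_axi_gmem0", "cu2"],  # "cu2" alone: all ports on cu2
        ["kernel1:cu1:m_axi_gmem0", "kernel1:cu1:m_axi_gmem1",
         "kernel2:cu2:m_axi_gmem"],
    ), end="")
```

You would then pass the generated file to v++ --link with the --config option. Emitting one key per line keeps each checker or monitor easy to comment out when resources are tight.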
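The AIM check described in the steps above amounts to comparing two snapshots of the same counters. If any counter changed, the interface is active. If the counters are stagnant but transactions are still outstanding, the interface might be hung. This Python sketch encodes that rule. The AimSnapshot fields, the port labels, and the diagnose function are hypothetical. The sketch assumes you transcribe the counters from two runs of xbutil examine yourself; it does not parse xbutil's actual output format.

```python
# Sketch only: the counter fields are hypothetical stand-ins for values you
# read from `xbutil examine -d <bdf> -r debug-ip-status -e aim`.
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class AimSnapshot:
    write_transactions: int
    read_transactions: int
    outstanding: int  # transactions issued but not yet completed


def diagnose(before: Dict[str, AimSnapshot],
             after: Dict[str, AimSnapshot]) -> Dict[str, str]:
    """Classify each monitored interface from two successive snapshots."""
    verdicts = {}
    for port, first in before.items():
        second = after[port]
        if second != first:
            verdicts[port] = "active"  # some counter moved between snapshots
        elif second.outstanding > 0:
            verdicts[port] = "possibly hung"  # stagnant with pending transactions
        else:
            verdicts[port] = "idle"  # stagnant and nothing outstanding
    return verdicts


if __name__ == "__main__":
    before = {"cu1/m_axi_gmem0": AimSnapshot(120, 340, 4),
              "cu1/m_axi_gmem1": AimSnapshot(64, 0, 0)}
    after = {"cu1/m_axi_gmem0": AimSnapshot(120, 340, 4),
             "cu1/m_axi_gmem1": AimSnapshot(64, 0, 0)}
    print(diagnose(before, after))
```

A "possibly hung" verdict is only a hint. It points to the interface where you should look next, for example in the LAPC output.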
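Each protocol checker adds hardware, so before linking it can help to see how many CU ports a set of --debug.protocol specs covers. This Python sketch expands specs using the rules described in step 1: an omitted interface or the keyword all selects every port on the CU. The design inventory and the function name are hypothetical. Counting one checker per (CU, interface) pair is an approximation. Skipping CUs that lack a named port when the CU field is all is an assumption, not documented v++ behavior.

```python
# Sketch only: `design` is a hypothetical inventory of CU names and their
# AXI interfaces; the expansion rules follow the --debug.protocol description.
from typing import Dict, List, Set, Tuple


def expand_protocol_specs(specs: List[str],
                          design: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Return the (CU, interface) pairs that would receive a protocol checker."""
    covered: Set[Tuple[str, str]] = set()
    for spec in specs:
        cu, _, port = spec.partition(":")
        for name in (list(design) if cu == "all" else [cu]):
            ports = design[name]  # raises KeyError for an unknown CU name
            if port in ("", "all"):
                # Interface omitted or "all": every port on the CU is checked.
                covered.update((name, p) for p in ports)
            elif port in ports:
                covered.add((name, port))
            elif cu != "all":
                raise ValueError(f"{name} has no interface {port!r}")
            # Assumption: with cu == "all", CUs lacking the named port are skipped.
    return covered


if __name__ == "__main__":
    design = {"cu1": ["m_axi_gmem0", "m_axi_gmem1"], "cu2": ["m_axi_gmem"]}
    pairs = expand_protocol_specs(["cu1", "cu2:m_axi_gmem"], design)
    print(f"{len(pairs)} protocol checkers: {sorted(pairs)}")
```

Returning a set means overlapping specs, such as cu2 and cu2:m_axi_gmem, are counted once.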