Lock Stall Analysis

AI Engine Tools and Flows User Guide (UG1076)

Document ID: UG1076
Release Date: 2024-06-27
Version: 2024.1 English

The Performance Metrics analysis tab can help identify whether lock stalls need to be analyzed. The following steps show how to analyze a lock stall, starting in the Vitis IDE.

  1. Select the Trace view.
  2. Select the Lock Stalls view.
    Tip: The Stalls view is available with the Trace view, Graph view, and Array view.
    Figure 1. Lock Stall in Trace View
    Each stall has the following information associated with it.
    Name
    The lock stall is named LS_<NUM>. The number is unique across all types of stalls. The earlier the stall happens, the smaller the number.
    Stalled Tile
    The AI Engine tile where the stalled kernel is located.
    Stalled Kernel
    The kernel that is stalled. It is named <Kernel_function_name>.<Schedule_ID>.<Graph_instance_name>. Sometimes the kernel is shown as _main; in that case, cross-probe to find the real kernel function.
    Start (ps)
    The start time of the stall.
    Duration (ps)
    The duration of the stall.
    PC
    The program counter value at which the stall happens.
    Type
    Whether the stalled kernel is trying to read or write the buffer.
    Buffer
    The buffer that the stalled kernel tries to read or write.
    Stalled Port
    The port of the stalled kernel that tries to read or write the buffer.
    Lock Holder
    The source that is holding the lock of the buffer.
    Related Stall
    Other stalls that can cause this stall.
    Tip: The items in green can be cross-probed with other views.
  3. Select a row in the stall table. The Trace view jumps to the start of that stall. Optionally, right-click the stall and choose Filter Trace to show only the signals related to the stall and hide all unrelated signals. Filtering makes the trace easier to explore when the design is large.
    Tip: Filter Trace might not show the signals that belong to the stalls listed under Related Stall.
  4. Use the Trace view to examine the lock stall on the timeline. For a specific lock stall, the trace shows when the write lock and the read lock are allocated. The position of the stall and the events before and after it indicate the reason for the stall. For example, if the write lock has already been allocated and the stall type is Read, the producer has not yet released the buffer and the consumer is waiting for the buffer to become readable. The producer is listed in the Lock Holder field.
  5. To clear the previously filtered trace, right-click and choose Clear All Filters.
  6. An overview of the stall path in the Graph view is usually helpful. Select the Graph view, then select Tile View from the drop-down list in the view. The Tile view of the Graph view shows the graph as mapped onto AI Engine tiles.
  7. If the Stalls view is not shown, select Lock Stalls from the drop-down list, then choose the stall to be analyzed. The related paths are highlighted in the Tile view of the Graph view.
  8. The red path shows where the stall occurs. The white path leads from the source to the location where the stall occurs.
  9. Click the PC value to open the source code at the line where the stall happens.
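
The reasoning in step 4 and the kernel naming rule from the Stalled Kernel field can be written down as a small helper. The following Python sketch is not part of the Vitis IDE and does not assume any export format: the values are typed in by hand from the Lock Stalls view, and `StallRecord`, `split_kernel_name`, and `describe_wait` are hypothetical names chosen for illustration. The Write branch is an assumption that mirrors the documented Read case, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass
class StallRecord:
    """One row of the Lock Stalls view, typed in by hand (hypothetical container)."""
    name: str            # for example "LS_3"
    stalled_kernel: str  # <Kernel_function_name>.<Schedule_ID>.<Graph_instance_name>
    start_ps: int
    duration_ps: int
    stall_type: str      # "Read" or "Write"
    buffer: str
    lock_holder: str

    @property
    def end_ps(self) -> int:
        """Time at which the stall ends, in picoseconds."""
        return self.start_ps + self.duration_ps


def split_kernel_name(stalled_kernel: str) -> dict:
    """Split the Stalled Kernel field into its three documented parts.

    Assumption: only the first two dots are separators, so a graph
    instance name that itself contains dots stays in one piece.
    """
    parts = stalled_kernel.split(".", 2)
    if len(parts) != 3:
        # For example "_main": the real kernel has to be found by cross-probing.
        return {"function": stalled_kernel, "schedule_id": None, "graph_instance": None}
    return {"function": parts[0], "schedule_id": parts[1], "graph_instance": parts[2]}


def describe_wait(stall: StallRecord) -> str:
    """Turn the Type and Lock Holder fields into a one-line interpretation."""
    kind = stall.stall_type.strip().lower()
    if kind == "read":
        # Documented case: the producer has not released the buffer yet.
        return (f"{stall.name}: consumer waits for {stall.buffer} to become readable; "
                f"the producer ({stall.lock_holder}) has not released it yet")
    if kind == "write":
        # Assumption: mirror image of the documented Read case.
        return (f"{stall.name}: producer waits for {stall.buffer} to become writable; "
                f"the consumer ({stall.lock_holder}) has not released it yet")
    raise ValueError(f"unexpected stall type: {stall.stall_type!r}")


if __name__ == "__main__":
    example = StallRecord("LS_0", "consumer_k.1.mygraph", 1000, 500,
                          "Read", "buf0", "producer_k")
    print(split_kernel_name(example.stalled_kernel))
    print(describe_wait(example))
```

Keeping the interpretation in one function makes it easy to apply the same rule to every row when a design shows many stalls.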

The following table lists scenarios that can cause a lock stall and possible solutions.

Table 1. Lock Stall Scenarios and Solutions
Source: AI Engine kernel
Target: Lock of sync window
Destination: AI Engine kernel
Stall Type: Lock stall
Possible Solution:
  • If a single buffer is used, use a PING-PONG buffer (default) or place the kernels into the same AI Engine tile.
  • If the kernels are unbalanced in execution time, balance the throughput between the kernels.

Source: AI Engine kernel
Target: Lock of async window (window_acquire and window_release API)
Destination: AI Engine kernel
Stall Type: Lock stall
Possible Solution:
  • If a single buffer is used, use a PING-PONG buffer (default).
  • Acquire and release the buffer in time. Use a local buffer as needed.

Source: PL interface
Target: Lock of window
Destination: AI Engine kernel
Stall Type: Lock stall
Possible Solution:
  • Ensure that the PL interface throughput matches the AI Engine throughput.
  • Check that the PL interface frequency and width are set properly. See AI Engine/Programmable Logic Integration in AI Engine Kernel and Graph Programming Guide (UG1079).

Source: AI Engine
Target: Lock of window
Destination: PL interface
Stall Type: Lock stall
Possible Solution:
  • Ensure that the PL interface throughput matches the AI Engine throughput.
  • Check that the PL interface frequency and width are set properly. See AI Engine/Programmable Logic Integration in AI Engine Kernel and Graph Programming Guide (UG1079).
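
The effect of the first two solutions in the table (single versus PING-PONG buffer, and balanced versus unbalanced kernels) can be illustrated with a simplified timing model. The following Python sketch is an assumption-laden model, not a description of the AI Engine hardware or of the simulator: it uses abstract time units, ignores lock acquisition overhead and data movement, and the names `simulate` and `StallSummary` are hypothetical.

```python
from typing import NamedTuple


class StallSummary(NamedTuple):
    total_time: int
    producer_stall: int  # time the producer waits for a free buffer (write side)
    consumer_stall: int  # time the consumer waits for a filled buffer (read side)


def simulate(produce_time: int, consume_time: int,
             blocks: int, buffers: int) -> StallSummary:
    """Model one producer and one consumer sharing `buffers` buffers.

    Block i uses buffer i % buffers. The producer can refill a buffer
    only after the consumer has finished the block that last used it.
    """
    if blocks < 1 or buffers < 1:
        raise ValueError("blocks and buffers must be at least 1")
    produced_at = []  # time at which block i is completely written
    consumed_at = []  # time at which block i is completely read
    producer_stall = 0
    consumer_stall = 0
    for i in range(blocks):
        # Producer side: wait until the target buffer has been released.
        producer_ready = produced_at[i - 1] if i > 0 else 0
        buffer_free = consumed_at[i - buffers] if i >= buffers else 0
        start_fill = max(producer_ready, buffer_free)
        producer_stall += start_fill - producer_ready
        produced_at.append(start_fill + produce_time)

        # Consumer side: wait until the block has been written.
        consumer_ready = consumed_at[i - 1] if i > 0 else 0
        start_read = max(consumer_ready, produced_at[i])
        consumer_stall += start_read - consumer_ready
        consumed_at.append(start_read + consume_time)
    return StallSummary(consumed_at[-1], producer_stall, consumer_stall)


if __name__ == "__main__":
    print(simulate(4, 4, blocks=4, buffers=1))  # single buffer, balanced kernels
    print(simulate(4, 4, blocks=4, buffers=2))  # two buffers, balanced kernels
    print(simulate(2, 6, blocks=4, buffers=2))  # two buffers, slow consumer
```

In this model, four balanced blocks take 32 time units with a single buffer (producer stall 12, consumer stall 16) and 20 time units with two buffers, where the only remaining wait is the consumer waiting for the first block. With a producer three times faster than the consumer, two buffers still leave 8 time units of producer stall, which is why balancing kernel throughput is listed as a separate solution.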
Note: DMA lock stalls are not included in the Vitis IDE lock stall analysis.
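
For the PL interface rows of Table 1, a first throughput check is plain arithmetic: an interface that moves one word of a given width per clock cycle can sustain at most frequency × width / 8 bytes per second. The following Python sketch compares that peak figure with the rate at which a kernel consumes or produces data. All function names are hypothetical, the numbers are illustrative and not taken from this guide, and the model assumes one transfer per cycle with no backpressure. The authoritative settings are described in UG1079.

```python
def interface_mbytes_per_s(freq_mhz: float, width_bits: int) -> float:
    """Peak PL interface bandwidth in MB/s, assuming one transfer per cycle."""
    if freq_mhz <= 0 or width_bits <= 0:
        raise ValueError("frequency and width must be positive")
    return freq_mhz * width_bits / 8


def kernel_mbytes_per_s(block_bytes: int, cycles_per_block: int,
                        aie_freq_mhz: float) -> float:
    """Rate at which a kernel consumes (or produces) data, in MB/s.

    cycles_per_block is the measured kernel execution time for one block.
    """
    if block_bytes <= 0 or cycles_per_block <= 0 or aie_freq_mhz <= 0:
        raise ValueError("all arguments must be positive")
    return block_bytes * aie_freq_mhz / cycles_per_block


def limiting_side(pl_rate: float, kernel_rate: float,
                  tolerance: float = 0.05) -> str:
    """Name the slower side, or "balanced" if the rates are within tolerance."""
    if abs(pl_rate - kernel_rate) <= tolerance * max(pl_rate, kernel_rate):
        return "balanced"
    return "PL interface" if pl_rate < kernel_rate else "AI Engine kernel"


if __name__ == "__main__":
    # Illustrative values only: 64-bit PL interface at 300 MHz feeding a kernel
    # that handles 1024 bytes every 256 cycles at an assumed 1250 MHz clock.
    pl = interface_mbytes_per_s(300, 64)
    kernel = kernel_mbytes_per_s(1024, 256, 1250)
    print(pl, kernel, limiting_side(pl, kernel))
```

With these illustrative values the PL interface peaks at 2400 MB/s while the kernel can take 5000 MB/s, so the kernel would wait on the window lock. In this model, a 128-bit interface at 312.5 MHz would match the kernel rate.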