Memory Stall Analysis - 2022.1 English

Versal ACAP AI Engine Programming Environment User Guide (UG1076)

Document ID: UG1076
Release Date: 2022-05-25
Version: 2022.1 English
The AI Engine can perform several vector load or store operations per cycle. However, for load or store operations to execute in parallel, they must target different memory banks. A memory stall occurs when multiple accesses target the same memory bank in the same cycle.
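The conflict condition above can be sketched in plain C++. This is an illustrative model only: the `BankLayout` type, the linear offset-to-bank mapping, and both helper names are assumptions made for this example, not part of the AI Engine tools, and the actual bank size and count should be taken from the device documentation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of the data memory. The caller supplies the
// values; nothing here is taken from the tools.
struct BankLayout {
    std::uint32_t bank_bytes;  // size of one bank in bytes
    std::uint32_t num_banks;   // number of banks
};

// Map a byte offset to a bank index, assuming banks are laid out linearly
// (offset 0 starts bank 0, offset bank_bytes starts bank 1, and so on).
inline std::uint32_t bank_of(std::uint32_t byte_offset, const BankLayout& layout) {
    assert(layout.bank_bytes > 0);
    assert(byte_offset / layout.bank_bytes < layout.num_banks);
    return byte_offset / layout.bank_bytes;
}

// Return true if at least two accesses issued in the same cycle fall in the
// same bank, which is the condition described above for a memory stall.
inline bool has_bank_conflict(const std::vector<std::uint32_t>& same_cycle_offsets,
                              const BankLayout& layout) {
    std::vector<int> hits(layout.num_banks, 0);
    for (std::uint32_t offset : same_cycle_offsets) {
        if (++hits[bank_of(offset, layout)] > 1) {
            return true;
        }
    }
    return false;
}
```

With an illustrative layout of four 0x2000-byte banks, two accesses at offsets 0x0000 and 0x0020 land in the same bank and conflict, while offsets 0x0000 and 0x2000 do not.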

The memory types include window buffers and DMA FIFOs between kernels, RTP buffers, and system memory. System memory includes kernel synchronization information (in the first 32 bytes), the stack, and the heap. Static variables are stored in the heap, and function control logic is stored in the stack. System memory occupies contiguous memory banks. The tool can place window buffers, RTP buffers, DMA FIFOs, and system memory on specific banks automatically, or you can place them manually. To reduce memory stalls between these memories, place them in separate banks where possible. Memory stalls can still occur if separate banks cannot be found for all of the memories, or if multiple accesses target the same memory.
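To reason about whether a given placement keeps memories in separate banks, a small check such as the following can help. It is a sketch under stated assumptions: the `Placement` record, the linear bank mapping, and the function names are hypothetical and do not correspond to any tool output or API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of one placed memory object (window buffer,
// RTP buffer, DMA FIFO, or system memory).
struct Placement {
    std::string name;
    std::uint32_t offset;  // byte offset within the tile data memory
    std::uint32_t size;    // size in bytes, must be greater than 0
};

// Bitmask of the banks a placement touches, assuming linear banks of
// bank_bytes each. Bit b is set when any byte of the object lies in bank b.
inline std::uint32_t banks_touched(const Placement& p, std::uint32_t bank_bytes) {
    assert(bank_bytes > 0 && p.size > 0);
    const std::uint32_t first = p.offset / bank_bytes;
    const std::uint32_t last = (p.offset + p.size - 1) / bank_bytes;
    assert(last < 32);  // the mask below holds at most 32 banks
    std::uint32_t mask = 0;
    for (std::uint32_t b = first; b <= last; ++b) {
        mask |= (1u << b);
    }
    return mask;
}

// List every pair of placements that shares at least one bank. Such pairs
// are candidates for memory stalls if both are accessed in the same cycle.
inline std::vector<std::pair<std::string, std::string>>
shared_bank_pairs(const std::vector<Placement>& placements, std::uint32_t bank_bytes) {
    std::vector<std::pair<std::string, std::string>> pairs;
    for (std::size_t i = 0; i < placements.size(); ++i) {
        for (std::size_t j = i + 1; j < placements.size(); ++j) {
            if (banks_touched(placements[i], bank_bytes) &
                banks_touched(placements[j], bank_bytes)) {
                pairs.emplace_back(placements[i].name, placements[j].name);
            }
        }
    }
    return pairs;
}
```

A pair reported here is only a potential conflict. Whether it produces stalls depends on whether the compiler schedules accesses to both objects in the same cycle.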

In general, the compiler tries to schedule as many memory accesses as possible in the same cycle, with some exceptions. Memory accesses made through the same pointer are scheduled on different cycles. When the compiler schedules operations on multiple variables or pointers in the same cycle, memory bank conflicts can occur. Each memory bank has its own arbiter, which arbitrates between all requests in round-robin order. The memory stall is released after every request has been served.
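The round-robin arbitration described above can be illustrated with a toy model. The model assumes that a bank serves exactly one request per cycle and that the arbiter starts from a caller-supplied index. Both points are assumptions for illustration, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Order in which one bank serves its pending requests under round-robin
// arbitration. pending[i] is true if requester i has a request for this
// bank, and next is the requester the arbiter considers first.
// The result lists requester indices in service order, one per cycle.
inline std::vector<std::size_t> round_robin_order(std::vector<bool> pending,
                                                  std::size_t next) {
    std::vector<std::size_t> order;
    const std::size_t n = pending.size();
    if (n == 0) {
        return order;
    }
    assert(next < n);
    std::size_t remaining = 0;
    for (bool p : pending) {
        remaining += p ? 1 : 0;
    }
    while (remaining > 0) {
        if (pending[next]) {
            order.push_back(next);
            pending[next] = false;
            --remaining;
        }
        next = (next + 1) % n;  // rotate to the next requester
    }
    return order;
}

// Extra cycles, beyond the first, needed to serve all requests to one bank
// in this model. Zero means no stall.
inline std::size_t stall_cycles(const std::vector<bool>& pending) {
    std::size_t count = 0;
    for (bool p : pending) {
        count += p ? 1 : 0;
    }
    return count > 0 ? count - 1 : 0;
}
```

In this model, three simultaneous requests to one bank take three cycles to serve, so the stall lasts two extra cycles and is released once the last request is served.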

Use the Performance Metrics analysis to determine whether memory stalls need to be analyzed. To analyze memory stalls:

  1. Select the Trace view.
  2. Select the Stalls view at the bottom, and select Memory Stalls from the drop-down list.
    Figure 1. Memory Stall in the Trace View

    Each stall is named MS_<NUM>, where the number increases over time. Each stall has the following associated information.

    Stall ID
    The memory stall ID. The earlier the stall occurs, the smaller the number. The number is unique across all stall types.
    Stalled Tile
    The AI Engine tile where the stalled kernel is located.
    Stalled Kernel
    The kernel that is stalled, named <Kernel_function_name>.<Schedule_ID>.<Graph_instance_name>. The kernel is sometimes shown as _main, in which case cross-probing is required to find the actual kernel function.
    Start (ns)
    The time at which the stall starts.
    Duration (ns)
    The duration of the stall.
    PC
    Program counter when the stall happens.
    Bank Conflict
    The memory bank on which the stall happens.
    Buffer 1, Buffer 2, Buffer 3
    The buffers that cause the memory stall. There can be one buffer or several.
  3. Click a row in the Stalls view to jump to the start of that memory stall in the Trace view. Zoom in and out of the Trace view to see how frequently memory stalls occur and where each stall falls within the kernel execution.
    Note: A large number of memory stalls that occur repeatedly while the kernel is running indicates that the stalls probably happen inside a loop, and they are worth investigating and resolving. Memory stalls that happen only once at the start of kernel execution, or only a few times while the kernel is running, can usually be ignored. The names of the buffers that cause the stall identify whether each is a window buffer, a system buffer, or something else. For a window buffer or RTP buffer that can be controlled in the graph, one option is to place it manually using constraints if a better placement can be identified. For system memory (named system<NUM>+<NUM>), you must identify the variables involved in the stall.
  4. Click the row of the specific stall and switch to the Events view.
    Figure 2. Events View of Memory Stall

  5. The Events view shows the events that occur in the device, with the cycle in which the memory stall happens highlighted. You can see the tile on which the DM_BANK_CONFLICT event occurred and the variables that are being read or written. In the previous figure, the variables delay_line and eq_coef1 are read in the same cycle.
  6. Explore a few cycles before and after the stall cycle to find more hints. For example, the following figure shows the events that happen after the previous stall cycle. delay_line and eq_coef1 are again read in the same cycle, but a different 128-bit part of each is accessed. Examining the source code and assembly code shows that 256-bit accesses to delay_line and eq_coef1 are both issued in the same cycle, which causes the memory stall. Because of the stall, the two 256-bit memory accesses are split into two cycles.
    Figure 3. Events View of Memory Stall
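The triage described in the note under step 3 (repeated stalls deserve attention, one-off stalls usually do not) can be sketched as a small post-processing helper. Everything here is hypothetical: the `StallRecord` type only mirrors some columns of the Stalls view, the guide does not describe any export format, and the check for system memory looks only at the `system` name prefix mentioned in the note.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical record mirroring some columns of the Stalls view.
struct StallRecord {
    std::string id;                    // stall ID, in the form MS_<NUM>
    std::string kernel;                // stalled kernel
    double start_ns;                   // start time of the stall
    double duration_ns;                // duration of the stall
    std::uint32_t pc;                  // program counter at the stall
    std::vector<std::string> buffers;  // buffers involved in the stall
};

// Count how many stalls were recorded at each program counter.
inline std::map<std::uint32_t, std::size_t>
stalls_per_pc(const std::vector<StallRecord>& records) {
    std::map<std::uint32_t, std::size_t> counts;
    for (const StallRecord& r : records) {
        ++counts[r.pc];
    }
    return counts;
}

// Program counters with at least `threshold` stalls. A PC that stalls many
// times is a hint that the stall sits inside a loop.
inline std::vector<std::uint32_t>
repeated_stall_pcs(const std::vector<StallRecord>& records, std::size_t threshold) {
    assert(threshold > 0);
    std::vector<std::uint32_t> pcs;
    for (const auto& entry : stalls_per_pc(records)) {
        if (entry.second >= threshold) {
            pcs.push_back(entry.first);
        }
    }
    return pcs;
}

// Rough check for system memory based only on the "system" name prefix.
// The exact naming rules are an assumption in this sketch.
inline bool is_system_memory(const std::string& buffer_name) {
    return buffer_name.rfind("system", 0) == 0;
}
```

The threshold is a judgment call. A PC that appears once near the start of the run can usually be ignored, while one that appears in every iteration is a candidate for inspection in the Events view.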