Memory Stall Analysis - 2025.2 English - UG1076

AI Engine Tools and Flows User Guide (UG1076)

Document ID: UG1076
Release Date: 2025-11-20
Version: 2025.2 English

The AI Engine can perform several vector load or store operations per cycle. However, for load and store operations to execute in parallel, they must target different memory banks. A memory stall occurs when multiple accesses target the same memory bank in the same cycle.
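
The stall condition can be illustrated with a small host-side model in standard C++. This is a minimal sketch, not AI Engine code: the `BankGeometry` type, the helper names, and the assumption that banks are contiguous address ranges of equal size are all hypothetical and chosen only for illustration. Substitute the bank size and count of your device.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical geometry for illustration only. Banks are modeled as
// contiguous, equally sized address ranges (an assumption of this sketch).
struct BankGeometry {
    std::uint32_t bank_bytes;  // size of one bank in bytes (assumed)
    std::uint32_t num_banks;   // number of banks in the memory (assumed)
};

// Map a byte address to a bank index under the contiguous-range assumption.
inline std::uint32_t bank_of(const BankGeometry& g, std::uint32_t addr) {
    assert(g.bank_bytes > 0 && g.num_banks > 0);
    return (addr / g.bank_bytes) % g.num_banks;
}

// Return true if at least two of the accesses issued in one cycle land in
// the same bank, which is the condition described above for a memory stall.
inline bool has_bank_conflict(const BankGeometry& g,
                              const std::vector<std::uint32_t>& addrs_in_cycle) {
    std::set<std::uint32_t> seen;
    for (std::uint32_t addr : addrs_in_cycle) {
        // insert() reports false in .second when the bank was already used.
        if (!seen.insert(bank_of(g, addr)).second) return true;
    }
    return false;
}
```

With an assumed geometry of four 0x2000-byte banks, accesses to 0x0000 and 0x1000 in one cycle fall in the same bank and conflict, while accesses to 0x0000, 0x2000, and 0x4000 do not.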

The types of memory include window buffers and DMA FIFOs between kernels, RTP buffers, and system memory. System memory includes kernel synchronization information in the first 32 bytes, the stack, and the heap. Static variables are placed in the heap, and function control logic is placed on the stack. System memory occupies contiguous memory banks.

The tool can place window buffers, RTP buffers, DMA FIFOs, and system memory on specific banks automatically, or you can place them manually. To alleviate memory stalls between these memories, place them in separate banks if possible. Memory stalls can still occur between these types of memory if separate banks cannot be found for all of them, or if multiple accesses target the same memory.

In general, the compiler tries to schedule as many memory accesses as possible in the same cycle, with some exceptions. Memory accesses through the same pointer are scheduled in different cycles. If the compiler schedules operations on multiple variables or pointers in the same cycle, memory bank conflicts can occur. Each memory bank has its own arbiter, which arbitrates among all requests in round-robin order. The memory stall is released after every request has been served.
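
The round-robin behavior can be sketched with a simplified host-side model in standard C++. The function names, the one-grant-per-cycle rule, and the starting position of the arbiter are assumptions made for illustration. The sketch does not describe the actual hardware timing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One entry per cycle: the index of the requester granted in that cycle.
using GrantOrder = std::vector<std::size_t>;

// Simplified model of one bank's arbiter (an assumption of this sketch):
// each cycle it grants one pending request, scanning requesters cyclically
// from `start`. `pending[i]` is true if requester i is waiting on this bank.
inline GrantOrder arbitrate_round_robin(std::vector<bool> pending,
                                        std::size_t start) {
    GrantOrder grants;
    const std::size_t n = pending.size();
    if (n == 0) return grants;
    assert(start < n);
    std::size_t next = start;
    bool found = true;
    while (found) {
        found = false;
        for (std::size_t step = 0; step < n && !found; ++step) {
            const std::size_t i = (next + step) % n;
            if (pending[i]) {
                grants.push_back(i);   // serve this requester in this cycle
                pending[i] = false;
                next = (i + 1) % n;    // resume the scan after the one served
                found = true;
            }
        }
    }
    return grants;
}

// Extra cycles spent in this model: every request after the first one waits.
inline std::size_t stall_cycles(const GrantOrder& grants) {
    return grants.empty() ? 0 : grants.size() - 1;
}
```

In this model, three requesters contending for one bank are served over three consecutive cycles, so the conflict costs two extra cycles. A single requester incurs none.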

The Performance Metrics analysis indicates whether memory stalls need to be analyzed. If they do, proceed as follows:

  1. Select the Trace view.
  2. Select the Memory Stalls table.
    Figure 1. Memory Stall in the Trace View

    The name of the stall is MS_<NUM>. The number increments over time. Each stall has the following associated information:

    NAME
    The memory stall ID. The earlier the stall happens, the smaller the number. The number is unique across all types of stall.
    Stalled Tile
    The AI Engine tile that contains the stalled kernel.
    Stalled Kernel
    The kernel that is stalled. It is named <Kernel_function_name>.<Schedule_ID>.<Graph_instance_name>. If it displays as _main, you need to cross-probe to find the real kernel function.
    Start (ps)
    The time at which the stall starts.
    Duration (ps)
    The duration of the stall.
    PC
    Program counter when the stall happens.
    Bank Conflict
    The memory bank on which the stall occurs.
    Buffer 1, Buffer 2, Buffer 3
    The buffers that cause the memory stall. It can be one buffer or multiple buffers.
  3. When you click a stall row in the Stalls view, the Trace view jumps to the start of that memory stall. Zoom in and out of the Trace view to observe how frequently memory stalls occur and where they fall within the kernel execution.
    Note: If a large number of memory stalls occur repeatedly while the kernel is running, the stalls might be occurring inside a loop, and it is best to investigate and resolve them. You can usually ignore memory stalls that occur once at the start of kernel execution. You can also ignore a very small number of stalls during kernel execution.

    You can use the names of the buffers that cause the stall to identify whether each one is a window buffer, a system buffer, or something else.

    If the buffer is a window or RTP type, you can control it in the graph. If you identify a better placement, you can place the buffer manually using constraints.

    If the memory is system memory (named system<NUM>+<NUM>), you must identify the variables involved in the stall.

  4. Click the row of the specific stall and switch to the Events view.
    Figure 2. Events View of Memory Stall
  5. The Events view shows the events that happen in the device. The cycle where the memory stall happens is highlighted. The highlighted memory stall shows where the DM_BANK_CONFLICT event has occurred and the data that is being read or written.
  6. Explore a few cycles before and after the stall cycle to find more hints.

    In some cases, the tool schedules reads or writes of variables in the same bank in the same cycle. See the Load and Store with Virtual Resource Annotations section in AI Engine Kernel and Graph Programming Guide (UG1079) for information on resolving this issue. For example, redefine pointers to the variables and annotate them with __aie_dm_resource_a, so that the compiler does not schedule accesses through these pointers in the same instruction. The following is an example:

    const v8cint16 __aie_dm_resource_a* __restrict coeff = (v8cint16 __aie_dm_resource_a*) eq_coef0;
    const v8cint16 coe = *coeff;
    v16cint16 __aie_dm_resource_a* __restrict p_buff = (v16cint16 __aie_dm_resource_a*) &delay_line;
    v16cint16 buff = *p_buff;
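
The triage in step 3 can also be sketched offline in standard C++, assuming you have copied rows of the Memory Stalls table into records by some means of your own. The `StallRecord` type, the helper names, the sample buffer names, and the repeat threshold are hypothetical. Only the MS_<NUM> and system<NUM>+<NUM> naming comes from the steps above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical record mirroring some columns of the Memory Stalls table.
struct StallRecord {
    std::string name;                  // stall ID, for example "MS_12"
    std::uint64_t start_ps;            // Start (ps)
    std::uint64_t duration_ps;         // Duration (ps)
    std::uint32_t pc;                  // program counter at the stall
    std::vector<std::string> buffers;  // Buffer 1..3, unused entries omitted
};

// System memory is named system<NUM>+<NUM>; this sketch checks the prefix only.
inline bool is_system_memory(const std::string& buffer) {
    return buffer.rfind("system", 0) == 0;
}

// Count how many stalls were recorded at each program counter.
inline std::map<std::uint32_t, std::size_t> stalls_per_pc(
        const std::vector<StallRecord>& rows) {
    std::map<std::uint32_t, std::size_t> counts;
    for (const StallRecord& r : rows) ++counts[r.pc];
    return counts;
}

// Program counters that stall at least `min_count` times. Repeated stalls at
// one PC suggest a stall inside a loop; the cutoff is a hypothetical choice.
inline std::vector<std::uint32_t> hot_pcs(const std::vector<StallRecord>& rows,
                                          std::size_t min_count) {
    assert(min_count > 0);
    const std::map<std::uint32_t, std::size_t> counts = stalls_per_pc(rows);
    std::vector<std::uint32_t> out;
    for (const auto& entry : counts) {
        if (entry.second >= min_count) out.push_back(entry.first);
    }
    return out;
}
```

Program counters returned by `hot_pcs` are the candidates to inspect in the Events view. Buffers for which `is_system_memory` returns true point to variables in system memory rather than to buffers you control in the graph.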

The following table lists some possible scenarios that cause memory stalls and possible solutions.

Table 1. Memory Stall Scenarios and Solutions
Columns: Source, Target, Stall Type, Possible Solution, Note

Source: Single kernel
Target: Buffers on a single memory bank
Stall Type: Memory stall
Possible Solution:
  • Dispatch buffers to different banks (buffers include system memory, RTP, window buffer, DMA, and FIFO). See Memory Stalls.
  • Guide compiler scheduling with virtual memory annotations. See the Load and Store with Virtual Resource Annotations section in AI Engine Kernel and Graph Programming Guide (UG1079).
Note: A single kernel accesses multiple buffers on the same bank, or a single kernel makes multiple accesses to one buffer on the same bank (a cycle can have two loads and one store).

Source: Multiple kernels on adjacent AI Engine tiles
Target: Multiple buffers on one bank
Stall Type: Memory stall
Possible Solution:
  • Dispatch buffers to different banks (memories include system memory, RTP, window buffer, DMA, and FIFO).
  • Use BufferOptLevel.
  • If memory banks are exhausted, use profiling and AI Engine stall analysis to find a better solution with less kernel execution time or a lower stall percentage.
Note: Multiple kernels access multiple buffers on the same bank.