DMA Stall Analysis - 2024.1 English

AI Engine Tools and Flows User Guide (UG1076)

Document ID: UG1076
Release Date: 2024-06-27
Version: 2024.1 English

DMA can be configured in AI Engine memory modules or AI Engine-ML memory tiles to transfer data to and from streams. A DMA write or read operation must acquire the lock of its write or read buffer before the operation starts. If the lock of the DMA buffer cannot be acquired, the DMA operation stalls until the lock becomes available.
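The lock dependency described above can be illustrated with a small timing model. This is only a sketch, not the AI Engine tools' implementation: the `BufferLock` class, the `dma_transfer` function, and all times are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class BufferLock:
    """Toy model of a buffer lock (hypothetical, for illustration only)."""
    holder: str        # who currently holds the lock, for example a kernel
    free_at_ps: int    # time at which the holder releases the lock


def dma_transfer(lock, request_ps, transfer_ps):
    """Return (stall_ps, finish_ps) for a DMA operation that needs `lock`.

    The DMA cannot start before the lock is released, so any time spent
    waiting between the request and the release is counted as a stall.
    """
    start_ps = max(request_ps, lock.free_at_ps)
    stall_ps = start_ps - request_ps
    finish_ps = start_ps + transfer_ps
    return stall_ps, finish_ps


# The DMA requests the buffer at 3000 ps, but the kernel holds it until 5000 ps.
lock = BufferLock(holder="kernel k0", free_at_ps=5_000)
stall, finish = dma_transfer(lock, request_ps=3_000, transfer_ps=1_200)
print(stall, finish)  # 2000 6200
```

In this model, a stall of zero means the lock was already free when the DMA requested it.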

The following figure shows the DMA Stalls view. DMA stalls can be cross-probed across the graph, array, and trace views.

Figure 1. DMA Stalls View

Each DMA stall has the following information:

NAME
The DMA stall is named DMALS_<NUM>. The earlier the stall happens, the smaller the number.
TILE
The DMA location.
DMA CHANNEL
The DMA channel.
STALLED INSTANCE
The stalled instance.
STALLED PORT
The port where the stall happens.
START (PS)
The time at which the stall starts, in picoseconds.
DURATION (PS)
The duration of the stall, in picoseconds.
RELATED STALLS
Other stalls that might cause this stall, or stalls that this stall might cause.
BUFFER
The buffer related to the DMA operation.
LOCK HOLDER
The source that is holding the lock of the buffer.
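As a rough illustration of how the fields above fit together, the following sketch models one stall as a record and assigns names in order of start time. The `DmaStall` class, the `name_stalls` helper, and the sample values are hypothetical and do not come from the tools. Whether numbering starts at 0 or 1 is also an assumption here.

```python
from dataclasses import dataclass


@dataclass
class DmaStall:
    """One row of the DMA Stalls view (hypothetical record layout)."""
    tile: str
    dma_channel: str
    stalled_instance: str
    stalled_port: str
    start_ps: int
    duration_ps: int
    buffer: str
    lock_holder: str
    name: str = ""


def name_stalls(stalls):
    """Sort stalls by start time and name them DMALS_<NUM>.

    Earlier stalls get smaller numbers. Starting at 0 is an assumption.
    """
    ordered = sorted(stalls, key=lambda s: s.start_ps)
    for num, stall in enumerate(ordered):
        stall.name = f"DMALS_{num}"
    return ordered


# Sample values are invented for illustration.
a = DmaStall("tile_a", "DMA Write", "PL Interface", "in0", 8_000, 500, "buf0", "k0")
b = DmaStall("tile_b", "DMA Read", "Kernel", "out0", 2_000, 1_500, "buf1", "k1")

ordered = name_stalls([a, b])
print([s.name for s in ordered])                          # ['DMALS_0', 'DMALS_1']
print(max(ordered, key=lambda s: s.duration_ps).name)     # DMALS_0
```

Sorting by duration, as in the last line, is one way to decide which stall to investigate first. The LOCK HOLDER field then points at the instance to examine.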

The following table lists some possible DMA stall scenarios and solutions:

Table 1. DMA Stall Scenarios and Solutions
Source | DMA Channel | Destination | Stalled Instance | Possible Cause or Solution
PL Interface | DMA Write | Tile Buffer | PL Interface | Tile buffers are full. Improve kernel performance.
PL Interface | DMA Write | Memory Tile | PL Interface | Memory tile reading is not completed. Use ping-pong buffers in the memory tile.
Tile Buffer | DMA Read | PL Interface | PL Interface | Tile buffer is not ready. If the stall occurs at the very beginning, it is expected. If it occurs between kernel executions, try to improve kernel performance.
Memory Tile | DMA Read | Tile Buffer | Kernel | Data transfer is slow, or multiple read ports of the memory tile affect each other. Try to improve the data transfer speed relative to kernel execution.
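To see why ping-pong buffers can help in the PL Interface to Memory Tile scenario, consider the following toy model. It is a simplified sketch under stated assumptions, not a description of the hardware: the `writer_stall_ps` function and the timing values are hypothetical, each block takes a fixed time to write and to read, and the reader processes blocks in order.

```python
def writer_stall_ps(num_blocks, write_ps, read_ps, num_buffers):
    """Total time the writer waits for a free buffer (toy model).

    Block i reuses the buffer that held block i - num_buffers, so the
    writer cannot start block i until that earlier block has been read.
    """
    write_done = []   # completion time of each write
    read_done = []    # completion time of each read
    total_stall = 0
    for i in range(num_blocks):
        # The writer is ready as soon as its previous write finished.
        ready = write_done[i - 1] if i > 0 else 0
        # The target buffer is free once its previous contents were read.
        free = read_done[i - num_buffers] if i >= num_buffers else 0
        start = max(ready, free)
        total_stall += start - ready
        write_done.append(start + write_ps)
        # The reader needs the block written and its previous read finished.
        read_start = max(write_done[i], read_done[i - 1] if i > 0 else 0)
        read_done.append(read_start + read_ps)
    return total_stall


# Four blocks, 100 ps to write and 100 ps to read each (invented numbers).
print(writer_stall_ps(4, 100, 100, num_buffers=1))  # 300
print(writer_stall_ps(4, 100, 100, num_buffers=2))  # 0
```

With a single buffer, every write after the first waits for the previous read to complete. With two buffers, writing one buffer overlaps with reading the other, and the stall disappears in this balanced case. If reading is slower than writing, ping-pong buffering reduces the stall in this model but does not remove it.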