DMA can be configured in AI Engine memory modules or AI Engine-ML memory tiles to transfer data to and from streams. The DMA write or read operation requires the lock of the write or read buffer to be acquired before the operation starts. If the lock of the DMA buffer is not acquired, the DMA operation stalls.
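The lock rule above can be illustrated with a minimal sketch. This is a toy timing model, not the hardware or tool implementation: `BufferLock`, `dma_stall_duration_ps`, the holder name, and all time values are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class BufferLock:
    """Hypothetical model of a buffer lock held by one owner until a release time."""
    holder: str
    release_time_ps: int


def dma_stall_duration_ps(request_time_ps: int, lock: BufferLock) -> int:
    """Return how long a DMA transfer requested at request_time_ps waits for the lock."""
    # The DMA operation can start only after the current holder releases the lock.
    return max(0, lock.release_time_ps - request_time_ps)


# Assumed scenario: a kernel holds the buffer lock until t = 5000 ps.
lock = BufferLock(holder="kernel_k1", release_time_ps=5_000)
early = dma_stall_duration_ps(3_200, lock)  # lock still held, so the DMA stalls
late = dma_stall_duration_ps(6_000, lock)   # lock already released, so no stall
```

In this model the stall duration is simply the gap between the DMA request and the lock release, which corresponds to the START and DURATION values reported for each stall.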
The following figure shows the DMA Stalls view. DMA stalls can be cross-probed across the graph, array, and trace views.
Figure 1. DMA Stalls View
Each DMA stall has the following information:
- NAME
- The name of the DMA stall is DMALS_<NUM>. The earlier the stall happens, the smaller the number.
- TILE
- The DMA location.
- DMA CHANNEL
- The DMA channel.
- STALLED INSTANCE
- The stalled instance.
- STALLED PORT
- The port where the stall happens.
- START (PS)
- The time at which the stall starts, in picoseconds.
- DURATION (PS)
- The duration of the stall.
- RELATED STALLS
- Other stalls that can cause this stall, or that this stall can cause.
- BUFFER
- The buffer related to the DMA operation.
- LOCK HOLDER
- The source that is holding the lock of the buffer.
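As a rough illustration of how these fields fit together, the sketch below models one stall record and derives the `DMALS_<NUM>` names from the start times. The `DmaStall` class, the `assign_names` helper, and all field values are hypothetical. Starting the numbering at 0 is also an assumption; the tool's actual first index is not stated here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DmaStall:
    """One stall record; the fields mirror the list above (values are made up)."""
    tile: str
    dma_channel: str
    stalled_instance: str
    stalled_port: str
    start_ps: int
    duration_ps: int
    buffer: str
    lock_holder: str
    name: str = ""
    related_stalls: List[str] = field(default_factory=list)


def assign_names(stalls: List[DmaStall], first_num: int = 0) -> List[DmaStall]:
    """Sort stalls by start time and name them so earlier stalls get smaller numbers."""
    ordered = sorted(stalls, key=lambda s: s.start_ps)
    for num, stall in enumerate(ordered, start=first_num):
        stall.name = f"DMALS_{num}"  # first_num = 0 is an assumption
    return ordered


stalls = [
    DmaStall("tile_a", "ch0", "inst_b", "port_in", 9_000, 400, "buf1", "kernel_y"),
    DmaStall("tile_a", "ch0", "inst_a", "port_in", 1_200, 650, "buf0", "kernel_x"),
]
ordered = assign_names(stalls)
names = [s.name for s in ordered]
```

Sorting by `start_ps` before numbering reproduces the rule that the earlier a stall happens, the smaller its number.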
The following table lists some possible DMA stall scenarios and solutions.
| Source | DMA Channel | Destination | Stalled Instance | Possible Cause or Solution |
|---|---|---|---|---|
| PL Interface | DMA Write | Tile Buffer | PL Interface | Tile buffers are full. Improve kernel performance. |
| PL Interface | DMA Write | Memory Tile | PL Interface | The memory tile read is not complete. Use ping-pong buffers in the memory tile. |
| Tile Buffer | DMA Read | PL Interface | PL Interface | The tile buffer is not ready. A stall at the very beginning of execution is expected. If the stall occurs between kernel executions, try to improve kernel performance. |
| Memory Tile | DMA Read | Tile Buffer | Kernel | Data transfer is slow, or multiple read ports of the memory tile interfere with each other. Try to improve the data transfer speed relative to kernel execution time. |
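
The ping-pong suggestion in the table can be sketched with a toy timing model. This is an illustration under simplifying assumptions, not a model of the actual hardware: `writer_stall_ps` is a hypothetical function, each block takes a fixed time to write and to read, and a buffer can be rewritten only after its previous contents have been fully read.

```python
def writer_stall_ps(num_blocks: int, write_ps: int, read_ps: int, num_buffers: int) -> int:
    """Total time the writer waits for a free buffer (toy model, all times in ps)."""
    write_done = []  # completion time of each block's write
    read_done = []   # completion time of each block's read
    total_stall = 0
    for i in range(num_blocks):
        # The writer is ready once its previous write has finished.
        ready = write_done[i - 1] if i > 0 else 0
        # Block i reuses the buffer of block i - num_buffers, which must be read first.
        buffer_free = read_done[i - num_buffers] if i >= num_buffers else 0
        start = max(ready, buffer_free)
        total_stall += start - ready
        write_done.append(start + write_ps)
        # The reader starts when the data is written and its previous read is done.
        prev_read = read_done[i - 1] if i > 0 else 0
        read_done.append(max(write_done[i], prev_read) + read_ps)
    return total_stall


# Assumed workload: 4 blocks, 100 ps to write and 100 ps to read each block.
single = writer_stall_ps(4, 100, 100, num_buffers=1)
ping_pong = writer_stall_ps(4, 100, 100, num_buffers=2)
```

With one buffer, the writer waits for every read to finish before it can reuse the buffer. With two buffers, writing one buffer overlaps with reading the other, so the writer stall disappears in this workload. If the reader is consistently slower than the writer, the second buffer delays the stalls but does not remove them, which is when the data transfer or kernel performance itself needs to improve.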