DMA can be configured in AI Engine memory modules or AI Engine-ML memory tiles to transfer data to and from streams. A DMA write or read operation must acquire the lock of its write or read buffer before the operation starts. If the lock of the DMA buffer cannot be acquired, the DMA operation stalls until the lock becomes available.
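The lock rule above can be pictured with a toy timing model. This is only an illustrative sketch, not the tool's or the hardware's actual behavior: the function name `dma_stall_ps` and the sample timestamps are hypothetical, and the model assumes a stall lasts exactly from the lock request until the lock becomes available.

```python
def dma_stall_ps(request_ps: int, lock_available_ps: int) -> int:
    """Time (ps) a DMA operation waits between requesting a buffer lock
    and acquiring it. Hypothetical helper for illustration only."""
    return max(0, lock_available_ps - request_ps)


# The lock is already free when the DMA asks for it: no stall.
no_stall = dma_stall_ps(request_ps=1_000, lock_available_ps=800)

# Another owner holds the lock for 500 ps after the request: 500 ps stall.
stalled = dma_stall_ps(request_ps=1_000, lock_available_ps=1_500)

print(no_stall, stalled)
```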
The following figure shows the DMA Stalls view. DMA stalls can be cross-probed across the graph, array, and trace views.
Figure 1. DMA Stalls View
Each DMA stall has the following information:
- NAME
- The DMA stall is named DMALS_<NUM>. The earlier the stall happens, the smaller the number.
- TILE
- The DMA location.
- DMA CHANNEL
- The DMA channel.
- STALLED INSTANCE
- The stalled instance.
- STALLED PORT
- The port where the stall happens.
- START (PS)
- The time at which the stall starts, in picoseconds.
- DURATION (PS)
- The duration of the stall, in picoseconds.
- RELATED STALLS
- Other stalls that might cause this stall, or that this stall might cause.
- BUFFER
- The buffer related to the DMA operation.
- LOCK HOLDER
- The source that is holding the lock of the buffer.
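As a rough illustration of how these fields fit together, the sketch below models one stall as a record and assigns names in order of start time. Everything here is hypothetical: the `DmaStall` class, the `name_stalls` helper, the sample values, and the choice to start numbering at 0 are assumptions made for the example, not the tool's export format or API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DmaStall:
    """One row of a DMA stalls listing (field names are assumptions)."""
    tile: str
    dma_channel: str
    stalled_instance: str
    stalled_port: str
    start_ps: int
    duration_ps: int
    buffer: str
    lock_holder: str
    related_stalls: tuple = ()
    name: str = ""


def name_stalls(stalls):
    """Name stalls so that earlier stalls get smaller numbers.
    Numbering from 0 is an assumption; the text only fixes the ordering."""
    ordered = sorted(stalls, key=lambda s: s.start_ps)
    return [replace(s, name=f"DMALS_{i}") for i, s in enumerate(ordered)]


# Made-up sample data, listed out of time order on purpose.
stalls = name_stalls([
    DmaStall("tile_a", "ch0", "k1", "in0", 5_000, 200, "buf0", "k1"),
    DmaStall("tile_b", "ch1", "k2", "out0", 1_000, 300, "buf1", "k2"),
])

for s in stalls:
    print(s.name, s.tile, s.start_ps, s.duration_ps)
```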
The following table lists some possible DMA stall scenarios and solutions:
Source | DMA Channel | Destination | Stalled Instance | Possible Cause or Solution |
---|---|---|---|---|
PL Interface | DMA Write | Tile Buffer | PL Interface | Tile buffers are full. Improve kernel performance. |
PL Interface | DMA Write | Memory Tile | PL Interface | The memory tile read has not completed. Use ping-pong buffers in the memory tile. |
Tile Buffer | DMA Read | PL Interface | PL Interface | The tile buffer is not ready. A stall at the very beginning is expected. If it occurs between kernel executions, try to improve kernel performance. |
Memory Tile | DMA Read | Tile Buffer | Kernel | Data transfer is slow, or multiple read ports of the memory tile interfere with each other. Try to improve the data transfer speed relative to kernel execution. |
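To see why ping-pong buffers help in the memory tile write scenario, the following toy simulation compares the accumulated write stall with one buffer and with two. It is a simplified sketch under stated assumptions, not a model of the actual hardware: the function `total_write_stall_ps` is hypothetical, each write and each read takes a fixed time, buffers are used round-robin, and a buffer can be rewritten only after its previous contents have been fully read.

```python
def total_write_stall_ps(num_buffers: int, write_ps: int,
                         read_ps: int, iterations: int) -> int:
    """Total time (ps) the writer waits for a free buffer (toy model)."""
    free_at = [0] * num_buffers  # when each buffer's lock is free for writing
    writer_time = 0              # when the writer is ready for its next write
    reader_time = 0              # when the reader finishes its current read
    stall = 0
    for i in range(iterations):
        buf = i % num_buffers
        # The write cannot start until the buffer has been read out.
        start = max(writer_time, free_at[buf])
        stall += start - writer_time
        writer_time = start + write_ps
        # The read starts once the data is written and the reader is idle.
        read_start = max(reader_time, writer_time)
        reader_time = read_start + read_ps
        free_at[buf] = reader_time
    return stall


single = total_write_stall_ps(num_buffers=1, write_ps=100, read_ps=100, iterations=4)
ping_pong = total_write_stall_ps(num_buffers=2, write_ps=100, read_ps=100, iterations=4)
print(single, ping_pong)  # prints "300 0"
```

With a single buffer, every write after the first waits for the preceding read to finish. With two buffers, the write to one buffer overlaps the read of the other, so in this balanced case the writer never waits. If reads are slower than writes, the second buffer only delays the stall rather than removing it.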