AIE_STALL - 2021.2 English

Vitis Guidance Messaging (UG1315)

Document ID: UG1315
Release Date: 2021-10-27
Version: 2021.2 English

Description

This rule checks the stall percentage of AI Engine cores.
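
As a rough illustration of what a stall percentage means, the sketch below divides the cycles spent in each stall type by the total cycles observed for a core. This is an assumption for illustration only: this page does not state how the rule computes the percentage, and the function name `stall_percentage` and all cycle counts are hypothetical.

```python
def stall_percentage(stall_cycles: int, total_cycles: int) -> float:
    """Return stalled cycles as a percentage of total observed cycles."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    if not 0 <= stall_cycles <= total_cycles:
        raise ValueError("stall_cycles must be between 0 and total_cycles")
    return 100.0 * stall_cycles / total_cycles


# Hypothetical cycle counts for one core; not taken from a real trace.
total = 20000
counts = {
    "MEMORY_STALL": 1200,
    "STREAM_STALL": 300,
    "CASCADE_STALL": 0,
    "LOCK_STALL": 4500,
}

report = {name: stall_percentage(cycles, total) for name, cycles in counts.items()}

for name, pct in report.items():
    print(f"{name}: {pct:.1f}%")
```

In this made-up example the lock stall dominates, so the LOCK_STALL recommendations would be the first place to look.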

Explanation

There are many types of AIE core stalls, including memory, stream, cascade, and lock.
  • MEMORY_STALL: Time the AI Engine spent in a memory stall. Possible causes include multiple memory accesses to the same bank in the same cycle, or multiple kernels accessing memories on the same bank.
  • STREAM_STALL: Time the AI Engine spent in a stream stall. Possible causes include streams being read faster than they are written, or streams from the PL being clocked at a slower frequency.
  • CASCADE_STALL: Time the AI Engine spent in a cascade stall. Possible causes include cascade streams being read faster than they are written, or streams from the PL being clocked at a slower frequency.
  • LOCK_STALL: Time the AI Engine spent in a lock stall. Possible causes include buffers being read faster than they are written, or streams to or from the PL being clocked at a slower frequency.
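
The memory stall cause described above (several accesses to one bank in one cycle) can be sketched with a toy model. The sketch assumes, as a simplification, that each bank serves one access per cycle and that any additional access to that bank in the same cycle has to wait. The function `count_bank_conflicts` and the access lists are hypothetical and do not come from a real trace.

```python
from collections import Counter


def count_bank_conflicts(accesses):
    """Count accesses that would wait in a one-access-per-bank-per-cycle model.

    accesses: iterable of (cycle, bank) pairs.
    """
    per_slot = Counter(accesses)
    # Each (cycle, bank) slot serves one access; the rest are conflicts.
    return sum(n - 1 for n in per_slot.values() if n > 1)


# Two buffers placed on bank 0: both are touched in the same cycles.
same_bank = [(0, 0), (0, 0), (1, 0), (1, 0)]

# Same access schedule with the second buffer moved to bank 1.
spread = [(0, 0), (0, 1), (1, 0), (1, 1)]

print(count_bank_conflicts(same_bank))  # 2
print(count_bank_conflicts(spread))     # 0
```

In this model, moving one buffer to another bank removes every conflict without changing the access schedule.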

Recommendation

See the AI Engine Tools and Flows User Guide (UG1076) for all supported stalls.

  • MEMORY_STALL: You can resolve the stall by examining memory access patterns in the trace results and placing the memories on different banks, or by using the aiecompiler "BufferOptLevel" mapper option.
    • Distribute memories across different banks. Memories include system memory, RTP, window buffers, and data memories.
    • If the memory banks are exhausted, profile and trace the design to find a better solution.
    • Specify the BufferOptLevel option in aiecompiler when building the design.
  • STREAM_STALL: You can resolve the stall by examining stream access patterns in the trace results and increasing or balancing the FIFO depth on the stream, or by maximizing the PL bandwidth to the AI Engine.
    • Increase the FIFO depth.
    • Adjust the stream read and write instructions in the loop.
    • Multiple streams: Insert a DMA FIFO, or set different FIFO depths for different destination nets.
    • PLIO: Maximize the AIE-PL interface bandwidth. For example, use a 64-bit interface, the highest PL frequency (1/2 of the AIE frequency), and BLI registers (on channels that have them).
  • CASCADE_STALL: You can resolve the stall by examining stream access patterns in the trace results and adjusting the instructions in the loop so that the input and output streams are matched, or by maximizing the PL bandwidth to the AI Engine.
    • Adjust the instructions in the loop.
  • LOCK_STALL: You can resolve the stall by examining buffer access patterns in the trace results and acquiring and releasing buffers in time. Using local buffers may also resolve the issue. If the PL interface is the source or destination of the stall, also ensure that the PL interface throughput matches the AI Engine throughput.
    • Use ping-pong buffers (the default).
    • Balance the throughput between kernels.
    • Acquire and release buffers in time. Use local buffers as needed.
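
To see why increasing the FIFO depth can reduce stream stalls, the toy simulation below connects a bursty producer to a bursty consumer through a FIFO of configurable depth. It is a simplified cycle-by-cycle model under stated assumptions, not a description of the AI Engine stream hardware: the consumer is served before the producer in each cycle, and a blocked side retries on its next ready cycle. The function `simulate_stream` and the ready patterns are hypothetical.

```python
def simulate_stream(depth, producer_ready, consumer_ready, items):
    """Move `items` values through a FIFO and count blocked attempts.

    producer_ready / consumer_ready: repeating lists of booleans that say
    whether each side attempts a transfer in a given cycle.
    """
    occupancy = produced = consumed = 0
    write_stalls = read_stalls = 0
    cycle = 0
    while consumed < items:
        wants_write = producer_ready[cycle % len(producer_ready)]
        wants_read = consumer_ready[cycle % len(consumer_ready)]
        # Consumer side: reading from an empty FIFO counts as a read stall.
        if wants_read:
            if occupancy > 0:
                occupancy -= 1
                consumed += 1
            else:
                read_stalls += 1
        # Producer side: writing to a full FIFO counts as a write stall.
        if wants_write and produced < items:
            if occupancy < depth:
                occupancy += 1
                produced += 1
            else:
                write_stalls += 1
        cycle += 1
    return {"cycles": cycle, "write_stalls": write_stalls, "read_stalls": read_stalls}


# Producer bursts for 4 cycles, then idles; consumer does the opposite.
producer = [True] * 4 + [False] * 4
consumer = [False] * 4 + [True] * 4

shallow = simulate_stream(1, producer, consumer, items=8)
deep = simulate_stream(4, producer, consumer, items=8)

print("depth 1:", shallow)
print("depth 4:", deep)
```

With a depth that covers the burst length, both sides run without blocking in this model. With a depth of 1, each side stalls for most of its ready window even though the average rates match, and the transfer takes more cycles.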