Kernel Stall - 2021.2 English

Vitis Guidance Messaging (UG1315)

Document ID
UG1315
Release Date
2021-10-27
Version
2021.2 English

Description

The application reported a high level of kernel stalls at run time. This can reduce the overall performance of the application.

Explanation

There are three types of stalls reported by the runtime:

Intra-Kernel Dataflow Stalls (%)
Reports the percentage of running time consumed by stalls when streaming data within a kernel. A cycle counts as stalled when any of the internal dataflow block "stall" signals is high. For example, suppose two internal blocks, a producer and a consumer, are pipelined with II=2 and II=1 respectively. This is reported as a 50% stall, because the producer can produce data only every other clock cycle, whereas the consumer can receive data every cycle.
External Memory Stalls (%)
Reports the percentage of running time consumed by stalls on memory transfers outside the compute unit (CU). These stalls can occur when multiple ports access the memory at the same time: the resulting contention stalls the transfers.
Inter-Kernel Pipe Stalls (%)
Reports the percentage of running time consumed by stalls when streaming data to or from outside the CU. These stalls originate from writing to a full FIFO between kernels or reading from an empty FIFO between kernels.
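
The 50% figure in the intra-kernel example follows from the ratio of the two initiation intervals. The sketch below is a simplified steady-state model, not the way the runtime measures stalls: it assumes the slower block alone sets the throughput and that the faster block is idle for the remaining cycles. The function name stall_percent is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: percentage of cycles the faster of two connected blocks
// spends stalled, assuming a steady state in which the slower block's
// initiation interval (II) sets the throughput of the pair.
double stall_percent(int producer_ii, int consumer_ii) {
    assert(producer_ii > 0 && consumer_ii > 0);
    int slow = producer_ii > consumer_ii ? producer_ii : consumer_ii;
    int fast = producer_ii < consumer_ii ? producer_ii : consumer_ii;
    // One item moves every `slow` cycles; the fast block needs only `fast`
    // of those cycles and is stalled for the rest.
    return 100.0 * (slow - fast) / slow;
}
```

With initiation intervals of 1 and 2, stall_percent returns 50.0; with matched intervals it returns 0.0.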

Resolution

The following actions can help reduce each type of stall.

Intra-kernel dataflow stalls
  • Run hardware emulation for more detailed information; it taps into the hierarchical stall signals.
  • Increase the FIFO sizes between the kernel's internal blocks or loops.
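
As a minimal sketch of the FIFO sizing step, the example below sets the depth of an internal stream in a dataflow region with the HLS STREAM pragma. The function names, the element count, and the depth of 32 are illustrative assumptions. The small hls::stream class is only a stand-in so that the sketch compiles with a plain C++ compiler; a real kernel would include hls_stream.h instead.

```cpp
#include <cassert>
#include <queue>

// Stand-in for the Vitis HLS stream class (assumption for portability only).
namespace hls {
template <typename T>
class stream {
    std::queue<T> q_;
public:
    void write(const T& v) { q_.push(v); }
    T read() {
        assert(!q_.empty());
        T v = q_.front();
        q_.pop();
        return v;
    }
    bool empty() const { return q_.empty(); }
};
}  // namespace hls

constexpr int kCount = 64;  // illustrative number of elements

static void produce(const int* in, hls::stream<int>& s) {
    for (int i = 0; i < kCount; ++i) {
#pragma HLS PIPELINE II=1
        s.write(in[i]);
    }
}

static void consume(hls::stream<int>& s, int* out) {
    for (int i = 0; i < kCount; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = s.read() * 2;
    }
}

void kernel_top(const int* in, int* out) {
#pragma HLS DATAFLOW
    hls::stream<int> fifo;
    // depth=32 is an example value; choose it from the observed stall data.
#pragma HLS STREAM variable=fifo depth=32
    produce(in, fifo);
    consume(fifo, out);
}
```

A deeper FIFO costs more on-chip memory, so the depth is usually raised only on the streams that hardware emulation shows to be stalling.
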
External memory stalls
  • Increase the number of memory ports used for access, if the application allows it.
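
One way to add memory ports in a Vitis HLS kernel is to give each pointer argument its own m_axi bundle. The sketch below assumes a hypothetical vector-add kernel; the kernel name and the bundle names gmem0 to gmem2 are illustrative.

```cpp
#include <cassert>

// Hypothetical kernel: each pointer argument is placed in a separate m_axi
// bundle, so the tool generates three AXI master ports instead of one shared
// port that all three arguments would have to take turns using.
extern "C" void vadd(const int* a, const int* b, int* c, int n) {
#pragma HLS INTERFACE m_axi port=a bundle=gmem0
#pragma HLS INTERFACE m_axi port=b bundle=gmem1
#pragma HLS INTERFACE m_axi port=c bundle=gmem2
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        c[i] = a[i] + b[i];
    }
}
```

Whether separate ports actually reduce contention also depends on how those ports are mapped to memory banks when the design is linked.
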
Inter-kernel pipe stalls
  • Run hardware emulation for more detailed information; it taps into the hierarchical stall signals.
  • Increase the FIFO depths between kernels to see whether performance improves.
  • If the consumer reads faster than the producer writes into the FIFO, improve the throughput of the producer so that the consumer is not starved.
  • If the consumer reads more slowly than the producer writes into the FIFO, improve the throughput of the consumer so that the producer does not wait on a full FIFO.
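
The interaction between FIFO depth and producer/consumer rates can be explored with a toy cycle-level model. This is a sketch under simplifying assumptions (fixed initiation intervals, one item per access, the consumer evaluated before the producer in each cycle), not a description of how the hardware or the runtime counts stalls. The names StallCount and simulate are hypothetical.

```cpp
#include <cassert>

struct StallCount {
    int producer_full;   // cycles the producer was ready but the FIFO was full
    int consumer_empty;  // cycles the consumer was ready but the FIFO was empty
};

// Toy model: the producer attempts a write every `prod_ii` cycles and the
// consumer attempts a read every `cons_ii` cycles, through a FIFO that holds
// at most `depth` items.
StallCount simulate(int prod_ii, int cons_ii, int depth, int cycles) {
    assert(prod_ii > 0 && cons_ii > 0 && depth > 0);
    StallCount s{0, 0};
    int fill = 0;       // items currently in the FIFO
    int prod_wait = 0;  // cycles until the producer is ready again
    int cons_wait = 0;  // cycles until the consumer is ready again
    for (int t = 0; t < cycles; ++t) {
        // Consumer side: read if ready and data is available.
        if (cons_wait > 0) {
            --cons_wait;
        } else if (fill > 0) {
            --fill;
            cons_wait = cons_ii - 1;
        } else {
            ++s.consumer_empty;
        }
        // Producer side: write if ready and space is available.
        if (prod_wait > 0) {
            --prod_wait;
        } else if (fill < depth) {
            ++fill;
            prod_wait = prod_ii - 1;
        } else {
            ++s.producer_full;
        }
    }
    return s;
}
```

In this model, simulate(2, 1, 4, 100) starves the consumer on half of the cycles and never fills the FIFO, while simulate(1, 2, 4, 100) repeatedly stalls the producer on a full FIFO. Raising the depth to 64 removes the full-FIFO stalls over the same 100 cycles, but only because the run ends before the FIFO fills. A sustained rate mismatch still calls for balancing the throughput of the two kernels.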