Stream Switch Buffering and Latency

Versal Adaptive SoC AI Engine Architecture Manual (AM009)

Document ID: AM009
Release Date: 2026-02-18
Revision: 1.4 English

The AXI4-Stream data path between the programmable logic (PL) and the AI Engine array includes multiple buffering points. Each buffering stage adds to the overall data path latency and affects backpressure behavior. The typical buffering stages are as follows:

  • PL to AI Engine asynchronous FIFO: 12-deep, used for clock domain crossing
  • Stream switch for each traversed tile or interface tile:
    • 4-deep FIFO, 2-cycle latency per port (master and slave)
    • Combined total of 8-deep buffering and 4-cycle latency per stream switch
  • Optional chainable FIFO in stream switches: 16-deep, can be inserted along the path to increase buffering
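The per-stage figures above add up along a path. The following Python sketch tallies them for a path that crosses a given number of stream switches. The function name `path_buffering` and the example switch counts are hypothetical. The latency total covers only the stream switch ports, because the text gives no latency for the asynchronous FIFO or the chainable FIFO.

```python
# Hypothetical helper: sums the buffering depths and stream switch latencies
# listed above. Depths are in FIFO entries; latency is in clock cycles.
CDC_FIFO_DEPTH = 12        # PL to AI Engine asynchronous FIFO
PORT_FIFO_DEPTH = 4        # per stream switch port (master or slave)
PORT_LATENCY_CYCLES = 2    # per stream switch port
CHAINABLE_FIFO_DEPTH = 16  # optional FIFO inserted in a stream switch


def path_buffering(num_switches: int, num_chainable_fifos: int = 0) -> tuple:
    """Return (buffer_entries, switch_latency_cycles) for a PL-to-AI Engine path.

    Only stream switch port latency is counted; the CDC FIFO and chainable
    FIFO latencies are not specified in the text and are excluded.
    """
    if num_switches < 1:
        raise ValueError("a path traverses at least one stream switch")
    if num_chainable_fifos < 0:
        raise ValueError("number of chainable FIFOs cannot be negative")

    per_switch_depth = 2 * PORT_FIFO_DEPTH          # slave + master port = 8
    per_switch_latency = 2 * PORT_LATENCY_CYCLES    # slave + master port = 4

    depth = (CDC_FIFO_DEPTH
             + num_switches * per_switch_depth
             + num_chainable_fifos * CHAINABLE_FIFO_DEPTH)
    latency = num_switches * per_switch_latency
    return depth, latency


# Hypothetical path: one interface tile switch plus two array tile switches.
print(path_buffering(3))     # (36, 12)
print(path_buffering(3, 1))  # (52, 12)
```

In this sketch, inserting one chainable FIFO adds 16 entries of buffering without changing the counted stream switch latency.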

PL to AI Engine interfaces can be enabled or disabled through a configuration register. All interfaces are disabled at reset. When an interface is disabled, no data flows from the PL into the AI Engine array. When an interface is enabled, data can flow into the 12-deep clock domain crossing FIFO even if stream routing is not yet configured. No data is lost if the PL master is AXI4-Stream compliant and stops sending data when TREADY is deasserted.
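The enable and backpressure behavior described above can be illustrated with a coarse toy model in Python. The class `PlToAieInterface`, its `enabled` and `routed` flags, and the `run_cycles` helper are hypothetical stand-ins, not the actual configuration register interface. Draining happens in explicit phases rather than cycle by cycle, which is enough to show that a compliant master loses no data.

```python
from collections import deque

CDC_FIFO_DEPTH = 12  # entries in the clock domain crossing FIFO


class PlToAieInterface:
    """Toy model of one PL-to-AI Engine interface (hypothetical, not the real register map)."""

    def __init__(self) -> None:
        self.enabled = False  # all interfaces are disabled at reset
        self.routed = False   # stream switch routing not yet configured
        self.cdc_fifo = deque()

    def tready(self) -> bool:
        # Disabled: nothing enters. Enabled: accept while the CDC FIFO has room,
        # regardless of whether routing is configured.
        return self.enabled and len(self.cdc_fifo) < CDC_FIFO_DEPTH

    def accept(self, word: int) -> None:
        self.cdc_fifo.append(word)

    def drain(self) -> list:
        # Words leave the CDC FIFO only after routing is configured.
        out = []
        while self.routed and self.cdc_fifo:
            out.append(self.cdc_fifo.popleft())
        return out


def run_cycles(iface: PlToAieInterface, pending: deque, cycles: int) -> None:
    """AXI4-Stream compliant master: transfers a word only when TREADY is
    asserted; otherwise it holds the word and retries on a later cycle."""
    for _ in range(cycles):
        if pending and iface.tready():
            iface.accept(pending.popleft())


words = deque(range(20))
iface = PlToAieInterface()

run_cycles(iface, words, 5)             # disabled at reset: nothing transferred
print(len(iface.cdc_fifo))              # 0

iface.enabled = True
run_cycles(iface, words, 30)            # enabled, unrouted: FIFO fills, then backpressure
print(len(iface.cdc_fifo), len(words))  # 12 8

iface.routed = True
received = iface.drain()
run_cycles(iface, words, 30)
received += iface.drain()
print(received == list(range(20)))      # True
```

Because the master holds each word while TREADY is low, all 20 words arrive in order once routing is configured.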

Figure 1. Typical Buffering Stages