AI Engine Kernel Input and Output Types - 2025.1 English - XD100

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2025-08-25
Version: 2025.1 English

AI Engine (AIE) kernels support four input and output types (see Fig. 8):

Fig. 8: AIE Kernel Port Types

  • Stream

    Streams use an AXI4-Stream interface, and each stream is 32 bits wide. Streams may come from and go to programmable logic (PL) or another AIE tile. Depending on the architecture, an AIE tile may have one or two input streams and one or two output streams. Streams are useful when data must be processed sequentially, and they can provide the lowest latency at the expense of lower throughput. For example, in the first-generation AI Engine architecture, an AIE tile can receive 64 bits of data per cycle through its two input streams.

  • Buffer

    Buffers use local memory on the AIE tile or on adjacent tiles. Buffer data can come from and go to GMIO (global memory I/O, i.e., external DDR), PL, or an adjacent AIE tile. In one cycle, an AIE tile can perform two 256-bit loads from memory and one 256-bit write to memory. Using buffers allows higher throughput at the expense of higher latency, because a buffer must be filled before the kernel can access it.
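    The peak figures quoted for streams and buffers can be compared with a little arithmetic. The following is a minimal C++ sketch, not AIE kernel code. It assumes the first-generation numbers above (two 32-bit input streams, two 256-bit loads) are per-cycle peaks, and the names `min_cycles`, `kStreamBitsPerCycle`, and `kBufferLoadBitsPerCycle` are hypothetical, introduced only for this illustration. It ignores the time needed to fill a buffer, which is where the extra buffer latency comes from.

```cpp
#include <cstdint>

// Peak input bits per cycle, taken from the figures quoted in the text
// for the first-generation AI Engine (assumed here to be per-cycle peaks).
constexpr std::uint64_t kStreamBitsPerCycle = 2 * 32;       // two 32-bit input streams
constexpr std::uint64_t kBufferLoadBitsPerCycle = 2 * 256;  // two 256-bit loads

// Minimum number of cycles needed to move `bits` at `bits_per_cycle`
// (ceiling division; a partially used cycle still counts as one cycle).
constexpr std::uint64_t min_cycles(std::uint64_t bits, std::uint64_t bits_per_cycle) {
    return (bits + bits_per_cycle - 1) / bits_per_cycle;
}
```

    Under these assumptions, a 4096-bit block needs at least `min_cycles(4096, kStreamBitsPerCycle)` = 64 cycles over streams, but only `min_cycles(4096, kBufferLoadBitsPerCycle)` = 8 cycles of loads once it is sitting in a buffer.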

  • Accumulator cascade

    Several algorithms require a sum-of-products calculation. A long sum may be distributed across multiple AIE tiles, with each tile calculating a partial sum and cascading (or passing) a partial sum to an adjacent tile (see Fig. 9).

    Fig. 9: Accumulator Cascade Intuition

    For example, instead of summing 32 products on one tile in four cycles (eight products summed per cycle), splitting the operation into four partial sums of eight products, one per tile, and cascading the partial sums may provide a result in one cycle. This reduces latency at the expense of using more AIE tiles.
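    The split described above can be sketched in plain C++. This is a software model of the arithmetic only, not AIE kernel code: it does not use the cascade interface, and the names `tile_stage`, `cascaded_dot`, and `reference_dot` are hypothetical. The four stages run one after another here, whereas on hardware each stage would run on its own tile.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kTerms = 32;                 // products in the full sum
constexpr std::size_t kTiles = 4;                  // modeled cascaded tiles
constexpr std::size_t kPerTile = kTerms / kTiles;  // eight products per tile

using Vec = std::array<std::int32_t, kTerms>;

// One modeled tile: add this tile's eight products to the partial sum
// received from the upstream tile and return the new partial sum.
std::int64_t tile_stage(const Vec& a, const Vec& b, std::size_t tile,
                        std::int64_t cascade_in) {
    std::int64_t acc = cascade_in;
    for (std::size_t i = tile * kPerTile; i < (tile + 1) * kPerTile; ++i) {
        acc += static_cast<std::int64_t>(a[i]) * b[i];
    }
    return acc;
}

// Chain the four stages; the first tile starts from a zero partial sum.
std::int64_t cascaded_dot(const Vec& a, const Vec& b) {
    std::int64_t acc = 0;
    for (std::size_t tile = 0; tile < kTiles; ++tile) {
        acc = tile_stage(a, b, tile, acc);
    }
    return acc;
}

// Single-tile reference: all 32 products summed in one place.
std::int64_t reference_dot(const Vec& a, const Vec& b) {
    std::int64_t acc = 0;
    for (std::size_t i = 0; i < kTerms; ++i) {
        acc += static_cast<std::int64_t>(a[i]) * b[i];
    }
    return acc;
}
```

    Both functions return the same value for any input. The cascade changes where the work is done, not the result.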

  • Runtime parameter (RTP)

    Use runtime parameters to have the processor system (PS) modify the behavior of a kernel program or obtain state and status information.

    Runtime parameters are specified as scalar function arguments:

    • Input RTP: pass-by-value

    • Output RTP: pass-by-reference

    In the adaptive data flow (ADF) graph, an RTP may be specified as:

    • Asynchronous: You must provide the RTP at least once. The value is then reused on every function invocation until it is updated.

    • Synchronous: You must provide the RTP on every function invocation.
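    The difference between the two modes can be modeled in a few lines of plain C++. This is a hypothetical software model, not the ADF API: the class `RtpModel` and its methods are invented for illustration. It returns `false` where a real kernel would simply wait for the PS to supply a value.

```cpp
#include <cstdint>

// Hypothetical software model of a single RTP; not part of the ADF API.
class RtpModel {
public:
    // The PS writes a new value.
    void update(std::int32_t v) {
        value_ = v;
        written_ = true;
        fresh_ = true;
    }

    // Synchronous: each invocation consumes one newly written value.
    // Returns false if nothing new has arrived since the last invocation.
    bool read_sync(std::int32_t& out) {
        if (!fresh_) return false;
        out = value_;
        fresh_ = false;
        return true;
    }

    // Asynchronous: every invocation reuses the most recent value.
    // Returns false only if the PS has never written a value.
    bool read_async(std::int32_t& out) const {
        if (!written_) return false;
        out = value_;
        return true;
    }

private:
    std::int32_t value_ = 0;
    bool written_ = false;
    bool fresh_ = false;
};
```

    In this model, one `update` followed by two `read_sync` calls succeeds only on the first call, while two `read_async` calls both succeed and return the same value.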

    Fig. 10 shows a kernel function using input and output RTPs.

    Fig. 10: Function with Input and Output RTPs
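    As a rough textual companion to Fig. 10, the following plain C++ sketch shows the argument convention: the input RTP is passed by value and the output RTP by reference. It is an assumption-laden stand-in rather than the function from the figure. The name `scale_and_track` is hypothetical, and raw pointers with a length replace the stream or buffer port types that a real AIE kernel would use.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical kernel-like function (plain C++, no AIE headers).
// Multiplies each input sample by `gain` and reports the largest output.
void scale_and_track(const std::int32_t* in, std::int32_t* out, std::size_t n,
                     std::int32_t gain,      // input RTP: pass-by-value
                     std::int32_t& peak) {   // output RTP: pass-by-reference
    std::int32_t largest = 0;
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * gain;
        if (i == 0 || out[i] > largest) {
            largest = out[i];
        }
    }
    peak = largest;  // remains 0 when n == 0
}
```

    Here `gain` plays the role of a value the PS sets to change kernel behavior, and `peak` plays the role of status information the PS reads back.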