An AIE kernel has four types of input and output ports (see Fig. 8):
Fig. 8: AIE Kernel Port Types
Stream
Streams use an AXI4-Stream interface. A stream is 32 bits wide. Streams may come from and go to programmable logic (PL) or another AIE tile. Depending on the architecture, an AIE tile may have one or two input streams and one or two output streams. Streams are useful when data has to be processed sequentially, and they have the potential to provide the lowest latency at the expense of lower throughput. Using the first-generation AI Engine architecture as an example, an AIE tile can receive 64 bits of data per cycle through its two 32-bit input streams.
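The stream bandwidth arithmetic and the sequential processing style described above can be illustrated with a small host-side C++ model. This is a sketch under stated assumptions: `Stream32` is a hypothetical FIFO standing in for a 32-bit stream port, and `running_sum_kernel` is an illustrative kernel. Neither is the ADF stream API, and the model ignores timing.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Figures from the text (first-generation AI Engine).
constexpr int kStreamWidthBits   = 32;  // one stream is 32 bits wide
constexpr int kInputStreams      = 2;   // up to two input streams per tile
constexpr int kInputBitsPerCycle = kStreamWidthBits * kInputStreams;
static_assert(kInputBitsPerCycle == 64, "two 32-bit streams deliver 64 bits per cycle");

// Hypothetical stand-in for a 32-bit stream port (a FIFO); NOT the ADF stream API.
class Stream32 {
public:
    void write(int32_t v) { fifo_.push_back(v); }
    bool empty() const { return fifo_.empty(); }
    int32_t read() {
        assert(!fifo_.empty());  // a real stream read would stall here instead
        int32_t v = fifo_.front();
        fifo_.pop_front();
        return v;
    }

private:
    std::deque<int32_t> fifo_;
};

// Hypothetical stream kernel: processes words strictly in arrival order and
// emits each partial sum as soon as the word it depends on has been read.
void running_sum_kernel(Stream32& in, Stream32& out) {
    int32_t acc = 0;
    while (!in.empty()) {
        acc += in.read();
        out.write(acc);
    }
}
```

Because each output needs only the words read so far, the first result is available after a single 32-bit transfer, which is the latency advantage the text attributes to streams.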
Buffer
Buffers use local memory on the AIE tile or adjacent tiles. Buffer data can come from and go to GMIO (global memory I/O, i.e., external DDR), PL, or an adjacent AIE tile. In each cycle, an AIE tile can perform two 256-bit loads from memory and one 256-bit store to memory. Buffers allow higher throughput at the expense of higher latency, because a buffer must be completely filled before the kernel can access it.
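A similar host-side sketch contrasts buffers with streams. The constants restate the per-cycle figures from the text (two 256-bit loads versus two 32-bit input streams, an 8x difference). `BlockBuffer` and `reverse_scale_kernel` are hypothetical names used only to model the fill-before-access rule; they are not part of any AMD library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Peak per-cycle figures quoted in the text.
constexpr int kStreamInBitsPerCycle    = 2 * 32;   // two 32-bit input streams
constexpr int kBufferLoadBitsPerCycle  = 2 * 256;  // two 256-bit loads
constexpr int kBufferStoreBitsPerCycle = 1 * 256;  // one 256-bit store
static_assert(kBufferLoadBitsPerCycle == 8 * kStreamInBitsPerCycle,
              "local-memory loads move 8x the data of the two input streams");

// Hypothetical model of a kernel buffer: it fills one sample at a time and
// becomes readable only once all `capacity` samples are present.
class BlockBuffer {
public:
    explicit BlockBuffer(std::size_t capacity) : capacity_(capacity) {}
    void push(int32_t v) { assert(!full()); data_.push_back(v); }
    bool full() const { return data_.size() == capacity_; }
    const std::vector<int32_t>& data() const { assert(full()); return data_; }

private:
    std::size_t capacity_;
    std::vector<int32_t> data_;
};

// Hypothetical buffer kernel: it can run only on a full block, but because the
// whole block is in memory it may read it in any order (here: reversed).
std::vector<int32_t> reverse_scale_kernel(const BlockBuffer& buf, int32_t gain) {
    const std::vector<int32_t>& in = buf.data();  // asserts the buffer is full
    std::vector<int32_t> out;
    out.reserve(in.size());
    for (auto it = in.rbegin(); it != in.rend(); ++it) out.push_back(*it * gain);
    return out;
}
```

The reversed read order is only there to show that, unlike a stream, a filled buffer does not force strictly sequential access; the price is waiting for the whole block to arrive.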
Accumulator cascade
Several algorithms require a sum-of-products calculation. A long sum may be distributed across multiple AIE tiles: each tile calculates a partial sum, adds it to the partial sum received from the previous tile, and cascades (passes) the result to the next adjacent tile (see Fig. 9).
Fig. 9: Accumulator Cascade Intuition
For example, a single tile that computes eight products per cycle needs four cycles to sum 32 products. Splitting the operation into four partial sums of eight products, one per tile, and cascading the partial sums lets each tile finish its share in a single cycle. This reduces latency at the expense of using more AIE tiles.
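The arithmetic of this example can be checked with a plain C++ model. It assumes, as the text does, that one tile computes eight products per cycle. `tile_partial`, `single_tile_sum`, and `cascaded_sum` are hypothetical helpers, and the sequential loops only show that both schemes produce the same result, not the hardware timing.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kTotal   = 32;                 // products in the full sum
constexpr std::size_t kPerTile = 8;                  // products one tile computes per cycle
constexpr std::size_t kTiles   = kTotal / kPerTile;  // 4 passes, or 4 cascaded tiles
static_assert(kTotal % kPerTile == 0, "the sum must split evenly across tiles");
using Vec = std::array<int64_t, kTotal>;

// Sum of products over one slice [first, first + kPerTile).
int64_t tile_partial(const Vec& a, const Vec& b, std::size_t first) {
    assert(first + kPerTile <= kTotal);
    int64_t acc = 0;
    for (std::size_t i = first; i < first + kPerTile; ++i) acc += a[i] * b[i];
    return acc;
}

// One tile: four passes (cycles) of eight products, accumulated locally.
int64_t single_tile_sum(const Vec& a, const Vec& b) {
    int64_t acc = 0;
    for (std::size_t pass = 0; pass < kTiles; ++pass) acc += tile_partial(a, b, pass * kPerTile);
    return acc;
}

// Cascade: on hardware the four partial sums come from four tiles working
// concurrently; each tile then adds its partial sum to the incoming value
// and forwards the result, so the last tile holds the full sum.
int64_t cascaded_sum(const Vec& a, const Vec& b) {
    std::array<int64_t, kTiles> partial{};
    for (std::size_t t = 0; t < kTiles; ++t) partial[t] = tile_partial(a, b, t * kPerTile);
    int64_t cascade = 0;  // the first tile in the chain receives zero
    for (std::size_t t = 0; t < kTiles; ++t) cascade += partial[t];
    return cascade;
}
```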
Runtime parameter (RTP)
Runtime parameters let the processing system (PS) modify the behavior of a kernel program or read back state and status information from it.
Runtime parameters are specified as scalar function arguments:
Input RTP: pass-by-value
Output RTP: pass-by-reference
In the ADF graph, they may be specified as:
Asynchronous: You must provide the RTP at least once; the kernel then reuses that value on every function invocation until it is updated
Synchronous: You must provide a new RTP value for every function invocation
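The difference between the two modes can be modeled in a few lines of C++. `InputRtp`, `RtpMode`, and `acquire` are hypothetical names for this sketch, not the ADF RTP API; where a real kernel would block waiting for a value, the model returns `std::nullopt`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of an input RTP port (not the ADF API).
enum class RtpMode { Synchronous, Asynchronous };

class InputRtp {
public:
    explicit InputRtp(RtpMode mode) : mode_(mode) {}

    // PS side: provide a new value.
    void update(int32_t v) { value_ = v; fresh_ = true; }

    // Kernel side, once per invocation: the value to use, or std::nullopt
    // where a real kernel would have to wait.
    std::optional<int32_t> acquire() {
        if (!value_) return std::nullopt;  // never provided: both modes wait
        if (mode_ == RtpMode::Synchronous && !fresh_) {
            return std::nullopt;           // synchronous: needs a new value every call
        }
        fresh_ = false;                    // mark the value as consumed
        return value_;                     // asynchronous: last value is reused
    }

private:
    RtpMode mode_;
    std::optional<int32_t> value_;
    bool fresh_ = false;
};
```

With an asynchronous RTP, the PS can change a gain only occasionally and the kernel keeps running with the last setting; with a synchronous RTP, every invocation is paced by a fresh update from the PS.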
Fig. 10 shows a kernel function using input and output RTPs.
Fig. 10: Function with Input and Output RTPs
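In plain C++ terms, a kernel function like the one in Fig. 10 might look like the following sketch. The function name, the vector-based data path, and the choice of a peak-magnitude status value are assumptions for illustration; only the convention itself (input RTP by value, output RTP by reference) comes from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical kernel (illustrative names): scales a block of samples.
//   gain - input RTP: scalar passed by value; set by the PS, only read here.
//   peak - output RTP: scalar passed by reference; written here so the PS
//          can read the status back after the invocation.
void scale_with_status(const std::vector<int32_t>& in, std::vector<int32_t>& out,
                       int32_t gain, int32_t& peak) {
    out.resize(in.size());
    peak = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * gain;
        int32_t mag = out[i] < 0 ? -out[i] : out[i];  // magnitude of this output
        if (mag > peak) peak = mag;
    }
}
```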