Stream-Based Access - 2025.2 English - UG1079

AI Engine Kernel and Graph Programming Guide (UG1079)

Document ID: UG1079
Release Date: 2025-11-26
Version: 2025.2 English

With a stream-based access model, the kernels receive an input stream or an output stream of typed data as an argument. Each access to these streams is synchronized. That is, reads stall if the data is not available in the stream and writes stall if the stream is unable to accept new data.
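
The stall behavior described above can be pictured with a small host-side model. This is a minimal sketch, not ADF code: `ToyStream`, `try_write`, `try_read`, and `count_stalls` are hypothetical names, and a `false` return marks the point where a real stream access would stall rather than return.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical host-side model of a synchronized stream (not an ADF type).
template <typename T>
class ToyStream {
public:
    explicit ToyStream(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);  // a zero-depth model could never accept data
    }

    // Returns false where a real write would stall (stream cannot accept data).
    bool try_write(const T& value) {
        if (fifo_.size() >= capacity_) return false;
        fifo_.push_back(value);
        return true;
    }

    // Returns false where a real read would stall (no data available).
    bool try_read(T& value) {
        if (fifo_.empty()) return false;
        value = fifo_.front();
        fifo_.pop_front();
        return true;
    }

private:
    std::size_t capacity_;
    std::deque<T> fifo_;
};

// Counts the accesses that would have stalled in a short scenario.
inline int count_stalls() {
    ToyStream<std::int32_t> s(2);
    std::int32_t v = 0;
    int stalls = 0;
    if (!s.try_read(v)) ++stalls;    // empty stream: the read would stall
    s.try_write(10);
    s.try_write(20);
    if (!s.try_write(30)) ++stalls;  // full stream: the write would stall
    return stalls;
}
```

In this model the caller decides what to do on `false`. On the device the access simply blocks until data or space becomes available.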

An AI Engine supports two 32-bit input stream ports with id=0 or 1 and two 32-bit output stream ports with id=0 or 1. This ID is supplied as an argument to the stream object constructors. The AI Engine compiler automatically allocates the input and output stream port IDs from left to right in the argument list of a kernel. Multiple kernels mapped to the same AI Engine are not allowed to share stream ports unless the streams are packet switched (see Explicit Packet Switching).
public:
  input_plio din;
  output_plio dout;
  adf::kernel k1, k2;
...
connect<stream>(din.out[0], k1.in[0]);
connect<stream>(k1.out[0], k2.in[0]);
connect<stream>(k2.out[0], dout.in[0]);
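
The left-to-right allocation of stream port IDs mentioned earlier can be sketched in plain C++. This is an illustrative model only: `Dir` and `assign_stream_ids` are hypothetical names, and rejecting a third stream in one direction is an assumption drawn from the two-port limit, not a statement of how the AI Engine compiler reports the condition.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Direction of each stream argument in a kernel's argument list.
enum class Dir { In, Out };

// Hypothetical helper: walks the argument list left to right and hands out
// IDs 0 and 1 separately for input streams and output streams.
std::vector<int> assign_stream_ids(const std::vector<Dir>& args) {
    int next_in = 0;
    int next_out = 0;
    std::vector<int> ids;
    for (Dir d : args) {
        int& next = (d == Dir::In) ? next_in : next_out;
        if (next >= 2) {
            // Assumption of this model: only two ports exist per direction.
            throw std::invalid_argument("more than two stream ports in one direction");
        }
        ids.push_back(next++);
    }
    return ids;
}
```

For an argument list of (input, input, output), this model yields the IDs 0, 1, 0.
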
There is also a direct stream communication channel between the accumulator register of one AI Engine and the physically adjacent core, called a cascade. The cascade stream connects within the AI Engine array in a snake-like linear fashion from AI Engine processor to processor.
connect<cascade>(k1.out[1], k2.in[1]);

The AI Engine compiler automatically infers stream data structures from data flow graph connections and declares them in the wrapper code that implements the graph control. Kernel functions operate on pointers to these stream data structures, which are passed to the functions as arguments. There is no need to declare the stream data structures in the data flow graph or in the kernel program.
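
As an illustration of a kernel that takes stream pointers as arguments, the following sketch compiles on a host without the ADF headers. Everything in namespace `toy` is a stand-in written for this example; the `toy::readincr` and `toy::writeincr` helpers only mimic the shape of stream accessors and are not the library implementations. The kernel name `scale_by_two` and the fixed count of four samples are likewise assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

namespace toy {
// Host-side stand-ins so the sketch builds without the ADF headers.
template <typename T> struct input_stream  { std::queue<T> data; };
template <typename T> struct output_stream { std::queue<T> data; };

// Reads one value and advances the stream.
template <typename T>
T readincr(input_stream<T>* s) {
    assert(!s->data.empty());  // the model aborts; hardware would stall instead
    T v = s->data.front();
    s->data.pop();
    return v;
}

// Writes one value and advances the stream.
template <typename T>
void writeincr(output_stream<T>* s, T v) {
    s->data.push(v);
}
}  // namespace toy

// Kernel-style function: the streams arrive as pointer arguments and are
// never declared inside the kernel itself.
void scale_by_two(toy::input_stream<std::int32_t>* in,
                  toy::output_stream<std::int32_t>* out) {
    for (int i = 0; i < 4; ++i) {  // four samples per call (assumption)
        std::int32_t x = toy::readincr(in);
        std::int32_t y = x * 2;
        toy::writeincr(out, y);
    }
}
```

In this host model the caller creates the stream objects and passes their addresses, which is the role the generated wrapper code plays for a real graph.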

Stream Connection for Multi-Rate Processing

Multi-rate analysis is difficult for stream and packet-stream (pktstream) connections without user specification. Use constraints to specify how many samples the kernel reads from a stream or pktstream input and how many samples it writes to a stream or pktstream output, as shown below:
// constraint to specify samples per iteration for stream/pktstream ports to support multirate connections
constraint<uint32_t> samples_per_iteration(adf::port<adf::input>& p);
constraint<uint32_t> samples_per_iteration(adf::port<adf::output>& p);
The constraint keyword needs the sample datatype as a template value. The function samples_per_iteration is applied to the input or the output of the kernel. You can connect the related stream to another stream or to a buffer.
Note: The multi-rate analysis pass calculates the rate for only those stream/pktstream ports for which adf::samples_per_iteration (>0) is specified.
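
To show why per-iteration sample counts matter, the sketch below works through generic rate-matching arithmetic for one producer and one consumer. It is an assumption-laden illustration, not the AI Engine compiler's documented algorithm: `Repetitions`, `gcd_u32`, and `balance` are hypothetical names, and the model only finds the smallest iteration counts at which samples written equal samples read.

```cpp
#include <cassert>
#include <cstdint>

// How many times each side must run so that samples produced == consumed.
struct Repetitions {
    std::uint32_t producer;
    std::uint32_t consumer;
};

// Euclid's algorithm for the greatest common divisor.
inline std::uint32_t gcd_u32(std::uint32_t a, std::uint32_t b) {
    while (b != 0) {
        std::uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// written_per_iter: samples the producer writes per iteration.
// read_per_iter:    samples the consumer reads per iteration.
inline Repetitions balance(std::uint32_t written_per_iter,
                           std::uint32_t read_per_iter) {
    assert(written_per_iter > 0 && read_per_iter > 0);  // rates must be positive
    std::uint32_t g = gcd_u32(written_per_iter, read_per_iter);
    // Least common multiple = a / gcd * b; dividing by each rate gives counts.
    return { read_per_iter / g, written_per_iter / g };
}
```

For example, a producer that writes 8 samples per iteration feeding a consumer that reads 12 balances at 3 producer iterations and 2 consumer iterations, 24 samples in each direction.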