Writing and Advancing an Output Stream - 2025.2 English - UG1603

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID: UG1603
Release Date: 2025-11-26
Version: 2025.2 English

AI Engine Operations

The following operations write data to the given output stream and advance the stream on the AI Engine. Data values can be written to the output stream one at a time or as a vector. The stream operation stalls until all values have been written. The data groupings are based on the underlying single-cycle, 32-bit stream operation. A cascade connection writes all values in parallel.
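As a rough illustration of the data groupings described above, the following host-side sketch computes how many 32-bit stream words a single write occupies. It is plain C++, not AI Engine kernel code. The helper `stream_words_per_write` and the stand-in struct `cint16_model` are names assumed for this example and are not part of the ADF or AIE API.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a complex 16-bit sample
// (16-bit real part plus 16-bit imaginary part, 32 bits in total).
struct cint16_model {
    std::int16_t real;
    std::int16_t imag;
};

// Number of 32-bit stream words needed to carry `lanes` elements of type T.
// Assumes the total payload is a whole number of 32-bit words.
template <typename T>
constexpr std::size_t stream_words_per_write(std::size_t lanes) {
    return (sizeof(T) * lanes * 8) / 32;
}

// Scalar writes.
static_assert(stream_words_per_write<std::int32_t>(1) == 1, "int32 scalar: one word");
static_assert(stream_words_per_write<std::int64_t>(1) == 2, "int64 scalar: two words");
static_assert(stream_words_per_write<cint16_model>(1) == 1, "cint16 scalar: one word");

// 128-bit vector writes.
static_assert(stream_words_per_write<std::int8_t>(16) == 4, "16 x int8: four words");
static_assert(stream_words_per_write<std::int16_t>(8) == 4, "8 x int16: four words");
static_assert(stream_words_per_write<std::int32_t>(4) == 4, "4 x int32: four words");
```

Under this model, every stream vector overload in this section carries 128 bits, which corresponds to four 32-bit stream words per write.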

//Scalar operations
//#include<aie_adf.hpp> or #include<adf.h> 
void writeincr(output_stream<int32> *w, int32 v);
void writeincr(output_stream<int64> *w, int64 v);
void writeincr(output_stream<uint32> *w, uint32 v);
void writeincr(output_stream<cint16> *w, cint16 v);
void writeincr(output_stream<cint32> *w, cint32 v);
void writeincr(output_stream<float> *w, float v);

//AIE API operations to write vector data, which support more vector lanes
//#include<aie_adf.hpp>
void writeincr(output_stream<int8> *w, aie::vector<int8,16> &v);
void writeincr(output_stream<uint8> *w, aie::vector<uint8,16> &v);
void writeincr(output_stream<int16> *w, aie::vector<int16,8> &v);
void writeincr(output_stream<cint16> *w, aie::vector<cint16,4> &v);
void writeincr(output_stream<int32> *w, aie::vector<int32,4> &v);
void writeincr(output_stream<cint32> *w, aie::vector<cint32,2> &v);
void writeincr(output_stream<float> *w, aie::vector<float,4> &v);
void writeincr(output_stream<bfloat16> *w, aie::vector<bfloat16,8> &v);

template<typename T,int N>
void writeincr(output_cascade<T> *w, aie::accum<T,N> &v);
template<typename T,int N>
void writeincr(output_cascade<T> *w, aie::vector<T,N> &v);

//ADF API operations to write vector data, which support limited vector lanes
//#include<adf.h>
void writeincr(output_cascade<acc32>* str,v16acc32 value);
void writeincr(output_cascade<acc32>* str,v32acc32 value);
void writeincr(output_cascade<acc64>* str,v8acc64 value);
void writeincr(output_cascade<acc64>* str,v16acc64 value);
void writeincr(output_cascade<cacc64>* str,v4cacc64 value);
void writeincr(output_cascade<cacc64>* str,v8cacc64 value);
void writeincr(output_cascade<accfloat>* str,v16accfloat value);
void writeincr(output_cascade<caccfloat>* str,v8caccfloat value);
void writeincr(output_cascade<int8>* str,v64int8 value);
void writeincr(output_cascade<int8>* str,v128int8 value);
void writeincr(output_cascade<uint8>* str,v64uint8 value);
void writeincr(output_cascade<uint8>* str,v128uint8 value);
void writeincr(output_cascade<int16>* str,v32int16 value);
void writeincr(output_cascade<int16>* str,v64int16 value);
void writeincr(output_cascade<uint16>* str,v32uint16 value);
void writeincr(output_cascade<uint16>* str,v64uint16 value);
void writeincr(output_cascade<cint16>* str,v16cint16 value);
void writeincr(output_cascade<cint16>* str,v32cint16 value);
void writeincr(output_cascade<int32>* str,v16int32 value);
void writeincr(output_cascade<int32>* str,v32int32 value);
void writeincr(output_cascade<uint32>* str,v16uint32 value);
void writeincr(output_cascade<uint32>* str,v32uint32 value);
void writeincr(output_cascade<cint32>* str,v8cint32 value);
void writeincr(output_cascade<cint32>* str,v16cint32 value);
void writeincr(output_cascade<bfloat16>* str,v32bfloat16 value);

For the supported data types and lanes in AIE API writeincr, see Stream and Cascade Data Types.

Note: To indicate the end of a stream, use the writeincr API with the TLAST argument, as shown below.
void writeincr(output_stream<int32> *w, int32 value, bool tlast);
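One possible usage pattern for the three-argument overload is to assert TLAST only on the final sample of a transfer. The sketch below shows that loop shape using a host-side stand-in so that it compiles outside the AI Engine toolchain. The `model` namespace, its `output_stream` struct, and the `send_block` helper are assumptions made for illustration and do not reflect the ADF implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

namespace model {

// Host-side stand-in for output_stream<T>: records each (value, tlast) pair.
template <typename T>
struct output_stream {
    std::vector<std::pair<T, bool>> beats;
};

// Stand-in that mirrors the shape of
// writeincr(output_stream<int32> *w, int32 value, bool tlast).
inline void writeincr(output_stream<std::int32_t>* w, std::int32_t value, bool tlast) {
    w->beats.emplace_back(value, tlast);
}

// Hypothetical helper: writes n samples and asserts TLAST only on the last one.
inline void send_block(output_stream<std::int32_t>* out,
                       const std::int32_t* data, std::size_t n) {
    assert(out != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        const bool is_last = (i + 1 == n);
        writeincr(out, data[i], is_last);
    }
}

}  // namespace model
```

In a real kernel, the same loop would call the writeincr overload declared above on the kernel's output_stream<int32> argument.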
Note: The AIE-ML v2 stream interconnect is 64 bits wide. When a kernel writes 32-bit words using the writeincr API, two words are concatenated before they are sent out. If a single leftover 32-bit word remains, it is held inside the kernel until the next word arrives. To push out a final odd word, set TLAST to true in the writeincr call.
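The pairing behavior in this note can be pictured with a small host-side model. This is a sketch under stated assumptions, not a description of the hardware: the class `PackingStreamModel`, the choice to place the first word in the lower 32 bits, and the zero fill for the unused half of a flushed odd word are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One 64-bit transfer on the modeled stream interconnect.
struct Beat64 {
    std::uint64_t data;
    bool tlast;
};

// Hypothetical model: 32-bit words are paired into 64-bit beats. A lone word
// is held until a partner arrives or until it is written with TLAST set.
class PackingStreamModel {
public:
    void write(std::uint32_t word, bool tlast = false) {
        if (has_held_) {
            // Second word of a pair: send both halves out together.
            emit(held_, word, tlast);
            has_held_ = false;
        } else if (tlast) {
            // Odd final word: TLAST pushes it out alone.
            // ASSUMPTION: the unused upper half is zero in this model.
            emit(word, 0, true);
        } else {
            // First word of a pair: hold it and wait for the next word.
            held_ = word;
            has_held_ = true;
        }
    }

    bool has_held_word() const { return has_held_; }
    const std::vector<Beat64>& beats() const { return beats_; }

private:
    // ASSUMPTION: the first word occupies the lower 32 bits.
    void emit(std::uint32_t lo, std::uint32_t hi, bool tlast) {
        const std::uint64_t data = (static_cast<std::uint64_t>(hi) << 32) | lo;
        beats_.push_back(Beat64{data, tlast});
    }

    std::uint32_t held_ = 0;
    bool has_held_ = false;
    std::vector<Beat64> beats_;
};

// Usage: three words, with TLAST set on the odd final word.
// Returns the number of 64-bit beats that left the model.
inline std::size_t demo_odd_word_flush() {
    PackingStreamModel s;
    s.write(0xA);
    s.write(0xB);
    s.write(0xC, true);
    assert(!s.has_held_word());
    return s.beats().size();
}
```

Without TLAST on the third write, the model would keep that word held and only one beat would have been sent, which mirrors the stall the note warns about.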