Stream and Cascade Data Types - 2024.2 English - UG1603

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID: UG1603
Release Date: 2024-11-28
Version: 2024.2 English

Each of the data types in the following tables can be read or written by the AI Engine-ML either as scalars or in vector groups. However, the valid groupings are restricted by the bus data width supported on the AI Engine-ML to programmable logic interface ports or through the stream-switch network. For AI Engine kernels, the valid combinations are vector bundles totaling up to 32 bits. The accumulator data types are used only to specify cascade-stream connections between adjacent AI Engines. Their valid groupings are based on the 512-bit wide cascade channel between two processors.

Input and output cascade data types can be used with either the AI Engine APIs or the ADF APIs.

To use the AI Engine APIs, add #include <aie_adf.hpp> to the kernel source code.

The ADF APIs support a limited number of lanes. To use the ADF APIs, add #include <adf.h> to the kernel source code. The ADF APIs are used for advanced kernel programming using intrinsic calls.

Table 1. Stream Data Types

| Input Stream Types | Output Stream Types |
|---|---|
| input_stream<int8> | output_stream<int8> |
| input_stream<int16> | output_stream<int16> |
| input_stream<int32> | output_stream<int32> |
| input_stream<int64> | output_stream<int64> |
| input_stream<uint8> | output_stream<uint8> |
| input_stream<uint16> | output_stream<uint16> |
| input_stream<uint32> | output_stream<uint32> |
| input_stream<uint64> | output_stream<uint64> |
| input_stream<cint16> | output_stream<cint16> |
| input_stream<cint32> | output_stream<cint32> |
| input_stream<float> | output_stream<float> |
| input_stream<cfloat> | output_stream<cfloat> |
| input_stream<bfloat16> | output_stream<bfloat16> |

Table 2. Cascade Accumulator Data Types

| Input Cascade Types | Output Cascade Types | Lanes in ADF API (adf.h) | Lanes in AIE API (aie_adf.hpp) |
|---|---|---|---|
| input_cascade<acc32> | output_cascade<acc32> | 16/32 | 8/16/32/64/128 |
| input_cascade<acc64> | output_cascade<acc64> | 8/16 | 4/8/16/32/64 |
| input_cascade<cacc64> | output_cascade<cacc64> | 4/8 | 2/4/8/16/32 |
| input_cascade<accfloat> | output_cascade<accfloat> | 16 | 4/8/16/32/64/128 |
| input_cascade<caccfloat> | output_cascade<caccfloat> | 8 | 2/4/8/16/32/64 |
| input_cascade<int8> | output_cascade<int8> | 64/128 | 16/32/64/128 |
| input_cascade<uint8> | output_cascade<uint8> | 64/128 | 16/32/64/128 |
| input_cascade<int16> | output_cascade<int16> | 32/64 | 8/16/32/64 |
| input_cascade<uint16> | output_cascade<uint16> | 32/64 | 8/16/32/64 |
| input_cascade<int32> | output_cascade<int32> | 16/32 | 4/8/16/32 |
| input_cascade<uint32> | output_cascade<uint32> | 16/32 | 4/8/16/32 |
| input_cascade<cint16> | output_cascade<cint16> | 16/32 | 4/8/16/32 |
| input_cascade<cint32> | output_cascade<cint32> | 8/16 | 2/4/8/16 |
| input_cascade<bfloat16> | output_cascade<bfloat16> | 32 | 8/16/32/64 |
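
The smallest ADF API lane count listed for each type in Table 2 appears to be the number of lanes that fills one 512-bit cascade word. The following is a minimal sketch of that arithmetic in standard C++, which does not require the AI Engine toolchain. The per-lane bit widths are assumptions (for example, 128 bits for cacc64 as two 64-bit parts, and 64 bits for caccfloat), and lanes_per_cascade_word, CascadeEntry, and count_lane_mismatches are hypothetical names that are not part of adf.h or aie_adf.hpp.

```cpp
#include <cstdio>

// Width of the cascade channel between two processors, as stated in the text.
constexpr int kCascadeBits = 512;

// Hypothetical helper: number of lanes that fill one cascade word
// for a given per-lane width in bits.
constexpr int lanes_per_cascade_word(int bits_per_lane) {
    return kCascadeBits / bits_per_lane;
}

struct CascadeEntry {
    const char* type_name;
    int bits_per_lane;   // ASSUMED width of one lane, in bits
    int min_adf_lanes;   // smallest "Lanes in ADF API" value from Table 2
};

constexpr CascadeEntry kCascadeTable[] = {
    {"acc32",     32, 16}, {"acc64",     64,  8}, {"cacc64",  128,  4},
    {"accfloat",  32, 16}, {"caccfloat", 64,  8},
    {"int8",       8, 64}, {"uint8",      8, 64},
    {"int16",     16, 32}, {"uint16",    16, 32},
    {"int32",     32, 16}, {"uint32",    32, 16},
    {"cint16",    32, 16}, {"cint32",    64,  8},
    {"bfloat16",  16, 32},
};

// Compile-time spot checks of the arithmetic.
static_assert(lanes_per_cascade_word(32) == 16, "acc32: 512 / 32");
static_assert(lanes_per_cascade_word(8) == 64, "int8: 512 / 8");

// Prints the computed lane count for every row and returns how many rows
// disagree with the smallest ADF lane count in Table 2.
int count_lane_mismatches() {
    int mismatches = 0;
    for (const CascadeEntry& e : kCascadeTable) {
        const int lanes = lanes_per_cascade_word(e.bits_per_lane);
        std::printf("%-10s %3d bits/lane -> %3d lanes (Table 2: %d)\n",
                    e.type_name, e.bits_per_lane, lanes, e.min_adf_lanes);
        if (lanes != e.min_adf_lanes) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

Calling count_lane_mismatches() from a main function prints one line per type and returns 0 under the assumed widths. Where Table 2 lists a second ADF lane count (for example, 32 for acc32), it is exactly twice the value computed here.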