Data Types - 2025.1 English - UG1603

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID
UG1603
Release Date
2025-05-29
Version
2025.1 English

The scalar units of the AI Engine support signed and unsigned integers in 8, 16, and 32-bit widths, along with single-precision floating-point data types for specific operations.

The two main vector data types offered by the AI Engine API are vectors (aie::vector) and accumulators (aie::accum).

A vector represents a collection of elements of the same type which is transparently mapped to the corresponding vector registers supported on AIE-ML / AIE-ML v2 architectures. Vectors are parameterized by the element type and the number of elements, and any combination that defines a 128b/256b/512b/1024b vector is supported.
Note: The default vector size for each data type is the native size that fills a 512-bit register. For example, aie::broadcast((bfloat16){1}) is equivalent to aie::broadcast<bfloat16,32>((bfloat16){1}); specifying the size as <bfloat16,32> is optional because 32 is the default vector size for the bfloat16 data type.
Table 1. Supported Vector Data Types and Sizes
Vector Data Type   Supported Sizes (* = default, the native 512-bit size)   Supported in AI Engine-ML   Supported in AI Engine-ML v2
int4               32/64/128*/256                                           Yes                         Yes
uint4              32/64/128*/256                                           Yes                         Yes
int8               16/32/64*/128                                            Yes                         Yes
uint8              16/32/64*/128                                            Yes                         Yes
int16              8/16/32*/64                                              Yes                         Yes
uint16             8/16/32*/64                                              Yes                         Yes
int32              4/8/16*/32                                               Yes                         Yes
uint32             4/8/16*/32                                               Yes                         Yes
cint16             4/8/16*/32                                               Yes                         Yes
cint32             2/4/8*/16                                                Yes                         Yes
float              4/8/16*/32                                               Yes                         Yes
float16            8/16/32*/64                                              No                          Yes
bfloat16           8/16/32*/64                                              Yes                         Yes
cbfloat16          4/8/16*/32                                               Yes                         No
float8             16/32/64*/128                                            No                          Yes
bfloat8            16/32/64*/128                                            No                          Yes
mx9                64/128/256                                               No                          Yes

The vector API enables efficient use of the vector processor's SIMD capabilities. For example, aie::vector<int32,16> is a 16-element vector of 32-bit integers. Each element of a vector is referred to as a lane. Using the smallest bit width necessary can improve performance by making better use of the vector registers.

Figure 1. aie::vector<int32,16>

Vector Based Floating-point Data Types

Floating-point 16 (FP16) is a data type that uses 16 bits to represent a number. It is composed of a sign bit, a 10-bit mantissa, and a 5-bit exponent, resulting in a smaller representation compared to the standard 32-bit single-precision floating-point format (FP32).
Table 2. Number of Bits Comparison of Single, Half Precision Floating-Point and bfloat16
Precision                      Sign   Mantissa   Exponent
Single: float (FP32)           1      23         8
Half: float16 (FP16)           1      10         5
bfloat16 (BF16)                1      7          8
Quarter: float8 (FP8 / E4M3)   1      3          4
bfloat8 (BF8 / E5M2)           1      2          5

As shown in the table, FP16 and BF16 both use 16 bits, but allocate them differently: BF16 prioritizes range, assigning 8 bits to the exponent, while FP16 prioritizes precision, allocating 10 bits to the mantissa.

Block Floating-point Data Types

Block-floating-point data types, also known as MX data types, represent a set of 16 floating-point numbers. These 16 numbers share a common 8-bit level-1 exponent and eight 1-bit level-2 exponents, each level-2 exponent being shared by a pair of mantissas.

Figure 2. MX9 Datatype Elements

Depending on the type of the block floating-point, the mantissa size is different:

Table 3. Number of Bits of Block Floating-Point Data Type
Precision Sign Mantissa Level-1 Exponent Level-2 Exponent
mx9 1 7 8 1

In a block floating-point number, 16 values are represented. The ith value is computed as:

x(i) = (-1)^Sign(i) × Mantissa(i) × 2^(E1 - 128 - E2([i/2]))

where:

Sign(i)
Sign bit of the ith number
Mantissa(i)
Mantissa of the ith number
E1
Level 1 exponent
E2([i/2])
Level 2 exponent of the ith number where [i/2] indicates the index of the corresponding pair to which the number is associated

As shown in the preceding figure, 16 MX9 values are packed into 18 bytes. On average, an MX9 block-floating-point number occupies 9 bits.

MX9 Block-floating-point data are stored in a block_vector class. For example:
aie::block_vector<mx9,VectorSize> v;
where VectorSize is either 64, 128, or 256.

Because the data is packed in memory without padding, it cannot be accessed with the simple vector iterators used for standard data types; a dedicated block_vector_buffer_stream must be used instead. See Block Floating-Point Buffer Stream.

Accumulator Data Types

An accumulator represents a collection of elements of the same type, typically resulting from multiplication operations. Accumulators are mapped to the corresponding accumulator registers supported on each architecture. Compared to regular vector types, accumulators often possess a larger number of bits, enabling the efficient execution of long chains of operations where intermediate results could potentially exceed the range of standard vectors. Accumulators are parameterized by their element type and the number of elements they contain.

Table 4. Supported Accumulator Types and Sizes
Accumulator types: acc32, accfloat, caccfloat, acc40, acc48, acc56, acc64, cacc32, cacc40, cacc48, cacc56, cacc64
Native accumulation bits: 32 or 64, depending on the accumulator type

Complex accumulators (prefixed with "c") require twice the number of native accumulation bits: one set for the real part and another for the imaginary part.

The AI Engine API automatically maps accumulator types to the closest native accumulator type supported by the target architecture. For example, acc48 maps to acc64 on the AI Engine-ML architecture.

While aie::vector and aie::accum objects offer member functions for type casting, data extraction, insertion, and indexing, only a limited subset of accumulator data types is supported for interfaces within the graph code at this time. For details, see Stream and Cascade Data Types.

For more information on streams, see Streaming Data API.