The scalar units of the AI Engine support signed and unsigned integers in 8, 16, and 32-bit widths, along with single-precision floating-point data types for specific operations.
The two main vector data types offered by the AI Engine API are vectors (aie::vector) and accumulators (aie::accum).
aie::broadcast((bfloat16){1}) is equivalent to aie::broadcast<bfloat16,32>((bfloat16){1}), where specifying the size as <bfloat16,32> is optional because 32 is the default vector size for the bfloat16 data type.

| Vector Data Type | Supported Sizes | Supported in AI Engine-ML | Supported in AI Engine-ML v2 |
|---|---|---|---|
| int4 | 32/64/128/256 | Yes | Yes |
| uint4 | 32/64/128/256 | Yes | Yes |
| int8 | 16/32/64/128 | Yes | Yes |
| uint8 | 16/32/64/128 | Yes | Yes |
| int16 | 8/16/32/64 | Yes | Yes |
| uint16 | 8/16/32/64 | Yes | Yes |
| int32 | 4/8/16/32 | Yes | Yes |
| uint32 | 4/8/16/32 | Yes | Yes |
| cint16 | 4/8/16/32 | Yes | Yes |
| cint32 | 2/4/8/16 | Yes | Yes |
| float | 4/8/16/32 | Yes | Yes |
| float16 | 8/16/32/64 | No | Yes |
| bfloat16 | 8/16/32/64 | Yes | Yes |
| cbfloat16 | 4/8/16/32 | Yes | No |
| float8 | 16/32/64/128 | No | Yes |
| bfloat8 | 16/32/64/128 | No | Yes |
| mx9 | 64/128/256 | No | Yes |
To make efficient use of the vector processor's SIMD capabilities, the AI Engine API provides the vector API. For example, aie::vector<int32,16> is a 16-element vector of 32-bit integers. Each element of the vector is referred to as a lane. Using the smallest bit width necessary can improve performance by making good use of the vector registers.
Vector Based Floating-point Data Types
| Precision | Sign | Mantissa | Exponent |
|---|---|---|---|
| Single: float (FP32) | 1 | 23 | 8 |
| Half: float16 (FP16) | 1 | 10 | 5 |
| bfloat16 (BF16) | 1 | 7 | 8 |
| Quarter: float8 (FP8 / E4M3) | 1 | 3 | 4 |
| bfloat8 (BF8) | 1 | 2 | 5 |
As shown in the table, FP16 and BF16 both use 16 bits, but their allocation differs. BF16 prioritizes range, assigning 8 bits to the exponent, while FP16 prioritizes precision by allocating 10 bits to the mantissa.
Block Floating-point Data Types
Block floating-point data types, also known as MX data types, represent a set of 16 floating-point numbers. These 16 numbers share a common 8-bit level-1 exponent and eight 1-bit level-2 exponents, each of which is shared by a pair of mantissas.
Depending on the type of the block floating-point, the mantissa size is different:
| Precision | Sign | Mantissa | Level-1 Exponent | Level-2 Exponent |
|---|---|---|---|---|
| mx9 | 1 | 7 | 8 | 1 |
In a block floating-point number, 16 values are represented. The value of the ith number is computed as:

x(i) = (-1)^Sign(i) × Mantissa(i) × 2^(E1 − 128 − E2([i/2]))

where:
- Sign(i): sign bit of the ith number
- Mantissa(i): mantissa of the ith number
- E1: level-1 exponent
- E2([i/2]): level-2 exponent of the ith number, where [i/2] indicates the index of the pair to which the number belongs
As shown in the preceding figures, 16 MX9 values are packed into 18 bytes. On average, an MX9 block floating-point number occupies 9 bits.
Block floating-point vectors are declared using the aie::block_vector class. For example:

```cpp
aie::block_vector<mx9, VectorSize> v;
```

where VectorSize is either 64, 128, or 256. Because the data is stored without padding in memory, it cannot be accessed using the simple vector iterators available for standard data types; the specific block_vector_buffer_stream must be used instead. See Block Floating-Point Buffer Stream.
Accumulator Data Types
An accumulator represents a collection of elements of the same class, typically resulting from multiplication operations. Accumulators are mapped to the corresponding accumulator registers supported on each architecture. Compared to regular vector types, accumulators often possess a larger number of bits, enabling the efficient execution of long chains of operations whose intermediate results could exceed the range of standard vectors. Accumulators are parameterized by their element type and the number of elements they contain.
| | acc32 / accfloat / caccfloat | acc40 / acc48 / acc56 / acc64 / cacc32 / cacc40 / cacc48 / cacc56 / cacc64 |
|---|---|---|
| Native accumulation bits | 32 | 64 |
Complex accumulators (prefixed with "c") require twice the number of native accumulation bits: one half for the real part and the other for the imaginary part.
The AI Engine API automatically maps accumulator types to the
closest native accumulator type supported by the target architecture. For example,
acc48 maps to acc64 on the AI Engine-ML
architecture.
While aie::vector and aie::accum objects offer member functions for type
casting, data extraction, insertion, and indexing, only a limited subset of
accumulator data types is supported for interfaces within the graph code at this
time. For details, see Stream and Cascade Data Types.
For more information on streams, see Streaming Data API.