As in AIE, the AI Engine processor in AIE-ML consists of a 32-bit scalar data path, a SIMD vector data path, two load units, and a store unit; it is optimized for ML applications.
The following provides a list of AIE-ML processor features:
- Instruction-based VLIW SIMD processor with new instructions
- Same 16 KB program memory as in AIE
- Vector unit supports 256 (8b x 8b) and 512 (4b x 8b) MAC operations
- Vector unit supports 128 bfloat16 MAC operations with FP32 accumulation
- Vector unit supports structured sparsity and FFT processing for ML inference applications, including cint32 x cint16 multiplication (data in cint32, twiddle factors in cint16), conjugation control for complex multiplication, a new permute mode, and a shuffle mode. See Sparsity for more information.
- A new processor bus that allows the processor to access memory mapped registers in the local AIE-ML tile
- The complex circular addressing modes are dropped and replaced by a 3D addressing mode (a minimal C sketch of such an access pattern follows this list)
- On-the-fly decompression during loading of sparse weights. See Sparsity for more information.
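To make the 3D addressing mode mentioned above more concrete, the following C sketch models a three-level address generator as nested strided loops over a flat buffer. The structure, field names, and the column-major walk are illustrative assumptions only; they do not represent the AIE-ML register interface or intrinsics.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical model of a 3D address generator: three nested loops,
 * each with its own iteration count and stride (in elements). The
 * hardware mode walks memory without explicit loop code; this sketch
 * only shows the kind of access pattern such a mode can generate. */
typedef struct {
    int32_t count[3];   /* iterations per dimension (index 0 = innermost) */
    int32_t stride[3];  /* address increment per dimension, in elements   */
} addr3d_t;

/* Visit every element selected by the 3D pattern, starting at base. */
static void walk_3d(const int16_t *base, addr3d_t p,
                    void (*visit)(const int16_t *elem))
{
    for (int32_t k = 0; k < p.count[2]; ++k)
        for (int32_t j = 0; j < p.count[1]; ++j)
            for (int32_t i = 0; i < p.count[0]; ++i)
                visit(base + k * p.stride[2]
                           + j * p.stride[1]
                           + i * p.stride[0]);
}

static void print_elem(const int16_t *elem) { printf("%d ", *elem); }

int main(void)
{
    /* 4 x 4 matrix stored row-major; read it column by column by
     * giving the innermost loop a stride equal to the row length. */
    int16_t m[16];
    for (int i = 0; i < 16; ++i) m[i] = (int16_t)i;

    addr3d_t col_major = {
        .count  = { 4, 4, 1 },   /* 4 rows, 4 columns, 1 block           */
        .stride = { 4, 1, 0 },   /* step a whole row, then next column   */
    };
    walk_3d(m, col_major, print_elem);  /* prints 0 4 8 12 1 5 9 13 ... */
    printf("\n");
    return 0;
}
```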
The AIE-ML processor removes some advanced DSP functionality present in the AIE processor, including:
- The 32-bit floating-point vector data path is not directly supported, but it can be emulated by decomposing each operation into multiple 16-bit x 16-bit multiplications
- Scalar non-linear functions, including sin/cos, sqrt, inverse sqrt and inverse
- Scalar floating point/integer conversions
- Complex circular addressing and FFT addressing modes. However, some level of FFT and complex support is provided; see the AIE-ML processor features.
- Limited support for 128-bit load/store
- Non-aligned memory access
- Support for some complex data types; however, some level of complex support is provided (see the AIE-ML processor features)
- Native support for 32 × 32-bit multiplication; it can, however, be emulated using 16-bit integer operands (as sketched after this list)
- Non-blocking 128-bit stream interfaces and stream FIFOs
- Control streams and packet header generation
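As a rough illustration of how a 32 × 32-bit multiplication can be emulated with 16-bit integer operands, the following self-contained C sketch splits each operand into 16-bit halves and recombines the four 16 x 16-bit partial products. The function name and test values are made up for the example and do not correspond to AIE-ML intrinsics.

```c
#include <stdio.h>
#include <inttypes.h>

/* Emulate an unsigned 32 x 32 -> 64-bit multiply using only 16-bit
 * operands, the way a 16-bit MAC data path can rebuild a wider product:
 *   a = aH*2^16 + aL,  b = bH*2^16 + bL
 *   a*b = aH*bH*2^32 + (aH*bL + aL*bH)*2^16 + aL*bL              */
static uint64_t mul32_via_16(uint32_t a, uint32_t b)
{
    uint16_t aL = (uint16_t)a, aH = (uint16_t)(a >> 16);
    uint16_t bL = (uint16_t)b, bH = (uint16_t)(b >> 16);

    uint64_t ll = (uint32_t)aL * bL;   /* each partial product is a   */
    uint64_t lh = (uint32_t)aL * bH;   /* 16 x 16 -> 32-bit multiply  */
    uint64_t hl = (uint32_t)aH * bL;
    uint64_t hh = (uint32_t)aH * bH;

    return (hh << 32) + ((lh + hl) << 16) + ll;
}

int main(void)
{
    uint32_t a = 0x89ABCDEFu, b = 0x12345678u;
    uint64_t ref = (uint64_t)a * b;          /* native 64-bit product */
    uint64_t emu = mul32_via_16(a, b);       /* emulated product      */
    printf("native   = 0x%016" PRIX64 "\n", ref);
    printf("emulated = 0x%016" PRIX64 "\n", emu);
    return 0;
}
```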