AI Engines offered in some AMD Versal™ adaptive SoCs are available in different versions optimized for different markets. The initial version, the AI Engine (AIE), is optimized for DSP and communication applications, while the AI Engine-Machine Learning (AIE-ML) is a version optimized for machine learning. This section describes the main differences between AIE and AIE-ML, including:
- Increased throughput for ML/AI inference workloads.
- Precisions optimized for ML/AI applications (for example, added bfloat16 and INT4).
- Increased on-chip memory capacity and bandwidth (twice the data memory in each AIE-ML tile, plus AIE-ML memory tiles added to each column of the AIE-ML array).
- Increased multiplier performance.
- Focus on power efficiency (increase TOPs/W).
- Improved hardware for synchronization and reconfiguration.
The differences between the AIE and AIE-ML blocks include the following:
- Removed:
- Native support for INT32. Multiplication of 32-bit numbers is not directly supported but is emulated via decomposition into multiple 16 x 16-bit multiplications. AIE-ML also supports cint32 x cint16 multiplication to optimize FFT performance.
- Native FP32 support (emulated using bfloat16).
- Added:
- Double the INT8/INT16 compute per tile compared to AIE
- Bfloat16 and INT4
- Local memory tiles
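To illustrate the decomposition mentioned above, the sketch below shows how an unsigned 32 x 32-bit multiplication can be built from four 16 x 16-bit partial products. This is a generic illustration of the technique, not the actual AIE-ML intrinsic sequence, and the function name is hypothetical.

```python
def mul32_via_16x16(a: int, b: int) -> int:
    """Emulate an unsigned 32x32-bit multiply using 16x16-bit partial products."""
    # Split each 32-bit operand into 16-bit low and high halves.
    a_lo, a_hi = a & 0xFFFF, (a >> 16) & 0xFFFF
    b_lo, b_hi = b & 0xFFFF, (b >> 16) & 0xFFFF

    # Four 16x16-bit multiplications, shifted to their place values and summed.
    return ((a_hi * b_hi) << 32) \
         + ((a_hi * b_lo) << 16) \
         + ((a_lo * b_hi) << 16) \
         + (a_lo * b_lo)
```

In hardware, the shifted partial products map onto the native 16-bit multipliers plus accumulation, which is why the emulation costs multiple multiply operations per 32-bit result.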
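Similarly, FP32 emulation through bfloat16 relies on the fact that bfloat16 is an FP32 value with the low 16 mantissa bits truncated, so an FP32 operand can be split into a sum of bfloat16 terms and the products of those terms accumulated. The sketch below is a conceptual illustration under that assumption; the helper names are hypothetical and the actual AIE-ML emulation sequence may differ.

```python
import struct

def to_bf16(x: float) -> float:
    """Truncate an FP32 value to bfloat16 by zeroing the low 16 bits."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return struct.unpack('<f', struct.pack('<I', bits & 0xFFFF0000))[0]

def split_bf16(x: float, terms: int = 3) -> list:
    """Split an FP32 value into a sum of bfloat16 terms (x ~= sum(parts))."""
    parts, residual = [], x
    for _ in range(terms):
        p = to_bf16(residual)
        parts.append(p)
        residual -= p  # remaining error captured by the next term
    return parts

def mul_fp32_via_bf16(a: float, b: float) -> float:
    """Emulate an FP32 multiply by accumulating bfloat16 partial products."""
    return sum(x * y for x in split_bf16(a) for y in split_bf16(b))
```

Each bfloat16 term carries 8 bits of mantissa, so three terms per operand recover roughly the 24-bit FP32 mantissa at the cost of several bfloat16 multiplies per FP32 result.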
Tip: To understand the features in the first version of AI Engine, refer to the Versal Adaptive SoC AI Engine Architecture Manual (AM009).