AI Engines offered in some AMD Versal™ adaptive SoCs are available in different versions optimized for different markets. The initial version, the AI Engine (AIE), is optimized for DSP and communication applications, while the AI Engine-Machine Learning (AIE-ML) is a version optimized for machine learning. This section describes the main differences between AIE and AIE-ML, including:
- Increased throughput for ML/AI inference workloads.
- Precisions optimized for ML/AI applications (for example, added bfloat16 and INT4).
- Increased on-chip memory capacity and bandwidth (twice the data memory in each AIE-ML tile, plus AIE-ML memory tiles added to each column of the AIE-ML array).
- Increased multiplier performance.
- Focus on power efficiency (increase TOPs/W).
- Improved hardware for synchronization and reconfiguration.
The differences between the AIE and AIE-ML blocks include the following:
- Removed:
- Native support for INT32. Multiplication of 32-bit numbers is not directly supported but is emulated via decomposition into multiple 16 x 16-bit multiplications. AIE-ML also supports cint32 x cint16 multiplication to optimize FFT performance.
- Native FP32 support (emulated using bfloat16).
- Added:
- Double the INT8/INT16 compute per tile compared to AIE
- Bfloat16 and INT4
- Local memory tiles
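To illustrate the decomposition mentioned above, the sketch below shows how an unsigned 32 x 32-bit multiplication can be built from four 16 x 16-bit partial products. This is a generic illustration of the technique, not the actual AIE-ML intrinsic sequence, and the function name is hypothetical.

```python
def mul32_via_16x16(a: int, b: int) -> int:
    """Emulate an unsigned 32x32-bit multiply using 16x16-bit partial products."""
    # Split each 32-bit operand into 16-bit low and high halves.
    a_lo, a_hi = a & 0xFFFF, (a >> 16) & 0xFFFF
    b_lo, b_hi = b & 0xFFFF, (b >> 16) & 0xFFFF

    # Four 16x16-bit multiplications, shifted to their place values and summed.
    return ((a_hi * b_hi) << 32) \
         + ((a_hi * b_lo) << 16) \
         + ((a_lo * b_hi) << 16) \
         + (a_lo * b_lo)
```

In hardware, the shifted partial products map onto the native 16-bit multipliers plus accumulation, which is why the emulation costs multiple multiply operations per 32-bit result.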
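Similarly, FP32 emulation through bfloat16 relies on the fact that bfloat16 is an FP32 value with the low 16 mantissa bits truncated, so an FP32 operand can be split into a sum of bfloat16 terms and the products of those terms accumulated. The sketch below is a conceptual illustration under that assumption; the helper names are hypothetical and the actual AIE-ML emulation sequence may differ.

```python
import struct

def to_bf16(x: float) -> float:
    """Truncate an FP32 value to bfloat16 by zeroing the low 16 bits."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return struct.unpack('<f', struct.pack('<I', bits & 0xFFFF0000))[0]

def split_bf16(x: float, terms: int = 3) -> list:
    """Split an FP32 value into a sum of bfloat16 terms (x ~= sum(parts))."""
    parts, residual = [], x
    for _ in range(terms):
        p = to_bf16(residual)
        parts.append(p)
        residual -= p  # remaining error captured by the next term
    return parts

def mul_fp32_via_bf16(a: float, b: float) -> float:
    """Emulate an FP32 multiply by accumulating bfloat16 partial products."""
    return sum(x * y for x in split_bf16(a) for y in split_bf16(b))
```

Each bfloat16 term carries 8 bits of mantissa, so three terms per operand recover roughly the 24-bit FP32 mantissa at the cost of several bfloat16 multiplies per FP32 result.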
Tip: To understand the features in the first version of AI Engine, refer to the Versal Adaptive SoC AI Engine Architecture Manual (AM009).