Key Differences between AI Engine and AIE-ML

Versal Adaptive SoC AIE-ML Architecture Manual (AM020)

Document ID
AM020
Release Date
2024-05-15
Revision
1.3 English

The AI Engines offered in AMD Versal™ adaptive SoCs come in different versions optimized for different markets. The initial version, the AI Engine (AIE), is optimized for DSP and communication applications, while the AI Engine-Machine Learning (AIE-ML) version is optimized for machine learning. This section describes the main differences between AIE and AIE-ML, including:

  • Increased throughput for ML/AI inference workloads.
  • Precisions optimized for ML/AI applications (for example, added bfloat16 and INT4).
  • Increased on-chip memory capacity and bandwidth (twice the data memory in each AIE-ML tile, plus memory tiles in each column of the AIE-ML array).
  • Increased multiplier performance.
  • Focus on power efficiency (increased TOPS/W).
  • Improved hardware for synchronization and reconfiguration.

The differences between the AIE and AIE-ML blocks include the following:

  • Removed:
    • Native support for INT32. Multiplication of 32-bit numbers is not directly supported; it is emulated by decomposing the operation into multiple 16 x 16-bit multiplications. AIE-ML also supports cint32 x cint16 multiplication to optimize FFT performance.
    • Native FP32 support (emulated using bfloat16).
  • Added:
    • Double the INT8/INT16 compute per tile compared to AIE.
    • bfloat16 and INT4 data types.
    • Local memory tiles.
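To illustrate the INT32 emulation noted above, the following is a minimal sketch of how a 32 x 32-bit multiply can be decomposed into four 16 x 16-bit partial products. This is an illustration of the arithmetic only, not the actual AIE-ML datapath or intrinsics; the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical illustration: emulate a 32x32-bit multiply using
   four 16x16-bit partial products, as in the decomposition the
   text describes. Split each operand into 16-bit high/low halves:
   a = ah*2^16 + al, b = bh*2^16 + bl, so
   a*b = ah*bh*2^32 + (ah*bl + al*bh)*2^16 + al*bl. */
static uint64_t mul32_via_16x16(uint32_t a, uint32_t b) {
    uint32_t ah = a >> 16, al = a & 0xFFFFu;
    uint32_t bh = b >> 16, bl = b & 0xFFFFu;

    uint64_t hh = (uint64_t)ah * bh; /* four 16x16-bit products */
    uint64_t hl = (uint64_t)ah * bl;
    uint64_t lh = (uint64_t)al * bh;
    uint64_t ll = (uint64_t)al * bl;

    /* recombine the partial products with the appropriate shifts */
    return (hh << 32) + ((hl + lh) << 16) + ll;
}
```

On AIE-ML hardware the same decomposition is applied by the compiler and vector libraries, at the cost of multiple multiplier passes per 32-bit result.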

Tip: To understand the features in the first version of AI Engine, refer to the Versal Adaptive SoC AI Engine Architecture Manual (AM009).
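For readers unfamiliar with bfloat16, the format keeps the sign, the full 8-bit exponent, and the top 7 mantissa bits of an IEEE-754 float32, i.e. its upper 16 bits. The sketch below shows the conversions and one way FP32 multiplication can be approximated from bfloat16 pieces; it is a numerical illustration only, not the AIE-ML emulation sequence, and the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* bfloat16 = upper 16 bits of an IEEE-754 float32.
   Truncating (round-toward-zero) conversion, for illustration. */
static uint16_t f32_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (uint16_t)(bits >> 16);
}

static float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Sketch of FP32-like multiplication from bfloat16 pieces:
   split each operand into a bfloat16 high part and a bfloat16
   residual, then combine three partial products. The al*bl term
   is below the result precision and is dropped. */
static float bf16_mul_emulated(float a, float b) {
    float ah = bf16_to_f32(f32_to_bf16(a));
    float al = bf16_to_f32(f32_to_bf16(a - ah));
    float bh = bf16_to_f32(f32_to_bf16(b));
    float bl = bf16_to_f32(f32_to_bf16(b - bh));
    return ah * bh + ah * bl + al * bh;
}
```

Because bfloat16 keeps the float32 exponent range, the conversion never overflows; only mantissa precision is reduced, which the multi-term emulation above largely recovers.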