Versal Adaptive SoC AIE-ML Architecture Manual (AM020)

Document ID: AM020
Version: 1.3 English
Table 1. Summary of the Key Differences between AIE and AIE-ML
| Feature | AIE | AIE-ML |
|---|---|---|
| Array structure | Checkerboard | All lines identical |
| Cascade interface | 384 bits wide; horizontal direction | 512 bits wide; horizontal and vertical directions |
| Tile stream interface | 2 × 32-bit in and 2 × 32-bit out | 1 × 32-bit in and 1 × 32-bit out |
| Memory load/store per cycle | 512/256 bits | 512/256 bits |
| Advanced DSP functionality | Yes | No |
| INT4 operations/tile | 256 | 1024 (2) |
| INT8 operations/tile | 256 | 512 |
| INT16 operations/tile | 64 | 128 |
| INT32 operations/tile | 16 | 32 (4) |
| Bfloat16 float operations/tile | | 256 |
| FP32 float operations/tile | 16 | 42 (3) |
| Data memory/tile | 32 KB | 64 KB |
| Program memory/tile | 16 KB | 16 KB |
| Memory tiles | | 512 KB |
| Programmable logic (PL) to AIE array bandwidth | 1X | 1X |
| Tile local memory DMA | | Support for 3D addressing modes; S2MM finish on TLAST and out-of-order packets; compression/decompression |
| Local memory locks | Boolean | Semaphore |

  1. In cases without sparsity and for ML applications. See Sparsity for more information.
  2. The figure refers to INT8 × INT4 operations.
  3. Emulation mode: The specified value gives 16-bit accuracy for the mantissa. The operation count is lower when higher mantissa accuracy is required.
  4. int32 × int32 can be emulated.
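
As a quick way to read the compute rows of Table 1, the following Python sketch tabulates the per-tile ratio between AIE-ML and AIE for each data type. The names `OPS_PER_TILE` and `speedup` are illustrative only and are not part of any AMD tool or API. The values are copied from the table, so the notes on INT8 × INT4, emulated FP32, and emulated int32 × int32 still apply to the resulting ratios.

```python
# Operations/tile values copied from Table 1 as (AIE, AIE-ML).
# None marks a cell that the table leaves empty.
# OPS_PER_TILE and speedup are hypothetical names used for illustration.
OPS_PER_TILE = {
    "INT4": (256, 1024),      # AIE-ML figure is INT8 x INT4 (note 2)
    "INT8": (256, 512),
    "INT16": (64, 128),
    "INT32": (16, 32),        # see note 4
    "Bfloat16": (None, 256),  # no AIE value listed
    "FP32": (16, 42),         # AIE-ML figure is emulated (note 3)
}


def speedup(dtype):
    """Return the AIE-ML / AIE ratio, or None if the AIE cell is empty."""
    aie, aie_ml = OPS_PER_TILE[dtype]
    if aie is None:
        return None
    return aie_ml / aie


for dtype in OPS_PER_TILE:
    ratio = speedup(dtype)
    label = "n/a" if ratio is None else f"{ratio:.3f}x"
    print(f"{dtype:<9} {label}")
```

These ratios compare only the listed operations/tile figures. They say nothing about achieved throughput, which also depends on memory bandwidth, sparsity, and the accuracy mode chosen for emulated types.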