AIE-ML Array Architecture

Versal Adaptive SoC AIE-ML Architecture Manual (AM020)

Document ID: AM020
Release Date: 2024-05-15
Revision: 1.3 English

This section summarizes how the AIE-ML array compares with the AIE array. For more information, see AIE-ML Array Interface Architecture and AI Engine Array Interface Architecture in the Versal Adaptive SoC AI Engine Architecture Manual (AM009). The key features of AIE-ML that are similar to AIE are as follows:

  • Same process, voltage, frequency, clock, and power distribution
  • Same array topology (one VLIW SIMD processor per AIE-ML tile)
  • Each AIE-ML tile has eight integrated banks of data memory shared with three neighboring tiles
  • Each AIE-ML tile has two DMA channels in each direction
  • AIE-ML tile-to-tile stream interconnect has the same bandwidth as AIE
  • Same PL and NoC interface
  • Same debug/trace functionality

The key features of AIE-ML that are different from or enhanced over AIE are as follows:

  • At the tile level, compute and memory are doubled. A processor bus is added to allow the AIE-ML to perform direct read/write accesses to local tile memory-mapped registers.
  • Enhanced DMAs are added to the AIE-ML tiles, AIE-ML memory tiles, and AIE-ML array interface tiles. The enhancements include 3D address generation for tiles and array interface tiles, 4D address generation for memory tiles, out-of-order packets, and Finish-on-TLast in S2MM. Compression and decompression are supported (tiles and memory tiles) to better handle sparse weights and activations in CNN and RNN applications. See Sparsity for more information.
  • Addition of AIE-ML memory tiles (maximum of two rows) to significantly reduce programmable logic (PL) resource utilization (LUTs and URAMs). Each memory tile has 512 KB of memory with ECC and 12 DMA channels (6 MM2S and 6 S2MM).
  • Increased memory capacity due to the doubling of the data memory in AIE-ML tiles and the addition of AIE-ML memory tiles.
  • Increase in power efficiency (TOPS/W).
  • Improved stream switch functionality, including source-to-destination parity check and deterministic merge.
  • Improved reconfiguration and synchronization support.
  • Grid array architecture that supports a 512-bit cascade both vertically (from top to bottom) and horizontally (from left to right), versus the 384-bit horizontal-only cascade in AIE.
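
As a rough illustration of what the 3D and 4D address generation mentioned in the list above means, the following Python sketch walks a buffer with one (count, stride) pair per dimension. It is a generic model only: the function name `dma_addresses`, the `(count, stride)` parameters, and the example buffer layout are assumptions made for this sketch and do not correspond to the actual AIE-ML buffer descriptor fields, which are not reproduced here.

```python
from itertools import product


def dma_addresses(base, dims):
    """Yield addresses for a multi-dimensional transfer.

    dims is a list of (count, stride) pairs, innermost dimension first.
    The names are illustrative only, not AIE-ML register fields.
    """
    counts = [count for count, _ in dims]
    strides = [stride for _, stride in dims]
    # product() varies its last argument fastest, so the ranges are passed
    # outermost first to make dims[0] the fastest-changing dimension.
    for index in product(*(range(c) for c in reversed(counts))):
        # index is ordered outermost first; reverse it to line up with strides.
        offset = sum(i * s for i, s in zip(reversed(index), strides))
        yield base + offset


# 3D example (hypothetical layout): a 3-wide, 2-high block taken from rows
# that are 8 words apart, repeated for 2 planes that are 64 words apart.
three_d = list(dma_addresses(0, [(3, 1), (2, 8), (2, 64)]))
print(three_d)

# 4D example: the same pattern repeated for 2 batches, 128 words apart.
four_d = list(dma_addresses(0, [(3, 1), (2, 8), (2, 64), (2, 128)]))
print(len(four_d), four_d[-1])
```

In this model a fourth dimension simply adds one more (count, stride) pair, so a memory-tile-style 4D transfer can repeat a 3D pattern without issuing a separate transfer for each repetition.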

The following figure shows the change from the checkerboard architecture in AIE to the grid architecture in AIE-ML. Note that in AIE-ML all tile rows are oriented in the same direction, and the cascade connections run only from north to south and from west to east.

Figure 1. AIE to AIE-ML Array Configuration
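
The cascade directions described above (north to south and west to east only) can be modeled with a short Python sketch. The function name `cascade_outputs`, the coordinate convention (row 0 as the northernmost row, column 0 as the westernmost column), and the assumption that edge tiles simply have no cascade output in the missing direction are all choices made for this sketch, not details taken from the manual.

```python
def cascade_outputs(row, col, rows, cols):
    """Return the (row, col) tiles that tile (row, col) can cascade to.

    Assumes row 0 is the northernmost row and column 0 the westernmost
    column; this numbering is a convention of the sketch only.
    """
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError("tile is outside the array")
    targets = []
    if col + 1 < cols:
        targets.append((row, col + 1))  # horizontal cascade, west to east
    if row + 1 < rows:
        targets.append((row + 1, col))  # vertical cascade, north to south
    return targets


# Walk a hypothetical 2-row by 3-column array and list each tile's targets.
for r in range(2):
    for c in range(3):
        print((r, c), "->", cascade_outputs(r, c, rows=2, cols=3))
```

Because every row points the same way in this model, a chain of cascaded kernels can only move east or south and can never return to a tile it has already passed.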