Architecture Overview

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID: UG1603
Release Date: 2025-05-29
Version: 2025.1 English

The AI Engine-ML (AIE-ML) / AIE-ML v2 array is a two-dimensional (2D) array of AIE-ML / AIE-ML v2 tiles, where each tile contains an AI Engine, a memory module, and a tile interconnect module.

The AI Engine is a highly optimized single-instruction, multiple-data (SIMD), very long instruction word (VLIW) processor containing a scalar unit, a vector unit, two load units, a single store unit, and an instruction fetch and decode unit. One VLIW instruction can issue up to two loads, one store, one scalar operation, one fixed-point or bfloat16 vector operation, and one move instruction. Devices with AIE-ML v2 have wider SIMD datapaths than devices with AIE-ML.

Figure 1. AI Engine-ML Tile Block Diagram

Each tile's memory module is shared with its north, south, and west AI Engine neighbors. An AI Engine can access the memory modules to its north, south, and west, in addition to its own memory module (always to its east).

Each AI Engine has a fully programmable AXI4-Stream switch. The AXI4-Stream crossbar is 32 bits wide for AIE-ML and 64 bits wide for AIE-ML v2. The switch supports both circuit-switched and packet-switched streams with back-pressure. Through MM2S and S2MM DMAs, it provides stream access to and from the AI Engine data memory. The switch also contains FIFOs that are 16 deep and 34 bits wide (32-bit data + 1-bit parity + 1-bit TLAST) in AIE-ML, and 16 deep and 68 bits wide (64-bit data + 2-bit parity + 1-bit TLAST + 1-bit TKEEP) in AIE-ML v2.

More details on the AIE-ML architecture can be found in the Versal Adaptive SoC AIE-ML Architecture Manual (AM020). More details on AIE-ML v2 can be found in the Versal Adaptive SoC AIE-ML v2 Architecture Manual (AM027).