AI Engine-ML Architecture Overview

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID: UG1603
Release Date: 2024-11-28
Version: 2024.2 English

The AI Engine-ML array is a 2D array of AI Engine-ML tiles. Each AI Engine-ML tile contains an AI Engine-ML processor, a memory module, and a tile interconnect module.

The AI Engine-ML is a highly optimized single-instruction multiple-data (SIMD), very long instruction word (VLIW) processor. It contains a scalar unit, a vector unit, two load units, a single store unit, and an instruction fetch and decode unit. One VLIW instruction can issue at most two loads, one store, one scalar operation, one fixed-point or bfloat16 vector operation, and one move instruction.
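The per-instruction slot limits above can be modeled as a simple resource check. The sketch below is a simplified, hypothetical model (the operation categories and the function names `fits_in_one_bundle` and `min_bundles` are illustrative, not part of any AMD tool): it tests whether a set of operations fits in one VLIW instruction and computes a lower bound on the instruction count, ignoring data dependencies and any scheduling constraints the real compiler applies.

```python
from collections import Counter

# Per-instruction slot limits, taken from the description above.
# The category names are a hypothetical simplification.
SLOT_LIMITS = {
    "load": 2,
    "store": 1,
    "scalar": 1,
    "vector": 1,  # one fixed-point or bfloat16 vector operation
    "move": 1,
}


def _counts(ops):
    counts = Counter(ops)
    for kind in counts:
        if kind not in SLOT_LIMITS:
            raise ValueError(f"unknown operation kind: {kind!r}")
    return counts


def fits_in_one_bundle(ops):
    """Return True if the operations fit in a single VLIW instruction."""
    return all(n <= SLOT_LIMITS[kind] for kind, n in _counts(ops).items())


def min_bundles(ops):
    """Lower bound on VLIW instructions needed, ignoring dependencies:
    the most over-subscribed slot determines the bound."""
    bound = 0
    for kind, n in _counts(ops).items():
        bound = max(bound, -(-n // SLOT_LIMITS[kind]))  # ceiling division
    return bound


if __name__ == "__main__":
    print(fits_in_one_bundle(["load", "load", "store", "scalar", "vector", "move"]))  # True
    print(fits_in_one_bundle(["load", "load", "load"]))  # False
    print(min_bundles(["load"] * 5 + ["vector"] * 2))  # 3
```

In this model, five loads need at least three instructions because only two load slots exist per instruction, even though the two vector operations alone would fit in two.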

Figure 1. AI Engine-ML Tile Block Diagram

Each memory module is shared with its north, south, and west AI Engine-ML neighbors. An AI Engine-ML can access the memory modules to its north, south, and west, in addition to its own memory module, which is always located to its east.
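The neighbor-access rule can be expressed as a coordinate lookup. This is a minimal sketch under an assumed convention that is not taken from the guide: tiles are indexed `(col, row)`, columns grow eastward, rows grow northward, and because each tile's memory module sits to the east of its AI Engine-ML, the module to the west belongs to tile `(col - 1, row)`. The function name `accessible_memory_modules` is hypothetical; neighbors that would fall outside the array are dropped.

```python
def accessible_memory_modules(col, row, num_cols, num_rows):
    """Return the tiles whose memory modules the AI Engine-ML at (col, row)
    can access: its own, plus its north, south, and west neighbors.

    Hypothetical convention: columns grow eastward, rows grow northward,
    and the module west of an AI Engine-ML belongs to tile (col - 1, row).
    """
    if not (0 <= col < num_cols and 0 <= row < num_rows):
        raise ValueError("tile coordinates outside the array")
    candidates = {
        "own": (col, row),
        "north": (col, row + 1),
        "south": (col, row - 1),
        "west": (col - 1, row),
    }
    # Keep only neighbors that exist inside the array boundaries.
    return {
        side: (c, r)
        for side, (c, r) in candidates.items()
        if 0 <= c < num_cols and 0 <= r < num_rows
    }


if __name__ == "__main__":
    print(accessible_memory_modules(2, 1, num_cols=4, num_rows=3))
    # {'own': (2, 1), 'north': (2, 2), 'south': (2, 0), 'west': (1, 1)}
    print(accessible_memory_modules(0, 0, num_cols=4, num_rows=3))
    # {'own': (0, 0), 'north': (0, 1)}
```

A tile in the interior reaches four memory modules, while a tile on the south-west corner of this model reaches only two.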

Each AI Engine-ML tile has an AXI4-Stream switch, a fully programmable 32-bit AXI4-Stream crossbar. It supports both circuit-switched and packet-switched streams with back-pressure. Through the MM2S (memory-mapped to stream) and S2MM (stream to memory-mapped) DMAs, the switch provides stream access from and to the AI Engine-ML data memory. The switch also contains one 16-deep, 34-bit-wide FIFO (32-bit data + 1-bit parity + 1-bit TLAST).
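To make the FIFO width and the back-pressure behavior concrete, the sketch below packs each 32-bit data word with a parity bit and a TLAST bit into a 34-bit entry and stores entries in a 16-deep queue that refuses writes when full. The bit layout (data in bits 31:0, parity in bit 32, TLAST in bit 33), the use of even parity, and the names `pack_word`, `unpack_word`, and `StreamFifo` are all hypothetical choices for illustration; the hardware's actual encoding is not specified here.

```python
from collections import deque

FIFO_DEPTH = 16  # 16-deep, as described above


def pack_word(data, tlast):
    """Pack 32-bit data, a parity bit, and TLAST into one 34-bit entry.
    Hypothetical layout: bits [31:0] data, bit 32 parity, bit 33 TLAST.
    Even parity over the data bits is also an assumption."""
    if not 0 <= data < (1 << 32):
        raise ValueError("data must fit in 32 bits")
    parity = bin(data).count("1") & 1
    return data | (parity << 32) | (int(bool(tlast)) << 33)


def unpack_word(entry):
    """Split a 34-bit entry back into (data, tlast), checking parity."""
    data = entry & 0xFFFF_FFFF
    parity = (entry >> 32) & 1
    tlast = bool((entry >> 33) & 1)
    if parity != (bin(data).count("1") & 1):
        raise ValueError("parity mismatch")
    return data, tlast


class StreamFifo:
    """Fixed-depth FIFO modeling back-pressure: push() refuses when full,
    so the upstream producer must stall and retry later."""

    def __init__(self, depth=FIFO_DEPTH):
        self.depth = depth
        self.entries = deque()

    def push(self, data, tlast=False):
        if len(self.entries) >= self.depth:
            return False  # back-pressure: producer must wait
        self.entries.append(pack_word(data, tlast))
        return True

    def pop(self):
        if not self.entries:
            return None  # empty: consumer must wait
        return unpack_word(self.entries.popleft())


if __name__ == "__main__":
    fifo = StreamFifo()
    accepted = [fifo.push(i, tlast=(i == 15)) for i in range(17)]
    print(accepted.count(True))  # 16
    print(fifo.pop())  # (0, False)
```

Returning `False` from `push()` instead of raising mirrors the idea that a full FIFO stalls the sender rather than dropping data; the seventeenth write in the usage example is simply not accepted until the consumer drains an entry.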

More details on the AI Engine-ML architecture can be found in the Versal Adaptive SoC AIE-ML Architecture Manual (AM020).