AI Engines

Vitis Model Composer User Guide (UG1483)

Document ID: UG1483
Release Date: 2025-11-20
Version: 2025.2 English

AI Engines are an array of very-long instruction word (VLIW) processors with single instruction, multiple data (SIMD) vector units, highly optimized for compute-intensive applications such as digital signal processing (DSP), 5G wireless, and artificial intelligence (AI) technology, including machine learning (ML). They provide up to five times higher compute density for vector-based algorithms.

AI Engines provide multiple levels of parallelism including instruction-level and data-level parallelism:

  • Instruction-level parallelism allows a scalar operation, up to two moves, two vector reads (loads), one vector write (store), and one vector instruction to be executed per clock cycle, for a total of a 7-way VLIW instruction.
  • Data-level parallelism is achieved through vector-level operations, in which multiple sets of data elements are operated on in each clock cycle.
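The data-level parallelism described above can be pictured as a vector instruction that applies the same operation to several lanes at once. The following is a minimal, purely illustrative C++ sketch of that idea (it models an assumed 8-lane multiply-accumulate in plain standard C++; real AI Engine kernels instead use the AIE intrinsics/APIs and hardware SIMD registers, which are not shown here):

```cpp
#include <array>
#include <cstdint>
#include <cstddef>
#include <vector>

// Illustrative lane count (an assumption for this sketch, not a hardware spec):
// one "vector instruction" operates on LANES elements at once, so an N-element
// workload needs roughly N/LANES vector operations instead of N scalar ones.
constexpr std::size_t LANES = 8;

// One SIMD-style step: acc[i] += a[i] * b[i] across all lanes,
// conceptually completed in a single clock cycle.
void vector_mac(std::array<int32_t, LANES>& acc,
                const std::array<int32_t, LANES>& a,
                const std::array<int32_t, LANES>& b) {
    for (std::size_t i = 0; i < LANES; ++i)
        acc[i] += a[i] * b[i];
}

// Dot product processed LANES elements per "vector instruction",
// with a scalar loop for the leftover tail elements.
int32_t dot(const std::vector<int32_t>& a, const std::vector<int32_t>& b) {
    std::array<int32_t, LANES> acc{};           // per-lane accumulators
    const std::size_t n = a.size() - a.size() % LANES;
    for (std::size_t i = 0; i < n; i += LANES) {
        std::array<int32_t, LANES> va{}, vb{};
        for (std::size_t l = 0; l < LANES; ++l) {
            va[l] = a[i + l];
            vb[l] = b[i + l];
        }
        vector_mac(acc, va, vb);                // one vector op per LANES elements
    }
    int32_t sum = 0;
    for (int32_t v : acc) sum += v;             // horizontal reduction of the lanes
    for (std::size_t i = n; i < a.size(); ++i)  // scalar tail
        sum += a[i] * b[i];
    return sum;
}
```

In this model, a 10-element dot product costs one vector operation (for the first 8 elements) plus two scalar operations, rather than 10 scalar multiply-accumulates.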

Each AI Engine contains a vector processor and a scalar processor, dedicated program memory, and 32 KB of local data memory. It can also access the local memory of its neighbors in any of four directions (north, south, east, or west). DMA engines and AXI4 interconnect switches allow an AI Engine to communicate via streams with other AI Engines, with the programmable logic (PL), or with the DMA.

Some Versal adaptive SoCs include the AI Engine-ML (AIE-ML). The AIE-ML consists of an array of AIE-ML tiles, AIE-ML memory tiles, and an array interface made up of network on chip (NoC) and PL tiles. Each AIE-ML tile integrates a VLIW processor, integrated memory, and interconnects for streaming, configuration, and debug. The AIE-ML array introduces a new functional block, the memory tile, which significantly reduces the PL resources (LUTs and URAMs) required for ML applications.