AI Engine Data Memory - 2025.2 English - UG1603

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID: UG1603
Release Date: 2025-11-26
Version: 2025.2 English

Each AI Engine has 16 KB of program memory, enough to store 1024 instructions at the maximum width of 128 bits. AI Engine instructions are at most 128 bits wide and support multiple instruction formats, including variable-length instructions, which reduces the program memory footprint. Many instructions outside an optimized inner loop can use the shorter formats.

Each AI Engine tile has 64 KB of data memory for AIE-ML and AIE-ML v2, divided into eight single-port memory banks.

Each AI Engine can access its own data memory as well as the data memory of the adjacent AI Engine tiles to the north, south, and west. A single AIE-ML or AIE-ML v2 engine can therefore access a total of 256 KB of data memory.

The stack is placed in data memory. The default stack and heap sizes are 1 KB each. When the optimization level is greater than zero (xlopt >= 1 for the AI Engine compiler), the compiler can compute and adjust the heap size automatically. Stack size and heap size can also be changed using compiler options or constraints in the source code. If the heap size computed by the tool (with xlopt >= 1) is greater than the explicitly specified value, compilation fails. Refer to the AI Engine Tools and Flows User Guide (UG1076) for more information about stack and heap size usage.

In a logical representation, the 256 KB of memory can be viewed as one contiguous 256 KB block or as four 64 KB blocks. Each 64 KB block is divided into four odd and four even banks. One even bank and one odd bank are interleaved to form a double bank. AI Engines on the edges of the AI Engine array have fewer neighbors and correspondingly less accessible memory.
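The accessible-memory arithmetic above can be sketched in plain C++. This is an illustrative host-side model, not AI Engine kernel code. The names `TileNeighbors` and `accessible_data_memory_bytes` are hypothetical and introduced only for this example; the only facts taken from the text are the 64 KB per-tile data memory and the access to north, south, and west neighbors.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model: which neighboring tiles exist for a given AI Engine.
// An interior tile has all three; a tile on an array edge may lack some.
struct TileNeighbors {
    bool north;
    bool south;
    bool west;
};

// Data memory per tile, as stated for AIE-ML and AIE-ML v2.
constexpr std::size_t kTileDataMemoryBytes = 64 * 1024;

// Counts the tile's own 64 KB block plus one block per existing neighbor.
constexpr std::size_t accessible_data_memory_bytes(TileNeighbors n) {
    return (1u + (n.north ? 1u : 0u) + (n.south ? 1u : 0u) + (n.west ? 1u : 0u))
           * kTileDataMemoryBytes;
}

// With all three neighbors present, the total matches the 256 KB figure.
static_assert(accessible_data_memory_bytes(TileNeighbors{true, true, true}) == 256 * 1024,
              "interior tile should reach 256 KB");
```

Under this model, a tile that is missing one neighbor reaches three 64 KB blocks (192 KB), which is the sense in which edge tiles have less memory available.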

Each AI Engine has three address generation units (AGUs), also referred to as ports. The AGUs generate the addresses for vector load/store operations.

For devices with AIE-ML, each memory port operates in either vector register mode or scalar register mode. The ports are created by pairing an even and an odd memory bank. Because the minimum memory access granularity is 32 bits, 8-bit and 16-bit stores are implemented as read-modify-write instructions. All three ports (AGUs) can operate concurrently, provided that each port accesses a different bank.
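The concurrency condition can be expressed as a small check. The following is a minimal sketch in plain C++, not AI Engine kernel code. It assumes that the bank targeted by each port is already known and identifies banks by an index from 0 to 7 within a single tile, which ignores neighbor-tile memory. The function name `ports_can_run_concurrently` is hypothetical, and the mapping from addresses to banks is deliberately left out because the text does not specify it.

```cpp
#include <array>
#include <cassert>

// Assumption for this sketch: banks are indexed 0..7 within one tile.
constexpr int kBankCount = 8;

// Returns true when the three ports target pairwise distinct banks,
// which is the stated condition for all three to operate concurrently.
inline bool ports_can_run_concurrently(const std::array<int, 3>& banks) {
    for (int b : banks) {
        assert(b >= 0 && b < kBankCount);
    }
    return banks[0] != banks[1] &&
           banks[0] != banks[2] &&
           banks[1] != banks[2];
}
```

For example, ports targeting banks 0, 1, and 2 satisfy the condition, while two ports targeting bank 3 do not.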

Data stored in memory is in little endian format.
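To illustrate how a sub-word store interacts with the 32-bit access granularity and the little-endian layout, the sketch below emulates an 8-bit store as a read-modify-write on one 32-bit word. It is plain C++ that models the visible effect only; it does not describe how the hardware implements the operation, and `store_byte_rmw` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Emulates an 8-bit store as a read-modify-write of one 32-bit word.
// In little-endian layout, byte offset 0 is the least significant byte.
inline std::uint32_t store_byte_rmw(std::uint32_t word,
                                    unsigned byte_offset,
                                    std::uint8_t value) {
    assert(byte_offset < 4);
    const unsigned shift = byte_offset * 8;
    const std::uint32_t mask = std::uint32_t{0xFF} << shift;
    // Read: 'word' holds the current contents. Modify: clear the target
    // byte and merge the new value. Write: the caller stores the result.
    return (word & ~mask) | (std::uint32_t{value} << shift);
}
```

Storing 0xAA at byte offset 0 of the word 0x11223344 yields 0x112233AA, and the other three bytes are preserved.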

Each AI Engine has a DMA controller that is divided into two separate modules, S2MM and MM2S:

  • S2MM stores stream data to memory.
  • MM2S writes the contents of memory to a stream.

Both S2MM and MM2S have two independent data channels.

Table 1. AI Engine Data Memory

| Component            | AIE-ML                                        | AIE-ML v2                                     |
|----------------------|-----------------------------------------------|-----------------------------------------------|
| Memory Bank          | 512 word x 128-bit single-port                | 256 word x 256-bit single-port                |
| AGUs                 | Two 256-bit load units, one 256-bit store unit | Two 512-bit load units, one 512-bit store unit |
| Vector Register Mode | 256-bit                                       | 512-bit                                       |
| Scalar Register Mode | 32-bit/16-bit/8-bit                           | 32-bit/16-bit/8-bit                           |
| S2MM DMA             | 32-bit data                                   | 64-bit data                                   |
| MM2S DMA             | 32-bit data                                   | 64-bit data                                   |
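As a consistency check, the bank geometries in Table 1 can be multiplied out to confirm that both variants give 64 KB of data memory per tile across eight banks. The sketch below is plain C++ arithmetic; the type and constant names (`BankGeometry`, `kAieMlBank`, `kAieMlV2Bank`) are hypothetical and used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Words per bank and width of each word, taken from the Memory Bank row.
struct BankGeometry {
    std::size_t words;
    std::size_t bits_per_word;
};

// Eight single-port banks per tile, as stated earlier in this section.
constexpr std::size_t kBanksPerTile = 8;

constexpr std::size_t bank_bytes(BankGeometry g) {
    return g.words * g.bits_per_word / 8;
}

constexpr std::size_t tile_bytes(BankGeometry g) {
    return kBanksPerTile * bank_bytes(g);
}

constexpr BankGeometry kAieMlBank{512, 128};    // AIE-ML
constexpr BankGeometry kAieMlV2Bank{256, 256};  // AIE-ML v2

// Both geometries work out to 8 KB per bank and 64 KB per tile.
static_assert(tile_bytes(kAieMlBank) == 64 * 1024, "AIE-ML tile is 64 KB");
static_assert(tile_bytes(kAieMlV2Bank) == 64 * 1024, "AIE-ML v2 tile is 64 KB");
```

The AIE-ML v2 bank is half as deep and twice as wide as the AIE-ML bank, so the capacity per bank is unchanged while the access width doubles.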