Vector Registers - 2025.1 English - UG1603

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID: UG1603
Release Date: 2025-05-29
Version: 2025.1 English

All vector intrinsic functions require their operands to be in the AI Engine vector registers. The following table shows the vector registers available in the AIE-ML and AIE-ML v2 architectures and how smaller registers combine to form larger ones.

Table 1. Vector Registers
256-bit       512-bit   1024-bit
wl0, wh0      x0        y0 (AIE-ML v2 only)
wl1, wh1      x1
wl2, wh2      x2        y1 (AIE-ML v2 only)
wl3, wh3      x3
wl4, wh4      x4        y2
wl5, wh5      x5
wl6, wh6      x6        y3
wl7, wh7      x7
wl8, wh8      x8        y4
wl9, wh9      x9
wl10, wh10    x10       y5
wl11, wh11    x11

The underlying hardware registers are 24 registers, each 256 bits wide, whose names begin with the letter w (wl0 through wl11 and wh0 through wh11). A pair of w registers (wlN and whN) forms the 512-bit register xN. A pair of consecutive x registers in turn forms a 1024-bit register with the prefix y. For example, y2 combines x4 and x5.
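The grouping rules above can be summarized in a small host-side C++ sketch. The helpers `w_parts_of_x`, `x_parts_of_y`, `y_available`, and `register_bits` are hypothetical. They only encode the naming shown in Table 1 and are not part of the AIE API or the compiler.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical model of the register naming in Table 1 (not an AMD API):
//   xN = wlN + whN           (two 256-bit registers -> 512 bits)
//   yK = x(2K) + x(2K+1)     (two 512-bit registers -> 1024 bits)

// Names of the two 256-bit w registers that form register xN.
std::pair<std::string, std::string> w_parts_of_x(int n) {
    assert(n >= 0 && n < 12);  // x0 through x11
    return {"wl" + std::to_string(n), "wh" + std::to_string(n)};
}

// Names of the two 512-bit x registers that form register yK.
std::pair<std::string, std::string> x_parts_of_y(int k) {
    assert(k >= 0 && k < 6);  // y0 through y5
    return {"x" + std::to_string(2 * k), "x" + std::to_string(2 * k + 1)};
}

// y0 and y1 exist on AIE-ML v2 only; y2 through y5 exist on both architectures.
bool y_available(int k, bool is_aie_ml_v2) {
    return k >= 0 && k < 6 && (is_aie_ml_v2 || k >= 2);
}

// Register width in bits, derived from the name prefix (0 if unknown).
int register_bits(const std::string& name) {
    if (name.rfind("wl", 0) == 0 || name.rfind("wh", 0) == 0) return 256;
    if (!name.empty() && name[0] == 'x') return 512;
    if (!name.empty() && name[0] == 'y') return 1024;
    return 0;
}
```

For example, `x_parts_of_y(2)` yields `{"x4", "x5"}`, matching the y2 row of the table.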

Additional registers handle masking and sparsity (Q) and the MX data types: level 1 exponents (E), level 2 sub-tile shifts (G), and shift MSBs (F). The following table shows the architectural support for each register.

Table 2. Mask/Sparsity and MX Data Type Related Registers
Description                              Register  AIE-ML        AIE-ML v2
Mask/sparsity registers                  Q         4 x 128-bit   8 x 128-bit
MX exponent registers (level 1)          E         N/A           12 x 64-bit
MX sub-tile shift registers (level 2)    G         N/A           12 x 64-bit
MX shift MSB registers                   F         N/A           12 x 128-bit

Vector registers are a limited resource. If the compiler runs out of vector registers during code generation, it generates code that spills register contents to local memory and reads them back when needed. This consumes extra clock cycles.
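The following sketch gives a rough idea of when spilling becomes unavoidable. It derives an upper bound on simultaneously live vector values from Table 1. The helpers `vector_bits` and `max_live_vectors` are hypothetical. The model assumes that each value occupies exactly one register of its own width and that no registers are reserved for anything else. Real register allocation is more constrained, so spilling can begin before these limits are reached.

```cpp
#include <cassert>

// Hypothetical register-budget helpers (not part of any AMD tool).

// Total width in bits of a vector with `elems` elements of `elem_bits` bits.
constexpr int vector_bits(int elem_bits, int elems) {
    assert(elem_bits > 0 && elems > 0);
    return elem_bits * elems;
}

// Upper bound on vectors of the given width that can be live in registers at
// the same time, assuming one register per value and nothing else reserved.
constexpr int max_live_vectors(int bits, bool is_aie_ml_v2) {
    switch (bits) {
        case 256:  return 24;                    // wl0-wl11 and wh0-wh11
        case 512:  return 12;                    // x0-x11
        case 1024: return is_aie_ml_v2 ? 6 : 4;  // y0-y5 versus y2-y5
        default:   return 0;                     // not a vector register width
    }
}

// An aie::vector<int16, 32> is 512 bits wide, so at most 12 fit in x registers.
static_assert(vector_bits(16, 32) == 512, "16 x 32 bits = 512 bits");
static_assert(max_live_vectors(512, false) == 12, "x0-x11");
```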

The kernel microcode shows which vector register each vector load/store or other vector-based instruction uses during kernel execution. This microcode is available in the disassembly view of the AMD Vitis™ IDE. For details on using the IDE, see Using the Vitis Unified IDE in the Vitis Reference Guide (UG1702).

Operations

The aie::vector class provides member functions for many vector operations. Common operations include:

insert()
Replaces a specific region of the vector with the subvector passed as a parameter and returns a reference to the updated vector.
grow()
Returns a larger vector that contains a copy of the current vector. The remaining elements are undefined. The function parameter to grow() specifies where the current vector is placed within the output vector.
grow_replicate()
Returns a larger vector filled with repeated copies of the current vector.
extract()
Returns a subvector containing the contents of a specific region of the vector.
push()
Shifts all elements in the vector up by one position and writes the given value into the first position (index 0). The element previously in the last position (index N-1, where N is the length of the vector) is lost.
cast_to()
Reinterprets the current vector as a vector of the given element type. The number of elements is computed automatically from the vector size and the new element type.
set()
Updates the value of the element at the given index.
get()
Returns the value of the element at the given index.
operator[]
Returns a constant or non-constant reference object to the element at the given index.
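The following host-side sketch in standard C++ models the documented push() behavior on a `std::array`. `push_model` and `push_example` are hypothetical helpers. They are not part of the AIE API and do not reflect how the hardware implements the operation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of push(): shift every element up by one position,
// drop the last element, and write `value` into index 0.
template <typename T, std::size_t N, typename U>
void push_model(std::array<T, N>& v, U value) {
    static_assert(N > 0, "vector must not be empty");
    for (std::size_t i = N - 1; i > 0; --i) {
        v[i] = v[i - 1];  // the old v[N-1] is overwritten here (lost)
    }
    v[0] = static_cast<T>(value);
}

// Example: pushing 10 into {1, 2, 3, 4} gives {10, 1, 2, 3}.
inline std::array<std::int16_t, 4> push_example() {
    std::array<std::int16_t, 4> v{1, 2, 3, 4};
    push_model(v, 10);
    return v;
}
```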
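The following example shows several of the aie::vector operations listed above:

```cpp
#include <aie_api/aie.hpp>

// Illustrative wrapper (hypothetical name) so that the statements form a complete function.
void vector_operations_example(aie::vector<int16, 8> vv0, aie::vector<int16, 8> vv1)
{
    aie::vector<int16, 16> wv;

    wv.insert(0, vv0); // lower half (elements 0-7) is vv0
    wv.insert(1, vv1); // upper half (elements 8-15) is vv1

    wv.push(10);       // shift all elements up by one and set wv[0] = 10
    int16 i0 = wv[0];
    wv[1] = i0;

    aie::vector<cint16, 8> cv = wv.cast_to<cint16>();      // reinterpret 16 x int16 as 8 x cint16
    aie::vector<cint16, 4> cv0 = cv.extract<4>(/*idx=*/1); // extract the upper half of cv

    // wv has 16 elements; grow<32>() returns a 32-element vector that contains wv.
    // The argument 0 places wv in the first 16 elements; the rest are undefined.
    aie::vector<int16, 32> xv = wv.grow<32>(0);

    // wv (16 elements) is replicated 4 times to fill a 64-element vector.
    aie::vector<int16, 64> xv2 = wv.grow_replicate<64>();
}
```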
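In the example above, the index arguments select where data is placed. The following host-side sketch models insert(), extract(), grow(), grow_replicate(), and cast_to() on `std::array`, so the element placement can be checked without AI Engine hardware. All helper names (`insert_model`, `extract_model`, `grow_model`, `grow_replicate_model`, `cint16_model`, `cast_to_cint16_model`) are hypothetical.

The sketch makes two assumptions that should be checked against the AIE API reference before relying on them:

* `idx` counts in units of the subvector length. This is inferred from the example, where insert(1, vv1) fills the upper half.
* The real part of a complex element comes first.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies `sub` into elements [idx*M, idx*M + M) of `v` (assumed index unit: M).
template <typename T, std::size_t N, std::size_t M>
std::array<T, N>& insert_model(std::array<T, N>& v, std::size_t idx,
                               const std::array<T, M>& sub) {
    static_assert(M > 0 && N % M == 0, "subvector size must divide vector size");
    assert(idx < N / M);
    for (std::size_t i = 0; i < M; ++i) {
        v[idx * M + i] = sub[i];
    }
    return v;
}

// Returns the idx-th block of M elements from `v`.
template <std::size_t M, typename T, std::size_t N>
std::array<T, M> extract_model(const std::array<T, N>& v, std::size_t idx) {
    static_assert(M > 0 && N % M == 0, "subvector size must divide vector size");
    assert(idx < N / M);
    std::array<T, M> out{};
    for (std::size_t i = 0; i < M; ++i) {
        out[i] = v[idx * M + i];
    }
    return out;
}

// Places `v` at block position idx of a larger vector. The other elements are
// undefined on hardware; this model zero-fills them so results are repeatable.
template <std::size_t Out, typename T, std::size_t N>
std::array<T, Out> grow_model(const std::array<T, N>& v, std::size_t idx) {
    std::array<T, Out> out{};
    insert_model(out, idx, v);
    return out;
}

// Fills a larger vector with repeated copies of `v`.
template <std::size_t Out, typename T, std::size_t N>
std::array<T, Out> grow_replicate_model(const std::array<T, N>& v) {
    static_assert(N > 0 && Out % N == 0, "output size must be a multiple of input size");
    std::array<T, Out> out{};
    for (std::size_t i = 0; i < Out; ++i) {
        out[i] = v[i % N];
    }
    return out;
}

// Hypothetical stand-in for cint16; the real part is assumed to come first.
struct cint16_model {
    std::int16_t real;
    std::int16_t imag;
};

// Reinterprets N int16 values as N/2 complex values; the bits are unchanged.
template <std::size_t N>
std::array<cint16_model, N / 2> cast_to_cint16_model(const std::array<std::int16_t, N>& v) {
    static_assert(N % 2 == 0, "need an even number of int16 elements");
    static_assert(sizeof(cint16_model) == 2 * sizeof(std::int16_t), "unexpected padding");
    std::array<cint16_model, N / 2> out{};
    std::memcpy(out.data(), v.data(), out.size() * sizeof(cint16_model));
    return out;
}
```

Zero-filling in `grow_model` is a simplification for host-side checking. Code running on the AI Engine must not rely on the values of the elements that grow() leaves undefined.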