Vector Registers - 2024.1 English

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

All vector intrinsic functions require the operands to be present in the AI Engine vector registers. The following table shows the set of vector registers and how smaller registers are combined to form larger registers.

Table 1. Vector Registers
256-bit   512-bit   1024-bit
wl0       x0        -
wl1       x1        -
wl2       x2        -
wl3       x3        -
wl4       x4        y2
wl5       x5        y2
wl6       x6        y3
wl7       x7        y3
wl8       x8        y4
wl9       x9        y4
wl10      x10       y5
wl11      x11       y5

The underlying basic hardware registers are 256 bits wide and are prefixed with the letter w. Two w registers can be grouped to form a 512-bit register, prefixed with x. Registers x4 through x11 are grouped in consecutive pairs (x4 with x5, x6 with x7, and so on) to form the 1024-bit registers y2 through y5.
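The grouping described above can be sketched in plain, host-side C++. This is only an illustrative model, not part of the AI Engine API: the names kWBits, kXBits, kYBits, lanes, and y_index_for_x are hypothetical helpers introduced here, and the pairing rule assumes that each y register is built from two consecutive x registers, as the table suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Register widths as stated in the text (hypothetical constant names).
constexpr std::size_t kWBits = 256;         // basic hardware register (w)
constexpr std::size_t kXBits = 2 * kWBits;  // two w registers form one x register
constexpr std::size_t kYBits = 2 * kXBits;  // two x registers form one y register

// Number of elements of type T that fit in a register of the given width.
template <typename T>
constexpr std::size_t lanes(std::size_t bits) {
    return bits / (8 * sizeof(T));
}

// Hypothetical helper: index of the y register that contains x register `x`,
// or -1 when the x register is not part of any 1024-bit register.
// Assumes x4..x11 pair up consecutively into y2..y5.
constexpr int y_index_for_x(int x) {
    return (x >= 4 && x <= 11) ? x / 2 : -1;
}

static_assert(kXBits == 512 && kYBits == 1024, "widths follow from pairing");
static_assert(lanes<std::int16_t>(kWBits) == 16, "16 int16 lanes in 256 bits");
static_assert(lanes<std::int16_t>(kXBits) == 32, "32 int16 lanes in 512 bits");
static_assert(y_index_for_x(4) == 2 && y_index_for_x(5) == 2, "x4,x5 -> y2");
static_assert(y_index_for_x(11) == 5, "x11 -> y5");
```

Under these assumptions, a 16-element int16 vector occupies exactly one 256-bit register, and a 64-element int16 vector needs a full 1024-bit register.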

Vector registers are a valuable resource. If the compiler runs out of available vector registers during code generation, then it generates code to spill the register contents into local memory and read the contents back when needed. This consumes extra clock cycles.

The name of the vector register used by the kernel during its execution is shown for vector load/store and other vector-based instructions in the kernel microcode. This microcode is available in the disassembly view in the AMD Vitis™ IDE. For additional details on Vitis IDE usage, see Using the Vitis IDE in the AI Engine Tools and Flows User Guide (UG1076).

The aie::vector class has member functions that support multiple operations on vectors. Some common operations include:

insert(): Updates the contents of a specific region of the vector using the subvector parameter passed to this function and returns a reference to the updated vector.
grow(): Creates and returns a larger vector that contains a copy of the current vector; the remaining parts of the larger vector are undefined. The function parameter to grow() indicates the location where the current vector is copied within the output vector.
grow_replicate(): Replicates the vector multiple times and returns the larger vector.
extract(): Returns a subvector with the contents of a specific region of the vector.
push(): Shifts all elements in the vector up by one position and writes the given value into the first position (index 0). The element in the last position (index N-1, where N is the length of the vector) is lost.
cast_to(): Reinterprets the current vector as a vector of the given type. The number of elements is automatically computed by the function.
set(): Updates the value of the element at the given index.
get(): Returns the value of the element at the given index.
operator[]: Returns a constant or non-constant reference object to the element at the given index.

The following example uses several of these functions:

aie::vector<int16, 16> wv;
aie::vector<int16, 8> vv0, vv1;
wv.insert(0, vv0); // lower half of wv is vv0
wv.insert(1, vv1); // upper half of wv is vv1

wv.push(10);       // shift elements up by one and set wv[0] = 10
int16 i0 = wv[0];  // read element 0
aie::vector<cint16, 8> cv = wv.cast_to<cint16>();        // reinterpret wv as a complex vector
aie::vector<cint16, 4> cv0 = cv.extract<4>(/*idx=*/1);   // extract the upper half of cv

// wv has 16 elements; grow<32>() returns a 32-element vector that contains wv.
// The argument 0 places wv in the first 16 elements; the rest are undefined.
aie::vector<int16, 32> xv = wv.grow<32>(0);
// wv (16 elements) is replicated 4 times to fill the 64-element result.
aie::vector<int16, 64> xv2 = wv.grow_replicate<64>();
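
To make the element movement in these operations concrete, the following is a minimal host-side model written in standard C++. It is not the AI Engine API: std::array stands in for aie::vector, and the names Vec, model_insert, model_extract, model_push, and model_grow_replicate are hypothetical helpers that only mirror the behavior described in the list above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Host-side stand-in for aie::vector<T, N> (assumption for illustration only).
template <typename T, std::size_t N>
using Vec = std::array<T, N>;

// Models insert(idx, sub): overwrite the idx-th M-element region of v with sub.
template <typename T, std::size_t N, std::size_t M>
void model_insert(Vec<T, N>& v, std::size_t idx, const Vec<T, M>& sub) {
    static_assert(N % M == 0, "subvector size must divide vector size");
    for (std::size_t i = 0; i < M; ++i) v[idx * M + i] = sub[i];
}

// Models extract<M>(idx): copy out the idx-th M-element region of v.
template <std::size_t M, typename T, std::size_t N>
Vec<T, M> model_extract(const Vec<T, N>& v, std::size_t idx) {
    static_assert(N % M == 0, "subvector size must divide vector size");
    Vec<T, M> out{};
    for (std::size_t i = 0; i < M; ++i) out[i] = v[idx * M + i];
    return out;
}

// Models push(value): shift every element up by one index, drop the last
// element, and write value at index 0.
template <typename T, std::size_t N>
void model_push(Vec<T, N>& v, typename Vec<T, N>::value_type value) {
    for (std::size_t i = N - 1; i > 0; --i) v[i] = v[i - 1];
    v[0] = value;
}

// Models grow_replicate<Out>(): repeat v until Out elements are filled.
template <std::size_t Out, typename T, std::size_t N>
Vec<T, Out> model_grow_replicate(const Vec<T, N>& v) {
    static_assert(Out % N == 0, "output size must be a multiple of input size");
    Vec<T, Out> out{};
    for (std::size_t i = 0; i < Out; ++i) out[i] = v[i % N];
    return out;
}
```

In this model, pushing a value into a 16-element vector moves the old element 7 into index 8, so a later extract of the upper half starts with what used to be the last element of the lower half. The model does not capture register allocation or the undefined regions produced by grow().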