Update, Extract, and Shift

AI Engine Kernel Coding Best Practices Guide (UG1079)

Document ID: UG1079
Release Date: 2022-05-25
Version: 2022.1 English

To update portions of vector registers, the upd_v(), upd_w(), and upd_x() intrinsic functions are provided for 128-bit (v), 256-bit (w), and 512-bit (x) updates.

Note: An update overwrites part of the larger vector with the new data and keeps the rest of the vector live. The larger vector stays live across multiple updates. If too many vectors are kept live unnecessarily, registers can spill to memory, which degrades performance.

Similarly, ext_v(), ext_w(), and ext_x() intrinsic functions are provided to extract portions of the vector.
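
The update and extract behavior described above can be illustrated with a small host-side model. This is only a sketch in plain C++, not the AI Engine intrinsics themselves: the names Vec16, Vec8, model_upd_w, and model_ext_w are hypothetical stand-ins, and the convention that index 0 selects the lower 256 bits is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side stand-ins for a 512-bit and a 256-bit int32 vector.
using Vec16 = std::array<std::int32_t, 16>;
using Vec8  = std::array<std::int32_t, 8>;

// Models a 256-bit update: overwrite one half of the larger vector and keep
// the other half unchanged. Assumption: idx 0 is the lower half, idx 1 the
// upper half.
inline Vec16 model_upd_w(Vec16 big, std::size_t idx, const Vec8& part) {
    assert(idx < 2);
    for (std::size_t i = 0; i < part.size(); ++i) {
        big[idx * part.size() + i] = part[i];
    }
    return big;
}

// Models a 256-bit extract: copy one half of the larger vector out.
inline Vec8 model_ext_w(const Vec16& big, std::size_t idx) {
    assert(idx < 2);
    Vec8 part{};
    for (std::size_t i = 0; i < part.size(); ++i) {
        part[i] = big[idx * part.size() + i];
    }
    return part;
}
```

In this model, updating the upper half leaves the lower half untouched, which mirrors why the larger vector must stay live between updates.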

To update or extract individual elements, use the upd_elem() and ext_elem() intrinsic functions. They are required when the values to load or store are not in contiguous memory locations, in which case loading or storing a full vector takes multiple clock cycles. In the following example, element 0 of vector v1 is updated with the value of a, which is 100.

int a = 100;
v4int32 v1 = upd_elem(undef_v4int32(), 0, a);

Another important use is to move data to the scalar unit to compute an inverse or square root. In the following example, element 0 of vector vf is extracted into the scalar variable f, and its inverse square root is then computed.

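v4float vf;                 // assumed to hold data loaded earlier in the kernel
float f = ext_elem(vf, 0);  // move element 0 to the scalar unit
float i_f = invsqrt(f);     // scalar inverse square root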

The shft_elem() intrinsic function can be used to update a vector by inserting a new element at the beginning of a vector and shifting the other elements by one.
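
The shift described above can be sketched as a host-side model. This is plain C++ rather than the intrinsic itself: Vec4 and model_shft_elem are hypothetical names, and the sketch assumes that the last element is dropped when the others shift by one position.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side stand-in for a 4-lane int32 vector.
using Vec4 = std::array<std::int32_t, 4>;

// Models the described shift: the new value lands in lane 0 and every
// existing lane moves up by one. Assumption: the old last lane is discarded.
inline Vec4 model_shft_elem(const Vec4& v, std::int32_t value) {
    Vec4 out{};
    out[0] = value;
    for (std::size_t i = 1; i < out.size(); ++i) {
        out[i] = v[i - 1];
    }
    return out;
}
```

Calling the model repeatedly with a stream of new samples behaves like a small delay line, which is a typical reason to insert one element at a time.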

Accumulator registers can be updated from vector registers with the ups intrinsic function. Half of an accumulator register can also be updated with the upd_hi and upd_lo intrinsic functions.

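// From v16int32 to v16acc48
v16int32 v;
v16acc48 acc;  // declared separately so acc does not appear in its own initializer
acc = upd_lo(acc, ups(ext_w(v, 0), 0));  // lower half from the lower 256 bits of v
acc = upd_hi(acc, ups(ext_w(v, 1), 0));  // upper half from the upper 256 bits of v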