The following example takes two vectors of type v8int32, rva holding the real parts and rvb holding the imaginary parts, and creates a new complex vector, using the offsets to interleave the values as required.
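// Even lanes (select bit 0) read rva[0..7]; odd lanes (select bit 1) read rvb[0..7], starting at lane 8.
v8cint32 cv = as_v8cint32(select16(0xaaaa, concat(rva, rvb),
0, 0x03020100, 0x07060504, 8, 0x30201000, 0x70605040));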
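To see how the select and offset arguments produce the interleave, the lane rule can be replayed on a host. The C++ below is a minimal sketch of a scalar model, not the intrinsic: std::array stands in for the vector types, and the helpers lane_offset, model_concat and model_select16 are hypothetical names. It assumes that for 32-bit lanes, bit i of select chooses the x side (0) or the y side (1), that lane i then reads index start + offset, and that the 4-bit offset comes from the low offset word for lanes 0-7 and from the high word for lanes 8-15.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of select16 on 32-bit lanes; NOT the AIE intrinsic.
using V8 = std::array<int32_t, 8>;
using V16 = std::array<int32_t, 16>;

// Assumed rule: the 4-bit offset for lane i is nibble (i % 8) of `lo` for
// lanes 0-7 and of `hi` for lanes 8-15.
unsigned lane_offset(unsigned lane, uint32_t lo, uint32_t hi) {
    uint32_t word = (lane < 8) ? lo : hi;
    return (word >> (4 * (lane % 8))) & 0xFu;
}

// Model of concat(a, b): a fills lanes 0-7, b fills lanes 8-15.
V16 model_concat(const V8& a, const V8& b) {
    V16 out{};
    for (unsigned i = 0; i < 8; ++i) {
        out[i] = a[i];
        out[i + 8] = b[i];
    }
    return out;
}

// Model of select16: bit i of `select` picks the y side (1) or the x side (0);
// both sides read the same buffer at start + offset (assumed to wrap mod 16).
V16 model_select16(uint32_t select, const V16& buf,
                   int xstart, uint32_t xoffs, uint32_t xoffs_hi,
                   int ystart, uint32_t yoffs, uint32_t yoffs_hi) {
    assert(xstart >= 0 && ystart >= 0);
    V16 out{};
    for (unsigned i = 0; i < 16; ++i) {
        bool use_y = (select >> i) & 1u;
        unsigned start = static_cast<unsigned>(use_y ? ystart : xstart);
        unsigned off = use_y ? lane_offset(i, yoffs, yoffs_hi)
                             : lane_offset(i, xoffs, xoffs_hi);
        out[i] = buf[(start + off) % 16];
    }
    return out;
}
```

Feeding the model the arguments from the example gives even lanes rva[0..7] and odd lanes rvb[0..7], which is the interleaved order re0, im0, re1, im1, and so on.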
The following example shows how to extract the real and imaginary portions of a vector cv with type v8cint32.
// Lanes 0-7 gather the even (real) elements, lanes 8-15 the odd (imaginary) elements.
v16int32 re_im = shuffle16(as_v16int32(cv), 0, 0xECA86420, 0xFDB97531);
v8int32 re = ext_w(re_im, 0);
v8int32 im = ext_w(re_im, 1);
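A host-side model can confirm the de-interleave. This is a sketch, not the intrinsics: it assumes the 32-bit offset rule of 4-bit offsets taken from the low word for lanes 0-7 and from the high word for lanes 8-15, and model_shuffle16 and model_ext_w are hypothetical stand-ins for shuffle16 and ext_w, with std::array modeling the vectors.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of shuffle16/ext_w on 32-bit lanes; NOT the AIE intrinsics.
using V8 = std::array<int32_t, 8>;
using V16 = std::array<int32_t, 16>;

// Assumed rule: lane i reads buf[(xstart + nibble) % 16], where the nibble is
// taken from xoffs for lanes 0-7 and from xoffs_hi for lanes 8-15.
V16 model_shuffle16(const V16& buf, int xstart, uint32_t xoffs, uint32_t xoffs_hi) {
    assert(xstart >= 0);
    V16 out{};
    for (unsigned i = 0; i < 16; ++i) {
        uint32_t word = (i < 8) ? xoffs : xoffs_hi;
        unsigned off = (word >> (4 * (i % 8))) & 0xFu;
        out[i] = buf[(static_cast<unsigned>(xstart) + off) % 16];
    }
    return out;
}

// Model of ext_w: half 0 is lanes 0-7, half 1 is lanes 8-15.
V8 model_ext_w(const V16& v, int half) {
    assert(half == 0 || half == 1);
    V8 out{};
    for (unsigned i = 0; i < 8; ++i) out[i] = v[8 * static_cast<unsigned>(half) + i];
    return out;
}
```

With 0xECA86420 the low lanes read indices 0, 2, ..., 14 and with 0xFDB97531 the high lanes read 1, 3, ..., 15, so ext_w splits the result into the real and imaginary halves.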
Shuffle intrinsic functions can reorder the elements in a vector or set all elements to the same value. Some intrinsic functions operate only on larger registers, but they can still be applied to a smaller vector by inserting it into a larger one and extracting the result afterwards. The following example sets all four elements of a v4int32 vector to the same value, taken from the first element of v1.
v4int32 v2 = ext_v(shuffle16(xset_v(0, v1), 0, 0, 0), 0);
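A scalar model makes the broadcast idiom explicit. This sketch assumes that xset_v(0, v1) places v1 in lanes 0-3 of a 16-lane buffer (the remaining lanes are undefined on the hardware and are modeled as zero here), that all-zero offsets make every lane read lane xstart, and that ext_v(..., 0) returns lanes 0-3. The model_* and broadcast_first names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of the broadcast idiom; NOT the AIE intrinsics.
using V4 = std::array<int32_t, 4>;
using V16 = std::array<int32_t, 16>;

// Model of xset_v(0, v): v goes into lanes 0-3; the other lanes are undefined
// on the hardware, modeled here as zero.
V16 model_xset_v0(const V4& v) {
    V16 out{};
    for (unsigned i = 0; i < 4; ++i) out[i] = v[i];
    return out;
}

// Model of shuffle16(buf, xstart, 0, 0): every 4-bit offset is 0, so every
// lane reads buf[xstart].
V16 model_shuffle16_all_zero(const V16& buf, int xstart) {
    assert(xstart >= 0 && xstart < 16);
    V16 out{};
    out.fill(buf[static_cast<unsigned>(xstart)]);
    return out;
}

// Model of ext_v(v, 0): the lowest 128 bits, i.e. lanes 0-3.
V4 model_ext_v0(const V16& v) {
    return V4{v[0], v[1], v[2], v[3]};
}

// Composition mirroring ext_v(shuffle16(xset_v(0, v1), 0, 0, 0), 0).
V4 broadcast_first(const V4& v1) {
    return model_ext_v0(model_shuffle16_all_zero(model_xset_v0(v1), 0));
}
```

Only lane 0 of the widened buffer is ever read, so the undefined upper lanes never affect the result.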
The following example shows how to multiply each element in rva by the first element in rvb. Because zoffsets is 0x00, every lane reads rvb[0], which makes this an efficient way to multiply a vector by a constant value.
v8acc80 acc = lmul8(concat(rva, undef_v8int32()), 0, 0x76543210, rvb, 0, 0x00);
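The lmul8 lane rule can be modeled on a host to check which operands each lane multiplies. This is a sketch, not the intrinsic: it assumes that lane i computes x[xstart + xoff_i] * z[zstart + zoff_i], with one 4-bit offset per lane, indices wrapping modulo the buffer length, and the 80-bit accumulator lanes represented as int64_t. The names model_lmul8 and nibble are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of lmul8 on 32-bit lanes; NOT the AIE intrinsic.
// The 80-bit accumulator lanes are modeled as int64_t, which holds any
// 32 x 32-bit product.
using V8 = std::array<int32_t, 8>;
using V16 = std::array<int32_t, 16>;
using Acc8 = std::array<int64_t, 8>;

// i-th 4-bit field of an offset word.
unsigned nibble(uint32_t word, unsigned lane) {
    return (word >> (4 * lane)) & 0xFu;
}

// Assumed rule: acc[i] = x[xstart + xoff_i] * z[zstart + zoff_i].
Acc8 model_lmul8(const V16& x, int xstart, uint32_t xoffs,
                 const V8& z, int zstart, uint32_t zoffs) {
    assert(xstart >= 0 && zstart >= 0);
    Acc8 acc{};
    for (unsigned i = 0; i < 8; ++i) {
        int64_t xv = x[(static_cast<unsigned>(xstart) + nibble(xoffs, i)) % 16];
        int64_t zv = z[(static_cast<unsigned>(zstart) + nibble(zoffs, i)) % 8];
        acc[i] = xv * zv;
    }
    return acc;
}
```

With zoffsets 0x00 and zstart 0, every zv in the model is z[0], so each lane is rva[i] * rvb[0].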
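The following examples show how to multiply each element in rva by its corresponding element in rvb. The two forms differ only in how rva is placed in the lower half of a 16-lane buffer; the upper half is left undefined, which is safe because the offsets 0x76543210 with a start of 0 select only lanes 0-7.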
acc = lmul8(concat(rva, undef_v8int32()),0,0x76543210,rvb,0,0x76543210);
acc = lmul8(upd_w(undef_v16int32(),0,rva),0,0x76543210,rvb,0,0x76543210);
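To illustrate why the undefined upper half does not matter, the sketch below applies a host-side lmul8 model to two buffers whose lower halves hold rva and whose upper halves hold different filler values, and checks that the results match. It assumes the per-lane rule acc[i] = x[xstart + xoff_i] * z[zstart + zoff_i] with 4-bit offsets; model_lmul8 and low_half are hypothetical helpers, not AI Engine functions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical host-side model; NOT the AIE intrinsics.
using V8 = std::array<int32_t, 8>;
using V16 = std::array<int32_t, 16>;
using Acc8 = std::array<int64_t, 8>;

// Assumed lmul8 lane rule: acc[i] = x[xstart + xoff_i] * z[zstart + zoff_i].
Acc8 model_lmul8(const V16& x, int xstart, uint32_t xoffs,
                 const V8& z, int zstart, uint32_t zoffs) {
    assert(xstart >= 0 && zstart >= 0);
    Acc8 acc{};
    for (unsigned i = 0; i < 8; ++i) {
        unsigned xo = (xoffs >> (4 * i)) & 0xFu;
        unsigned zo = (zoffs >> (4 * i)) & 0xFu;
        acc[i] = static_cast<int64_t>(x[(static_cast<unsigned>(xstart) + xo) % 16]) *
                 z[(static_cast<unsigned>(zstart) + zo) % 8];
    }
    return acc;
}

// Put `a` in lanes 0-7 and `filler` in lanes 8-15, standing in for the
// undefined upper half left by concat(rva, undef_v8int32()) or
// upd_w(undef_v16int32(), 0, rva).
V16 low_half(const V8& a, int32_t filler) {
    V16 out{};
    out.fill(filler);
    for (unsigned i = 0; i < 8; ++i) out[i] = a[i];
    return out;
}
```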
The following examples show how to perform matrix multiplication on int8 x int8 data with the mul intrinsic, assuming row-major data storage.
//Z_{2x8} * X_{8x8} = A_{2x8}
mul16(Xbuff, 0, 0x11101110, 16, 0x3120, Zbuff, 0, 0x44440000, 2, 0x3210);
//Z_{4x8} * X_{8x4} = A_{4x4}
mul16(Xbuff, 0, 0x00000000, 8, 0x3120, Zbuff, 0, 0xCC884400, 2, 0x3210);
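When tuning the mul16 offsets, a plain scalar matrix multiply is a convenient golden reference for checking kernel output on a host. The function below is a hypothetical helper, not part of any AI Engine library: it computes A = Z * X for row-major int8 inputs with int32 results and does not model how mul16 permutes lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar golden reference for A_{MxN} = Z_{MxK} * X_{KxN}, all row-major.
// Hypothetical helper for host-side checking only; NOT a model of mul16.
std::vector<int32_t> ref_matmul(const std::vector<int8_t>& Z,
                                const std::vector<int8_t>& X,
                                std::size_t M, std::size_t K, std::size_t N) {
    assert(Z.size() == M * K && X.size() == K * N);
    std::vector<int32_t> A(M * N, 0);
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            int32_t sum = 0;
            for (std::size_t k = 0; k < K; ++k) {
                // Row-major: Z[i][k] is Z[i * K + k], X[k][j] is X[k * N + j].
                sum += static_cast<int32_t>(Z[i * K + k]) * X[k * N + j];
            }
            A[i * N + j] = sum;
        }
    }
    return A;
}
```

Call it with (M, K, N) = (2, 8, 8) or (4, 8, 4) to match the two shapes in the comments above the mul16 calls.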
If the kernel has multiple mul or mac intrinsics, try to keep the xoffsets and zoffsets parameters constant across uses and vary the xstart and zstart parameters instead. This helps prevent configuration registers from being spilled to the stack.
For more information about vector lane permutations, refer to the Versal ACAP AI Engine Intrinsics Documentation (UG1078).