The following is an example of a MAC with an int16 `X` buffer and an int16 `Z` buffer.
Note that the permute granularity for the `X` buffer is 32 bits. The `start` and `step`
parameters are always expressed in terms of the data type granularity, so a value of 2
for 16-bit data selects the element 2 * 16 bits away. The `xoffsets` parameter comes in
pairs of hex values. The first hex value of a pair is an absolute 32-bit offset and picks
up 2 x 16-bit values (index, index+1) in the even row. The second hex value is a 32-bit
offset relative to the first value + 1 and picks up 2 x 16-bit values in the odd row. So
the hex value `0x24` in `xoffsets` selects index 8, 9 for the even row and index 14, 15
for the odd row from `xbuff`, and the hex value `0x00` in `xoffsets` selects index 0, 1
for the even row and index 2, 3 for the odd row from `xbuff`.
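The pair rule above can be checked with a small host-side model. This is only a sketch of the arithmetic described in the text, not the intrinsic itself: the helper name `decode_xoffsets_pair` and the `RowIndices` struct are hypothetical, `start` is assumed to be 0, and the assignment of the "first" value to the low hex digit is inferred from the `0x24` example.

```cpp
#include <cassert>

// Hypothetical host-side model of one xoffsets pair (not an AI Engine API).
struct RowIndices {
    int even[2]; // 16-bit indices into xbuff selected for the even row
    int odd[2];  // 16-bit indices into xbuff selected for the odd row
};

// Decode one pair of hex digits from xoffsets, assuming start = 0.
RowIndices decode_xoffsets_pair(unsigned pair) {
    assert(pair <= 0xFF);                 // exactly two hex digits
    const int first = pair & 0xF;         // low digit: absolute 32-bit offset
    const int second = (pair >> 4) & 0xF; // high digit: offset from first + 1
    const int even32 = first;               // 32-bit offset for the even row
    const int odd32 = first + second + 1;   // 32-bit offset for the odd row
    // Each 32-bit offset covers two consecutive 16-bit values.
    return RowIndices{{2 * even32, 2 * even32 + 1},
                      {2 * odd32, 2 * odd32 + 1}};
}
```

Under these assumptions, `decode_xoffsets_pair(0x24)` gives index 8, 9 for the even row and index 14, 15 for the odd row, and `decode_xoffsets_pair(0x00)` gives index 0, 1 and index 2, 3, matching the values quoted above.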

There is also an `xsquare` parameter that performs 16-bit granularity twiddling after
the main permute. For example, the `xsquare` value `0x2103` (read from the lowest hex
digit to the highest) puts index 3, 0 in the even row and index 1, 2 in the odd row. How
the `xsquare` parameter works is shown in the center of the following figure.
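As a rough illustration of the digit order, the following sketch unpacks an `xsquare` value from the lowest hex digit to the highest. It is a host-side model only: the names `decode_xsquare` and `SquareOrder` are hypothetical, and the code shows only which index lands in which row position, not how the hardware applies the permute.

```cpp
#include <cassert>

// Hypothetical host-side model of the xsquare digit order (not an AI Engine API).
struct SquareOrder {
    int even[2]; // indices placed in the even row, left to right
    int odd[2];  // indices placed in the odd row, left to right
};

// Read the four hex digits of xsquare from lowest to highest.
SquareOrder decode_xsquare(unsigned xsquare) {
    assert(xsquare <= 0xFFFF);             // exactly four hex digits
    const int d0 = xsquare & 0xF;          // lowest digit
    const int d1 = (xsquare >> 4) & 0xF;
    const int d2 = (xsquare >> 8) & 0xF;
    const int d3 = (xsquare >> 12) & 0xF;  // highest digit
    // The two lowest digits fill the even row, the two highest the odd row.
    return SquareOrder{{d0, d1}, {d2, d3}};
}
```

With this reading, `decode_xsquare(0x2103)` places index 3, 0 in the even row and index 1, 2 in the odd row, as in the example above.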

The following figure is an example of the `mac16` intrinsic operating on int16 and int16
data. It is used in the matrix vector multiplication and matrix multiplication example
designs in Single Kernel Coding Examples.