The following is an example of a MAC with an int16 X
buffer and an int16 Z buffer. Note that
the permute granularity for the X buffer is 32 bits.
The start and step
parameters are always in terms of data type granularity. The offsets are in 32-bit units, so to get to a 16-bit index,
you need to multiply them by 2.
The xoffsets parameter comes as a
pair of hex values. The first hex value (the lower digit) is an absolute 32-bit offset and picks up two 16-bit
values (index, index+1) in the even row. The second hex value (the upper digit) is an offset relative to the first
value + 1 (again a 32-bit offset) and picks up two 16-bit values in the odd row. So the
hex value 0x24 in xoffsets selects indices 8, 9 for the even row and indices 14, 15 for the odd row
from xbuff:
even: 2 * 4 -> get indices [8, 9]
odd: 2 * ( 2 + 4 + 1 ) -> get indices [14, 15]
Similarly, the hex value 0x00 in
xoffsets selects indices 0, 1 for the even row and
indices 2, 3 for the odd row from xbuff.
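
The index arithmetic for one xoffsets pair can be sketched in a few lines of Python. This is only an illustrative model of the rule described above, not part of the intrinsic API: the function name decode_xoffsets_pair is hypothetical, and the sketch assumes (as the 0x24 example suggests) that the lower hex digit is the first value and the upper hex digit is the second. It ignores the start and xsquare parameters.

```python
def decode_xoffsets_pair(pair):
    """Model how one 8-bit xoffsets pair maps to 16-bit xbuff indices.

    Hypothetical helper for illustration only; xstart and xsquare are ignored.
    """
    first = pair & 0xF           # lower hex digit: absolute 32-bit offset (even row)
    second = (pair >> 4) & 0xF   # upper hex digit: relative to (first + 1) (odd row)

    even_base = 2 * first                  # 32-bit offset -> 16-bit index
    odd_base = 2 * (second + first + 1)    # 32-bit offset -> 16-bit index

    # Each 32-bit offset picks up two consecutive 16-bit values.
    return {
        "even": [even_base, even_base + 1],
        "odd": [odd_base, odd_base + 1],
    }
```

Calling decode_xoffsets_pair(0x24) reproduces the indices from the text, [8, 9] for the even row and [14, 15] for the odd row, and decode_xoffsets_pair(0x00) gives [0, 1] and [2, 3].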
A further parameter, xsquare,
performs 16-bit granularity twiddling after the main permute. It adds a
further contribution to the index within a 2 by 2 matrix that recurs across the 8x4
matrix compute given by MUL8 in int16 x int16 mode.
For example, the xsquare value 0x2103 (read from the lowest hex digit to the highest)
puts indices 3, 0 in the even row and indices 1, 2 in the odd row. The center of
the following figure shows how the xsquare parameter works.
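
The reading order of the xsquare digits can be modeled the same way. The sketch below is an assumption-laden illustration rather than the hardware definition: decode_xsquare is a hypothetical name, and the code only splits the 16-bit value into its four hex digits, lowest first, and assigns the first two to the even row and the last two to the odd row, as in the 0x2103 example. It does not model how those values are combined with the main permute.

```python
def decode_xsquare(xsquare):
    """Split a 16-bit xsquare value into its 2 by 2 layout.

    Hypothetical helper for illustration only. Digits are read from the
    lowest hex digit to the highest, as described in the text.
    """
    digits = [(xsquare >> (4 * i)) & 0xF for i in range(4)]  # lowest digit first
    return {
        "even": digits[0:2],  # first two digits fill the even row
        "odd": digits[2:4],   # last two digits fill the odd row
    }
```

For 0x2103 this yields [3, 0] for the even row and [1, 2] for the odd row, matching the example above.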
The following figure shows an example of the mac16 intrinsic with int16 X and int16 Z operands.