The following figures show a MAC with an int8 X buffer and an int8 Z buffer. The first figure shows how data is permuted and the second figure shows how coefficients are permuted. Note that the permute granularity for the X buffer and the Z buffer is 32 bits and 16 bits, respectively. The hex values of the xoffsets parameter come in pairs.
The first hex value is an absolute 32-bit offset and picks up 4 x 8-bit values (index, index+1, index+2, index+3). The second hex value is an offset relative to the first value + 1 (also in 32-bit units) and likewise picks up 4 x 8-bit values. For example, 0x00 selects indices 0, 1, 2, 3 as well as 4, 5, 6, 7, and 0x24 selects indices 16, 17, 18, 19 as well as 28, 29, 30, 31.
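
The pair decoding described above can be modelled in plain C++. This is a minimal sketch, not an AI Engine intrinsic: decode_xoffset_pair is a hypothetical helper, and it assumes that the low hex digit of the pair is the "first" value and the high hex digit is the "second" value, which is the reading that reproduces the 0x24 example. It ignores xstart, xstep, xsquare, and any buffer wrap-around.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper (not part of any AI Engine API): models how one pair
// of xoffsets hex values selects eight int8 indices from the X buffer.
// Assumption: low hex digit = "first" value, high hex digit = "second" value.
std::array<int, 8> decode_xoffset_pair(int pair) {
    assert(pair >= 0 && pair <= 0xFF);      // one pair is two hex digits

    const int first  = pair & 0xF;          // absolute offset, in 32-bit units
    const int second = (pair >> 4) & 0xF;   // offset relative to (first + 1)

    const int kInt8PerWord = 4;             // 32-bit granularity = 4 int8 values
    const int base0 = first * kInt8PerWord;
    const int base1 = (first + second + 1) * kInt8PerWord;

    std::array<int, 8> indices{};
    for (int i = 0; i < kInt8PerWord; ++i) {
        indices[i] = base0 + i;                 // index .. index+3
        indices[kInt8PerWord + i] = base1 + i;  // second group of four
    }
    return indices;
}
```

Under that assumption, decode_xoffset_pair(0x00) yields indices 0 through 7, and decode_xoffset_pair(0x24) yields 16, 17, 18, 19 followed by 28, 29, 30, 31, matching the two examples in the text.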
There is another parameter, xsquare, that performs 8-bit granularity twiddling after the main permute. How the xsquare parameter works in this example can be seen in the center of the following figure.
The start (xstart, zstart) and step (xstep, zstep) parameters are always in terms of data type granularity. Hence, a value of 2 for 16-bit data is 2 * 16 bits away, while a value of 2 for 8-bit data is 2 * 8 bits away. The step parameter applies to the next block of selected data. So, if a pair of offset parameters selects a 2 * 2 block, the step applies to the next 2 * 2 block. The step added to the index value must be aligned to the permute granularity (32 bits for data, 16 bits for coefficients). For example, when working with 8-bit data, xstep needs to be a multiple of four. When working with 8-bit coefficients, zstep needs to be a multiple of two. The following two figures show how step works for data and coefficients.
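
The alignment rule above reduces to simple arithmetic, sketched here in plain C++. The helper names and constants are hypothetical and exist only for illustration; they are not AI Engine intrinsics. The sketch assumes the rule can be checked on the step alone, by converting it to bits and testing it against the permute granularity (32 bits for X buffer data, 16 bits for Z buffer coefficients).

```cpp
#include <cassert>

// Permute granularities stated in the text.
constexpr int kDataGranularityBits  = 32;  // X buffer (data)
constexpr int kCoeffGranularityBits = 16;  // Z buffer (coefficients)

// Hypothetical helper: start and step values count elements of the data
// type, so a value of 2 for 16-bit elements is 2 * 16 bits away.
constexpr int element_to_bit_offset(int value, int element_bits) {
    return value * element_bits;
}

// Hypothetical helper: true when a step, expressed in elements, lands on a
// multiple of the permute granularity.
constexpr bool step_is_aligned(int step, int element_bits, int granularity_bits) {
    return element_to_bit_offset(step, element_bits) % granularity_bits == 0;
}
```

For example, step_is_aligned(4, 8, kDataGranularityBits) holds while step_is_aligned(2, 8, kDataGranularityBits) does not, which is the "multiple of four" rule for xstep with 8-bit data. Likewise, step_is_aligned(2, 8, kCoeffGranularityBits) holds while a zstep of 1 fails the check.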
Note that for coefficients in the int8 * int8 case, the 2 * 2 index block is duplicated to construct a 4 * 2 block. See how indices 0, 1, 2, and 3 are duplicated in Figure 2.