MAC on 16x16 bits - 2022.2 English

AI Engine Kernel and Graph Programming Guide (UG1079)

The following is an example of a MAC with an int16 X buffer and an int16 Z buffer. Note that the permute granularity for the X buffer is 32 bits. The start and step parameters are always expressed in units of the data type (16 bits here), whereas the offsets are expressed in 32-bit units. To convert an offset to a 16-bit index, multiply it by 2.

The hex digits of the xoffsets parameter are read in pairs. The first (low-order) hex digit is an absolute 32-bit offset and picks up two 16-bit values (index, index+1) for the even row. The second (high-order) hex digit is an offset relative to the first digit, plus 1 (again in 32-bit units), and picks up two 16-bit values for the odd row. So the hex value 0x24 in xoffsets selects indices 8 and 9 for the even row and indices 14 and 15 for the odd row from xbuff:

even: 2 * 4 -> get indices [8, 9]
odd: 2 * ( 2 + 4 + 1 ) -> get indices [14, 15]

Similarly, the hex value 0x00 in xoffsets selects indices 0 and 1 for the even row and indices 2 and 3 for the odd row from xbuff.
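The arithmetic above can be sketched as a small reference model. This is not an AI Engine API: the function name `decode_xoffset_pair` is hypothetical, the start parameter is assumed to be 0, and any wrap-around at the end of xbuff is ignored.

```python
def decode_xoffset_pair(pair):
    """Return the 16-bit xbuff indices selected by one pair of xoffsets hex digits.

    Illustrative model only: assumes a start of 0 and no buffer wrap-around.
    """
    low = pair & 0xF           # first hex digit: absolute 32-bit offset (even row)
    high = (pair >> 4) & 0xF   # second hex digit: relative to the first, plus 1 (odd row)
    even_start = 2 * low                # convert the 32-bit offset to a 16-bit index
    odd_start = 2 * (high + low + 1)
    return [even_start, even_start + 1], [odd_start, odd_start + 1]


for pair in (0x24, 0x00):
    even, odd = decode_xoffset_pair(pair)
    print(f"0x{pair:02X}: even {even}, odd {odd}")
# 0x24: even [8, 9], odd [14, 15]
# 0x00: even [0, 1], odd [2, 3]
```

The two printed lines reproduce the 0x24 and 0x00 cases worked out in the text.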

An additional parameter, xsquare, performs a permute at 16-bit granularity (twiddling) after the main permute. It adds a further contribution to the index within a 2 x 2 matrix, and this pattern recurs across the 8 x 4 matrix computed by MUL8 in int16 x int16 mode.

For example, the xsquare value 0x2103 (read from the lowest hex digit to the highest) puts indices 3 and 0 in the even row and indices 1 and 2 in the odd row. The center of the following figure shows how the xsquare parameter works.

Figure 1. MAC8 on int16 x int16 Type
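As a rough illustration of the xsquare description, the following sketch splits the parameter into its four hex digits and applies them to one 2 x 2 square. The function names are hypothetical. The flattening of the square as [even0, even1, odd0, odd1] is an assumption made for this model and is not stated in this guide.

```python
def decode_xsquare(xsquare):
    """Split xsquare into (even_row, odd_row) selectors, lowest hex digit first."""
    digits = [(xsquare >> (4 * i)) & 0xF for i in range(4)]
    return digits[:2], digits[2:]


def apply_xsquare(xsquare, block):
    """Reorder one 2 x 2 square of 16-bit values.

    Assumption: block is flattened as [even0, even1, odd0, odd1] and each
    xsquare digit names the position within that square to pick.
    """
    even_sel, odd_sel = decode_xsquare(xsquare)
    return [block[i] for i in even_sel], [block[i] for i in odd_sel]


print(decode_xsquare(0x2103))
# ([3, 0], [1, 2])

# Square selected by xoffsets 0x24 in the text: even [8, 9], odd [14, 15]
print(apply_xsquare(0x2103, [8, 9, 14, 15]))
# ([15, 8], [9, 14])
```

Under this model, an xsquare value of 0x3210 leaves the square unchanged.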

The following figure shows an example of the mac16 intrinsic with int16 x int16 inputs.

Figure 2. MAC16 on int16 x int16 Type