MAC on 16x16 bits - 2021.1 English

AI Engine Kernel Coding Best Practices Guide (UG1079)

Document ID
UG1079
Release Date
2021-07-19
Version
2021.1 English

An example of MAC with int16 X buffer and int16 Z buffer is as follows. Note that the permute granularity for X buffer is 32 bits. The start and step parameters are always in terms of data type granularity. Therefore, a value of 2 for 16 bits data will choose 2 * 16 bits away. The xoffsets parameter comes as a pair. The first hex value is an absolute 32 bits offset and picks up 2 x 16 bits values (index, index+1) in the even row. The second hex value is offset from first value + 1 (32 bits offset) and picks up 2 x 16 bits values in the odd row. So the hex value 0x24 in xoffsets selects index 8, 9 for even row and index 14, 15 for odd row from xbuff and the hex value 0x00 in xoffsets selects index 0, 1 for even row and index 2, 3 for odd row from xbuff.

There is another xsquare parameter to perform 16 bits granularity twiddling after the main permute. For example, xsquare value 0x2103 (see from lower hex value to higher hex value) puts index 3, 0 in the even row and index 1, 2 in the odd row. How the xsquare parameter works can be seen in the center of the following figure.

Figure 1. MAC8 on int16 x int16 Type

The following figure is an example of mac16 intrinsic of int16 and int16. It is used in the matrix vector multiplication and matrix multiplication example designs in Single Kernel Coding Examples.

Figure 2. MAC16 on int16 x int16 Type