Constraints - 2023.2 English

A Matrix Multiply solution can consist of a cascade of kernels that perform the multiply operations, tiling kernels on each input to each member of that cascade, and a tiling kernel on the output. The tiling kernels convert between the arrangement of matrix elements in memory and an arrangement optimized for vector multiply, in either direction. In the entry-level graph, the following names identify the various kernels:

‘m_MatmultKernels’ - This is the array of kernel pointers returned by getKernels, which points to the cascade of TP_CASC_LEN matrix multiply kernels. These kernels perform the matrix multiply operations.

‘untiler’ - This is a single kernel on the output of the matrix multiply kernel or cascade of kernels. It transforms the data from the tiled format to the output format.

‘tilerA’ - This is an array of TP_CASC_LEN kernels that connect 1:1 with the A input ports of the matrix multiply kernels.

‘tilerB’ - This is an array of TP_CASC_LEN kernels that connect 1:1 with the B input ports of the matrix multiply kernels.
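
The kind of rearrangement a tiler and untiler perform can be illustrated with a small host-side sketch. This is not the library's kernel code: the function names `tileMatrix` and `untileMatrix`, the `int` element type, and the tile dimensions are hypothetical choices made for illustration, and the real tile shape depends on the data types in use. The sketch assumes a row-major input whose dimensions are exact multiples of the tile dimensions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the library): rearrange a row-major
// rows x cols matrix so that each tileRows x tileCols tile is stored
// contiguously. Tiles are emitted in row-major tile order, and elements
// inside a tile are row-major as well.
std::vector<int> tileMatrix(const std::vector<int>& in,
                            std::size_t rows, std::size_t cols,
                            std::size_t tileRows, std::size_t tileCols) {
    assert(in.size() == rows * cols);
    assert(rows % tileRows == 0 && cols % tileCols == 0);
    std::vector<int> out(in.size());
    std::size_t idx = 0;
    for (std::size_t tr = 0; tr < rows; tr += tileRows)
        for (std::size_t tc = 0; tc < cols; tc += tileCols)
            for (std::size_t r = 0; r < tileRows; ++r)
                for (std::size_t c = 0; c < tileCols; ++c)
                    out[idx++] = in[(tr + r) * cols + (tc + c)];
    return out;
}

// Hypothetical inverse of tileMatrix: convert the tiled arrangement back
// to a plain row-major matrix, as an untiler would on the output.
std::vector<int> untileMatrix(const std::vector<int>& in,
                              std::size_t rows, std::size_t cols,
                              std::size_t tileRows, std::size_t tileCols) {
    assert(in.size() == rows * cols);
    assert(rows % tileRows == 0 && cols % tileCols == 0);
    std::vector<int> out(in.size());
    std::size_t idx = 0;
    for (std::size_t tr = 0; tr < rows; tr += tileRows)
        for (std::size_t tc = 0; tc < cols; tc += tileCols)
            for (std::size_t r = 0; r < tileRows; ++r)
                for (std::size_t c = 0; c < tileCols; ++c)
                    out[(tr + r) * cols + (tc + c)] = in[idx++];
    return out;
}
```

For a 4 x 4 matrix holding the values 0 to 15 and a 2 x 2 tile, `tileMatrix` places the first tile's elements 0, 1, 4, 5 at the start of the output, and `untileMatrix` with the same arguments restores the original order.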

NOTE: For some combinations of template parameters, the log reports the error message “ERROR: shouldn’t be here”. This combination of factors is not supported by the AIE Compiler. A possible workaround is to pad the matrices with zeros so that their dimensions become the closest multiple of 8 for cint32 data types, 16 for cint16/int16 data types, and 32 for int16 data types.
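
The zero-padding workaround can be sketched on the host side as follows. The helper names `roundUpToMultiple` and `zeroPad` are hypothetical and not part of the library, `int` stands in for the real element type, and "closest multiple" is assumed here to mean the next multiple at or above the current dimension, since padding can only enlarge a matrix. The required multiple is passed in as a parameter, for example 8 for cint32 as stated in the note.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: round dim up to the nearest multiple of `multiple`.
// A dimension that is already a multiple is returned unchanged.
std::size_t roundUpToMultiple(std::size_t dim, std::size_t multiple) {
    assert(multiple > 0);
    return ((dim + multiple - 1) / multiple) * multiple;
}

// Hypothetical helper: copy a row-major rows x cols matrix into the
// top-left corner of a zero-filled matrix whose dimensions are both
// rounded up to `multiple`. The padded matrix is also row-major.
std::vector<int> zeroPad(const std::vector<int>& in,
                         std::size_t rows, std::size_t cols,
                         std::size_t multiple) {
    assert(in.size() == rows * cols);
    const std::size_t paddedRows = roundUpToMultiple(rows, multiple);
    const std::size_t paddedCols = roundUpToMultiple(cols, multiple);
    std::vector<int> out(paddedRows * paddedCols, 0);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            out[r * paddedCols + c] = in[r * cols + c];
    return out;
}
```

Because the added rows and columns contain only zeros, the top-left region of the padded product holds the same values as the product of the original matrices, so the caller can discard the padded region of the output.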