A Matrix Multiply solution can consist of TP_SSR cascaded kernel paths, each containing TP_CASC_LEN kernels. Tiler and detiler kernels can optionally be added to the solution: a tiler kernel can be added on each input port of each matrix multiply kernel, and a detiler kernel can be added on the final output of each SSR path.
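As a rough illustration of how the kernel count scales with these parameters, the sketch below computes the number of kernels of each kind. The `KernelCounts` struct and the `countKernels` function are hypothetical helpers written for this example and are not part of the library; only the TP_SSR and TP_CASC_LEN relationships come from the description above.

```cpp
// Hypothetical helper, not part of the library API.
// Holds the number of kernels of each kind in one Matrix Multiply solution.
struct KernelCounts {
    unsigned matmul;   // matrix multiply kernels across all SSR paths
    unsigned tilerA;   // tilers on the A input ports
    unsigned tilerB;   // tilers on the B input ports
    unsigned untiler;  // detilers on the SSR path outputs
};

// ssr corresponds to TP_SSR and cascLen to TP_CASC_LEN.
// addTilers / addDetiler are assumed on/off switches for the optional kernels.
constexpr KernelCounts countKernels(unsigned ssr, unsigned cascLen,
                                    bool addTilers, bool addDetiler) {
    return KernelCounts{
        ssr * cascLen,                   // TP_SSR paths of TP_CASC_LEN kernels
        addTilers ? ssr * cascLen : 0u,  // one tiler per A input port
        addTilers ? ssr * cascLen : 0u,  // one tiler per B input port
        addDetiler ? ssr : 0u            // one detiler per SSR path
    };
}

// Compile-time check: 2 SSR paths of 3 cascaded kernels give 6 multiply kernels.
static_assert(countKernels(2, 3, true, true).matmul == 6,
              "expected TP_SSR * TP_CASC_LEN matrix multiply kernels");
```

With TP_SSR = 2 and TP_CASC_LEN = 3 and both optional stages enabled, this gives 6 matrix multiply kernels, 6 tilers for each of A and B, and 2 detilers, which can help when estimating tile usage before building a graph.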
The tiling kernels convert between the arrangement of matrix elements in memory and an arrangement optimized for vector multiply. In the entry-level graph, the following names identify the various kernels:
- ‘m_MatmultKernels’ - This is the array of kernel pointers returned by getKernels, which point to the cascade of TP_CASC_LEN matrix multiply kernels. These kernels perform the matrix multiply operations.
- ‘untiler’ - This is an array of TP_SSR kernels on the output of each Matrix Multiply SSR path. It performs the transformation from a tiled format to the true output format.
- ‘tilerA’ - This is an array of TP_CASC_LEN * TP_SSR kernels which connect 1:1 with the A input port of the matrix multiply kernels.
- ‘tilerB’ - This is an array of TP_CASC_LEN * TP_SSR kernels which connect 1:1 with the B input port of the matrix multiply kernels.
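To make the tiled and untiled arrangements more concrete, the following host-side sketch rearranges a row-major matrix into contiguous tiles and back. It is only an illustration of the general idea: `tileMatrix`, `untileMatrix`, and the tile dimensions are assumptions made for this example, and the tile shapes and element ordering actually used by the library kernels may differ.

```cpp
#include <cstddef>
#include <vector>

// Illustrative only: rearrange a row-major rows x cols matrix so that each
// tileRows x tileCols tile is stored contiguously. Tiles are written in
// row-major tile order, and elements inside a tile are also row-major.
// Assumes rows % tileRows == 0 and cols % tileCols == 0.
template <typename T>
std::vector<T> tileMatrix(const std::vector<T>& in, std::size_t rows,
                          std::size_t cols, std::size_t tileRows,
                          std::size_t tileCols) {
    std::vector<T> out(in.size());
    std::size_t idx = 0;
    for (std::size_t tr = 0; tr < rows; tr += tileRows)
        for (std::size_t tc = 0; tc < cols; tc += tileCols)
            for (std::size_t r = 0; r < tileRows; ++r)
                for (std::size_t c = 0; c < tileCols; ++c)
                    out[idx++] = in[(tr + r) * cols + (tc + c)];
    return out;
}

// Inverse of tileMatrix: convert the tiled arrangement back to row-major.
template <typename T>
std::vector<T> untileMatrix(const std::vector<T>& in, std::size_t rows,
                            std::size_t cols, std::size_t tileRows,
                            std::size_t tileCols) {
    std::vector<T> out(in.size());
    std::size_t idx = 0;
    for (std::size_t tr = 0; tr < rows; tr += tileRows)
        for (std::size_t tc = 0; tc < cols; tc += tileCols)
            for (std::size_t r = 0; r < tileRows; ++r)
                for (std::size_t c = 0; c < tileCols; ++c)
                    out[(tr + r) * cols + (tc + c)] = in[idx++];
    return out;
}
```

For a 2 x 4 matrix holding the values 0 to 7 in row-major order and an assumed 2 x 2 tile, `tileMatrix` yields 0, 1, 4, 5, 2, 3, 6, 7, and `untileMatrix` restores the original order.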
Note
For some combinations of template parameters, the log reports the error message “ERROR: shouldn’t be here”. Such combinations are not supported by the AIE compiler. A possible workaround is to pad the matrices with zeros so that their dimensions become the next multiple of 8 for cint32 data types, 16 for cint16/int16 data types, and 32 for int16 data types.
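The padding workaround can be sketched on the host side as follows. The helper names `roundUpToMultiple` and `padWithZeros` are hypothetical and are not library functions; the sketch assumes a row-major matrix held in a `std::vector`, with the required multiple chosen by the caller according to the data type.

```cpp
#include <cstddef>
#include <vector>

// Round dim up to the next multiple of `multiple` (dim itself if already aligned).
constexpr std::size_t roundUpToMultiple(std::size_t dim, std::size_t multiple) {
    return ((dim + multiple - 1) / multiple) * multiple;
}

// Copy a row-major rows x cols matrix into a zero-initialized
// paddedRows x paddedCols matrix (paddedRows >= rows, paddedCols >= cols).
// The original data occupies the top-left corner; the rest stays zero.
template <typename T>
std::vector<T> padWithZeros(const std::vector<T>& in, std::size_t rows,
                            std::size_t cols, std::size_t paddedRows,
                            std::size_t paddedCols) {
    std::vector<T> out(paddedRows * paddedCols, T{});
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            out[r * paddedCols + c] = in[r * cols + c];
    return out;
}
```

For example, a dimension of 20 would be padded to 24 when the required multiple is 8, and to 32 when it is 16. Because the added elements are zero, they contribute only zero terms to the products, so the top-left region of the padded result matches the result for the unpadded matrices.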