The maximum memory accessible by a kernel is 32 kB for AIE and 64 kB for AIE-ML. The maximum matrix dimensions per kernel are therefore limited by how much memory the kernel's buffers require relative to this limit.
Each kernel in a matrix_vector_mul design must allocate memory for the following buffers:
- iobuffer Size A: input matrix A, of size (TP_DIM_A / TP_SSR) x (TP_DIM_B / TP_CASC_LEN) x sizeof(TT_DATA_A) bytes.
- iobuffer Size B: input vector B, of size (TP_DIM_B / TP_CASC_LEN) x sizeof(TT_DATA_B) bytes.
- iobuffer Size Out: output vector, of size (TP_DIM_A / TP_SSR) x sizeof(TT_DATA_OUT) bytes.
Furthermore, if these buffers are ping-pong buffers, their memory requirement doubles. This can be reduced by applying the single_buffer constraint to the buffer.
The cascading and SSR features of the Matrix-Vector Multiply can be used when the matrix and vector buffers exceed the maximum memory of a single kernel. This works because the matrix and vector data are split across multiple kernels, which reduces the memory required by each kernel.