An AIE kernel can access at most 32 kB x 4 (128 kB) of memory on AIE devices. The maximum matrix dimensions per kernel are therefore limited by how much memory the design requires and how much is available. A matrix_mult design needs to allocate memory for the following:
- Window Size A: Input matrix A of size (TP_DIM_A / TP_SSR) x (TP_DIM_AB / TP_CASC_LEN) x sizeof(TT_DATA_A).
- Window Size B: Input matrix B of size TP_DIM_B x (TP_DIM_AB / TP_CASC_LEN) x sizeof(TT_DATA_B).
- Window Size Out: Output matrix of size (TP_DIM_A / TP_SSR) x TP_DIM_B x sizeof(TT_DATA_OUT).
Optionally, depending on whether you use the tiling/detiling feature of the element, you need:
- If Matrix A needs to be tiled: Add memory of Window Size A.
- If Matrix B needs to be tiled: Add memory of Window Size B.
- If Output matrix needs to be detiled: Add memory of Window Size Out.
Further, if these buffers are ping-pong buffers, their memory requirement doubles. You can remove this factor of two for a given buffer by applying the single_buffer constraint to it. Apart from these, the program also needs some system memory to run, which has been empirically observed to occupy around 2.5 kB.
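The budget described above can be estimated with a short script. The following is a minimal sketch, not part of the library: the function names window_bytes and kernel_bytes are hypothetical, 1 kB is assumed to be 1024 bytes, and the ping-pong doubling is assumed to apply to every counted buffer, including the tiling and detiling buffers, which is the conservative reading of the text.

```python
AIE_KERNEL_MEMORY_BYTES = 4 * 32 * 1024  # 32 kB x 4, assuming 1 kB = 1024 bytes
SYSTEM_BYTES = 2560                      # ~2.5 kB of system memory (empirical figure)


def window_bytes(dim_a, dim_ab, dim_b, size_a, size_b, size_out, ssr=1, casc_len=1):
    """Return (Window Size A, Window Size B, Window Size Out) in bytes."""
    win_a = (dim_a // ssr) * (dim_ab // casc_len) * size_a
    win_b = dim_b * (dim_ab // casc_len) * size_b
    win_out = (dim_a // ssr) * dim_b * size_out
    return win_a, win_b, win_out


def kernel_bytes(windows, tiled=(False, False, False), ping_pong=True):
    """Estimate the total memory for one kernel.

    windows: the tuple returned by window_bytes.
    tiled: whether A is tiled, B is tiled, and Out is detiled.
    ping_pong: if True, every counted buffer is doubled (assumption).
    """
    factor = 2 if ping_pong else 1
    total = SYSTEM_BYTES
    for win, needs_tiling in zip(windows, tiled):
        buffers = 2 if needs_tiling else 1  # one extra window when tiling/detiling
        total += win * buffers * factor
    return total


# Example: 64 x 64 x 64 with 2-byte data types, no tiling, ping-pong buffers.
windows = window_bytes(64, 64, 64, 2, 2, 2)
total = kernel_bytes(windows)
print(total, total <= AIE_KERNEL_MEMORY_BYTES)  # 51712 True
```

Passing ping_pong=False approximates the effect of putting the single_buffer constraint on every buffer.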
If the memory requirements are too large for a single kernel, increase the value of TP_CASC_LEN to split the dimension TP_DIM_AB over multiple kernels, or the value of TP_SSR to split the TP_DIM_A dimension of Matrix A.
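As an illustration of this trade-off, the sketch below searches for the smallest TP_SSR and TP_CASC_LEN whose per-kernel estimate fits in 32 kB x 4. It is only a rough planning aid under stated assumptions: estimate_bytes and smallest_split are hypothetical helpers, all three windows are assumed to be ping-pong buffers with no tiling, 1 kB is taken as 1024 bytes, the product TP_SSR x TP_CASC_LEN is used as a proxy for cost, and any further restrictions the library places on legal parameter values are ignored.

```python
AIE_KERNEL_MEMORY_BYTES = 4 * 32 * 1024  # 32 kB x 4, assuming 1 kB = 1024 bytes
SYSTEM_BYTES = 2560                      # ~2.5 kB of system memory (empirical figure)


def estimate_bytes(dim_a, dim_ab, dim_b, size_a, size_b, size_out, ssr, casc_len):
    """Per-kernel estimate: ping-pong windows for A, B and Out plus system memory."""
    win_a = (dim_a // ssr) * (dim_ab // casc_len) * size_a
    win_b = dim_b * (dim_ab // casc_len) * size_b
    win_out = (dim_a // ssr) * dim_b * size_out
    return 2 * (win_a + win_b + win_out) + SYSTEM_BYTES


def smallest_split(dim_a, dim_ab, dim_b, size_a, size_b, size_out, max_factor=16):
    """Return the (ssr, casc_len) with the smallest product that fits, or None."""
    candidates = []
    for ssr in range(1, max_factor + 1):
        if dim_a % ssr:
            continue  # only consider even splits of TP_DIM_A
        for casc_len in range(1, max_factor + 1):
            if dim_ab % casc_len:
                continue  # only consider even splits of TP_DIM_AB
            needed = estimate_bytes(dim_a, dim_ab, dim_b,
                                    size_a, size_b, size_out, ssr, casc_len)
            if needed <= AIE_KERNEL_MEMORY_BYTES:
                candidates.append((ssr * casc_len, ssr, casc_len))
    if not candidates:
        return None
    _, ssr, casc_len = min(candidates)  # ties go to the smaller ssr
    return ssr, casc_len


# Example: 128 x 128 x 128 with 4-byte data types does not fit in one kernel.
print(smallest_split(128, 128, 128, 4, 4, 4))  # (2, 4)
```

In this example neither parameter is enough on its own at the same cost: TP_CASC_LEN does not shrink the output window and TP_SSR does not shrink Window Size B, so the smallest fitting split combines both.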