There is no size limit on the matrices in the operation, as long as they fit in memory.
The matrices are partitioned into multiple equally sized blocks for block matrix multiplication.
The size of these matrix blocks should be a multiple of the size of the systolic array.
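
The partitioning described above can be illustrated with a minimal software sketch. It is not the hardware implementation: `ARRAY_SIZE`, `BLOCK_SIZE`, and `block_matmul` are hypothetical names chosen for illustration, the systolic array is assumed to be square, and the matrix dimensions are assumed to be exact multiples of the block size (padding is not handled).

```python
# Hypothetical parameters, chosen for illustration only.
ARRAY_SIZE = 2  # assumed side length of a square systolic array
BLOCK_SIZE = 4  # block side length; must be a multiple of ARRAY_SIZE


def block_matmul(a, b, block=BLOCK_SIZE, array=ARRAY_SIZE):
    """Multiply a (n x k) by b (k x m), given as lists of lists, block by block."""
    if block % array != 0:
        raise ValueError("block size must be a multiple of the systolic array size")
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    # Assumption of this sketch: no padding, so every dimension must split evenly.
    if n % block or k % block or m % block:
        raise ValueError("matrix dimensions must be multiples of the block size")

    c = [[0] * m for _ in range(n)]
    # Walk over output blocks (i0, j0) and accumulate along the shared dimension p0.
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for p0 in range(0, k, block):
                # Product of one block of a with one block of b, added into c.
                for i in range(i0, i0 + block):
                    for j in range(j0, j0 + block):
                        acc = 0
                        for p in range(p0, p0 + block):
                            acc += a[i][p] * b[p][j]
                        c[i][j] += acc
    return c
```

In this sketch the innermost three loops stand in for the work that would be handed to the systolic array one block pair at a time. Because each block is a multiple of the array size, it splits evenly into array-sized tiles with no partial tiles left over.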