There is no size limit on the matrices in this operation, as long as they fit
in memory.
For block matrix multiplication, the matrices are partitioned into multiple
equally sized blocks.
Each block dimension should be a multiple of the systolic array dimension.
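Below is a minimal software sketch of this block scheme. It assumes a square systolic array of SA_SIZE x SA_SIZE processing elements, a single square block size for all three matrix dimensions, and zero-padding of matrices whose dimensions are not multiples of the block size. The names SA_SIZE, block_matmul and systolic_tile are hypothetical, and systolic_tile only emulates in Python what one pass through the array would compute. It is not the hardware implementation. The sketch shows why the block size must be a multiple of SA_SIZE: each block then splits into whole systolic tiles with no partial tile left over.

```python
SA_SIZE = 4  # hypothetical: the systolic array has SA_SIZE x SA_SIZE PEs


def round_up(x, k):
    """Round x up to the next multiple of k."""
    return -(-x // k) * k


def pad(m, rows, cols):
    """Zero-pad matrix m (list of lists) to rows x cols (assumption: padding)."""
    out = [[0] * cols for _ in range(rows)]
    for i, row in enumerate(m):
        out[i][:len(row)] = row
    return out


def systolic_tile(a, b, c, i0, j0, k0, depth):
    """Software stand-in for one systolic-array pass: accumulate the
    SA_SIZE x SA_SIZE output tile of c at (i0, j0) over `depth` steps
    of the shared dimension, starting at k0."""
    for i in range(i0, i0 + SA_SIZE):
        for j in range(j0, j0 + SA_SIZE):
            acc = 0
            for k in range(k0, k0 + depth):
                acc += a[i][k] * b[k][j]
            c[i][j] += acc


def block_matmul(a, b, block=8):
    """Compute a @ b block by block; block must be a multiple of SA_SIZE."""
    if block % SA_SIZE != 0:
        raise ValueError("block size must be a multiple of SA_SIZE")
    n, kdim, m = len(a), len(b), len(b[0])
    if len(a[0]) != kdim:
        raise ValueError("inner dimensions do not match")
    # Pad every dimension up to a whole number of blocks.
    N, K, M = (round_up(x, block) for x in (n, kdim, m))
    ap, bp = pad(a, N, K), pad(b, K, M)
    c = [[0] * M for _ in range(N)]
    for bi in range(0, N, block):
        for bj in range(0, M, block):
            for bk in range(0, K, block):
                # Because block % SA_SIZE == 0, the output block is covered
                # exactly by whole SA_SIZE x SA_SIZE tiles.
                for ti in range(bi, bi + block, SA_SIZE):
                    for tj in range(bj, bj + block, SA_SIZE):
                        systolic_tile(ap, bp, c, ti, tj, bk, block)
    # Strip the padding from the result.
    return [row[:m] for row in c[:n]]
```

For example, `block_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]])` returns `[[58, 64], [139, 154]]`, while `block_matmul(a, b, block=6)` raises `ValueError` because 6 is not a multiple of SA_SIZE = 4.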