In this design, the multiplication of two square matrices, MatA and MatB, is performed on a 24-core AI Engine (AIE) overlay. MatA is partitioned into 3 x 8 blocks and MatB into 8 x 3 blocks. MatA is supplied one 1 x 8 block row at a time over 8 input streams, while MatB is supplied over 24 input streams, one for each of its 8 x 3 blocks. The output matrix MatC is partitioned into 3 x 3 blocks and is emitted one 1 x 3 block row at a time over 3 output streams. The 24-core overlay was chosen so that the same overlay can be used for every matrix dimension, from 32x32x32 and 64x64x64 up to 1024x1024x1024, while keeping performance high.
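The block partitioning can be illustrated with a small Python sketch. It is a functional model only, not the AIE kernel or graph code, and it rests on assumptions not stated in the design: each of the 24 core positions (k, j) computes the partial product of MatA block (i, k) with MatB block (k, j), the 8 partial products feeding each MatC block are summed, and dimensions not divisible by 3 or 8 are split into nearly equal blocks. The names `splits`, `core_product`, and `overlay_matmul` are hypothetical.

```python
def splits(n, parts):
    """Return (start, end) ranges that split n indices into `parts` contiguous blocks.

    Assumption: uneven sizes are allowed when n is not divisible by `parts`.
    """
    return [(p * n // parts, (p + 1) * n // parts) for p in range(parts)]


def core_product(a, b, rr, kr, cr):
    """Partial product A[rr, kr] @ B[kr, cr], modelling one (hypothetical) core."""
    (r0, r1), (k0, k1), (c0, c1) = rr, kr, cr
    return [[sum(a[i][k] * b[k][j] for k in range(k0, k1)) for j in range(c0, c1)]
            for i in range(r0, r1)]


def overlay_matmul(a, b, row_blocks=3, inner_blocks=8, col_blocks=3):
    """Compute C = A @ B using a 3 x 8 (MatA) by 8 x 3 (MatB) block grid."""
    n = len(a)
    rows = splits(n, row_blocks)      # MatA / MatC block rows
    inner = splits(n, inner_blocks)   # shared dimension, 8 blocks
    cols = splits(n, col_blocks)      # MatB / MatC block columns
    c = [[0] * n for _ in range(n)]
    for rr in rows:  # one 1 x 8 block row of MatA per pass
        # 8 x 3 = 24 partial products per pass, one per core position (k, j)
        partials = {(k, j): core_product(a, b, rr, kr, cr)
                    for k, kr in enumerate(inner) for j, cr in enumerate(cols)}
        # Each MatC block column (one of 3 output streams) sums its 8 partials
        for j, (c0, _) in enumerate(cols):
            for k in range(inner_blocks):
                for di, row in enumerate(partials[(k, j)]):
                    for dj, v in enumerate(row):
                        c[rr[0] + di][c0 + dj] += v
    return c
```

Because the block grid stays fixed at 3 x 8 and 8 x 3, only the block sizes grow with the matrix dimension, which mirrors the idea of reusing one overlay from 32x32x32 up to 1024x1024x1024.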