Matrix Multiply - Matrix Multiply - 2022.2 English

Vitis Libraries

Release Date: 2023-12-20
Version: 2022.2 English

The following table gives benchmark results for the Matrix Multiply function across a wide variety of supported parameters, which are defined in L2 Matrix Multiply Configuration Parameters.

Note

cycleCountAvg excludes the cycle counts of the additional shuffling/tiling widget kernels, whereas initiationInterval and PROGRAM_MEMORY include them.
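
To illustrate the note, the following minimal Python sketch compares two configurations taken from the benchmark table on this page: the 16x16x16 cint16 x cint16 case without and with the tiling/detiling widgets. The `BenchRow` type and the `interval_to_cycle_ratio` helper are hypothetical names introduced only for this illustration; they are not part of the Vitis library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchRow:
    """Subset of one benchmark-table row (hypothetical helper type)."""
    tiling: tuple            # (P_ADD_TILING_A, P_ADD_TILING_B, P_ADD_DETILING_OUT)
    cycle_count_avg: int     # cycleCountAvg column
    initiation_interval: int # initiationInterval column
    num_aie: int             # NUM_AIE column


def interval_to_cycle_ratio(row: BenchRow) -> float:
    """Ratio of initiationInterval to cycleCountAvg for one row."""
    return row.initiation_interval / row.cycle_count_avg


# Values copied from the two 16x16x16 cint16 x cint16 rows of the table.
no_widgets = BenchRow((0, 0, 0), 631, 653, 1)
with_widgets = BenchRow((1, 1, 1), 631, 2455, 3)

for row in (no_widgets, with_widgets):
    print(row.tiling, row.num_aie, round(interval_to_cycle_ratio(row), 2))
```

In both rows cycleCountAvg is 631, while initiationInterval rises from 653 to 2455 and NUM_AIE from 1 to 3 once the widgets are enabled, which matches the note's statement about which columns account for the widget kernels.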

matrix_mult_benchmark.csv

Table 66 Matrix Multiply benchmark
| Library Element | T_DATA_A | T_DATA_B | P_DIM_A | P_DIM_AB | P_DIM_B | P_ADD_TILING_A | P_ADD_TILING_B | P_ADD_DETILING_OUT | P_INPUT_WINDOW_VSIZE_A | P_INPUT_WINDOW_VSIZE_B | P_CASC_LEN | NITER | cycleCountAvg | throughputAvg | initiationInterval | throughputInitIntAvg | NUM_BANKS | NUM_AIE | DATA_MEMORY | PROGRAM_MEMORY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| matrix_mult | cfloat | cfloat | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 16 | 858 | 2386 Msa/s | 1015 | 2017 Msa/s | 32 | 9 | 48299 | 3234 3050 3234 3314 3234 3314 3234 3386 3540 |
| matrix_mult | cint16 | cint16 | 8 | 8 | 8 | 1 | 1 | 1 | 64 | 64 | 1 | 16 | 109 | 4697 Msa/s | 2367 | 216 Msa/s | 12 | 3 | 9617 | 4252 2556 3460 |
| matrix_mult | cint16 | cint32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 16 | 222 | 9225 Msa/s | 327 | 6262 Msa/s | 31 | 9 | 37295 | 1940 1812 1940 1820 1940 1254 1820 1940 2034 |
| matrix_mult | cint16 | int16 | 8 | 64 | 4 | 1 | 0 | 1 | 512 | 256 | 4 | 16 | 1067 | 1919 Msa/s | 1887 | 1085 Msa/s | 30 | 7 | 25127 | 3096 1904 3938 1722 1722 3954 1706 |
| matrix_mult | cint16 | int32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 16 | 829 | 2470 Msa/s | 2396 | 854 Msa/s | 29 | 9 | 33199 | 4172 2814 4172 2926 4172 3386 2926 4172 3130 |
| matrix_mult | cint32 | cint16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 16 | 233 | 8789 Msa/s | 338 | 6059 Msa/s | 32 | 9 | 41391 | 2004 1782 2004 1846 2004 1254 1846 2004 2010 |
| matrix_mult | cint32 | cint32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 16 | 379 | 5403 Msa/s | 483 | 4240 Msa/s | 33 | 9 | 45993 | 1972 2092 1972 2176 1972 1254 2176 1972 2376 |
| matrix_mult | cint32 | int16 | 8 | 64 | 4 | 1 | 0 | 1 | 512 | 256 | 4 | 16 | 228 | 8982 Msa/s | 313 | 6543 Msa/s | 26 | 7 | 34087 | 1254 2050 2374 1838 1822 2358 1826 |
| matrix_mult | cint32 | int32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 16 | 233 | 8789 Msa/s | 338 | 6059 Msa/s | 32 | 9 | 41391 | 2004 1782 2004 1846 2004 1254 1846 2004 2010 |
| matrix_mult | float | cfloat | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 16 | 406 | 5044 Msa/s | 510 | 4015 Msa/s | 32 | 9 | 39598 | 2644 2514 2644 2584 2644 2584 2644 1254 2702 |
| matrix_mult | float | float | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 16 | 545 | 3757 Msa/s | 2397 | 854 Msa/s | 34 | 9 | 35112 | 4172 2214 4172 2246 4172 2246 4172 3096 2438 |
| matrix_mult | int16 | cint16 | 16 | 16 | 16 | 1 | 1 | 1 | 256 | 256 | 1 | 16 | 348 | 11770 Msa/s | 1130 | 3624 Msa/s | 13 | 3 | 16784 | 2328 2484 3736 |
| matrix_mult | int16 | cint32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 16 | 175 | 11702 Msa/s | 281 | 7288 Msa/s | 28 | 9 | 33455 | 2260 1902 2260 1886 2260 1254 1886 2276 2144 |
| matrix_mult | int16 | int16 | 16 | 16 | 16 | 1 | 1 | 1 | 256 | 256 | 1 | 16 | 260 | 15753 Msa/s | 2153 | 1902 Msa/s | 13 | 3 | 12686 | 4114 2096 3484 |
| matrix_mult | int32 | cint16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 16 | 829 | 2470 Msa/s | 2396 | 854 Msa/s | 29 | 9 | 33199 | 4172 2814 4172 2926 4172 3386 2926 4172 3130 |
| matrix_mult | int32 | cint32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 16 | 222 | 9225 Msa/s | 327 | 6262 Msa/s | 31 | 9 | 37295 | 1940 1812 1940 1820 1940 1254 1820 1940 2034 |
| matrix_mult | cint16 | cint16 | 8 | 8 | 8 | 1 | 1 | 1 | 64 | 64 | 1 | 16 | 109 | 4697 Msa/s | 2367 | 216 Msa/s | 12 | 3 | 9617 | 4252 2556 3460 |
| matrix_mult | cint16 | cint16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 16 | 373 | 5490 Msa/s | 2387 | 857 Msa/s | 33 | 9 | 32815 | 4172 1992 4172 2060 4172 2060 4172 3096 2226 |
| matrix_mult | cint16 | cint16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 1 | 16 | 294 | 6965 Msa/s | 2515 | 814 Msa/s | 12 | 3 | 19345 | 4188 2114 3080 |
| matrix_mult | cint16 | cint16 | 8 | 4 | 64 | 1 | 1 | 1 | 32 | 256 | 1 | 16 | 289 | 7086 Msa/s | 1313 | 1559 Msa/s | 13 | 3 | 19089 | 3246 1952 3484 |
| matrix_mult | cfloat | float | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 16 | 316 | 6481 Msa/s | 420 | 4876 Msa/s | 33 | 9 | 43694 | 2084 2334 2084 2334 2084 2334 2084 1254 2558 |
| matrix_mult | cint16 | cint16 | 1024 | 4 | 4 | 0 | 0 | 0 | 4096 | 16 | 1 | 16 | 2334 | 7019 Msa/s | 3996 | 4100 Msa/s | 11 | 1 | 67849 | 1960 |
| matrix_mult | cint16 | cint16 | 1024 | 4 | 4 | 1 | 1 | 1 | 4096 | 16 | 1 | 16 | 2334 | 7019 Msa/s | 4518 | 3626 Msa/s | 16 | 3 | 105105 | 3062 3342 1960 |
| matrix_mult | cint16 | cint16 | 16 | 16 | 16 | 0 | 0 | 0 | 256 | 256 | 1 | 16 | 631 | 6491 Msa/s | 653 | 6272 Msa/s | 7 | 1 | 8329 | 3408 |
| matrix_mult | cint16 | cint16 | 16 | 16 | 16 | 1 | 1 | 1 | 256 | 256 | 1 | 16 | 631 | 6491 Msa/s | 2455 | 1668 Msa/s | 13 | 3 | 18833 | 3736 3408 4604 |
| matrix_mult | cint16 | cint16 | 16 | 256 | 16 | 0 | 0 | 1 | 4096 | 4096 | 1 | 16 | 8315 | 7881 Msa/s | 8184 | 8007 Msa/s | 14 | 2 | 73997 | 3736 3408 |
| matrix_mult | cint16 | cint16 | 20 | 60 | 4 | 1 | 1 | 1 | 1200 | 240 | 1 | 16 | 725 | 6620 Msa/s | 2752 | 1744 Msa/s | 13 | 3 | 30865 | 4204 2220 3342 |
| matrix_mult | cint16 | cint16 | 24 | 4 | 4 | 1 | 1 | 1 | 96 | 16 | 1 | 16 | 85 | 4517 Msa/s | 1224 | 313 Msa/s | 12 | 3 | 9105 | 3062 1944 3342 |
| matrix_mult | int32 | int16 | 8 | 64 | 4 | 1 | 0 | 1 | 512 | 256 | 4 | 16 | 1067 | 1919 Msa/s | 1887 | 1085 Msa/s | 30 | 7 | 25127 | 3096 1904 3938 1722 1722 3954 1706 |
| matrix_mult | cint16 | cint16 | 32 | 32 | 32 | 0 | 0 | 0 | 1024 | 1024 | 1 | 16 | 4334 | 7560 Msa/s | 4182 | 7835 Msa/s | 7 | 1 | 26761 | 5100 |
| matrix_mult | cint16 | cint16 | 32 | 32 | 32 | 1 | 0 | 0 | 1024 | 1024 | 1 | 16 | 4345 | 7541 Msa/s | 4269 | 7675 Msa/s | 9 | 2 | 37133 | 3362 5100 |
| matrix_mult | cint16 | cint16 | 32 | 32 | 64 | 0 | 0 | 0 | 1024 | 2048 | 1 | 16 | 8590 | 7629 Msa/s | 8248 | 7945 Msa/s | 7 | 1 | 43145 | 5100 |
| matrix_mult | cint16 | cint16 | 32 | 64 | 32 | 0 | 0 | 0 | 2048 | 2048 | 1 | 16 | 8430 | 7774 Msa/s | 8097 | 8093 Msa/s | 7 | 1 | 43145 | 5100 |
| matrix_mult | cint16 | cint16 | 64 | 64 | 64 | 0 | 0 | 0 | 4096 | 4096 | 1 | 16 | 33529 | 7818 Msa/s | 31841 | 8232 Msa/s | 13 | 1 | 100489 | 5084 |
| matrix_mult | cint16 | cint16 | 8 | 4 | 4 | 1 | 1 | 1 | 32 | 16 | 1 | 16 | 47 | 2723 Msa/s | 1218 | 105 Msa/s | 12 | 3 | 7569 | 3062 1740 3096 |
| matrix_mult | cint16 | cint16 | 8 | 4 | 512 | 1 | 0 | 1 | 32 | 2048 | 1 | 16 | 2081 | 7873 Msa/s | 3879 | 4223 Msa/s | 13 | 2 | 86541 | 1952 3516 |
| matrix_mult | cint16 | cint16 | 8 | 4 | 512 | 1 | 1 | 1 | 32 | 2048 | 1 | 16 | 2081 | 7873 Msa/s | 3986 | 4110 Msa/s | 16 | 3 | 105105 | 3246 1952 3516 |
| matrix_mult | int32 | int32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 16 | 373 | 5490 Msa/s | 2387 | 857 Msa/s | 33 | 9 | 32815 | 4172 1992 4172 2060 4172 2060 4172 3096 2226 |
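
The throughputAvg and throughputInitIntAvg values listed in the table are numerically consistent with dividing the product P_DIM_A x P_DIM_AB x P_DIM_B by the corresponding cycle count and truncating the result. The Python sketch below reproduces that arithmetic. It assumes a 1 GHz clock (one cycle per nanosecond), which is not stated in the source, and the function name `throughput_msa_per_s` is hypothetical; treat it as a way to sanity-check table entries rather than as the library's own definition of throughput.

```python
def throughput_msa_per_s(dim_a: int, dim_ab: int, dim_b: int,
                         cycles: int, clock_mhz: int = 1000) -> int:
    """Estimate throughput in Msa/s, truncated to an integer.

    Assumption (not from the source): the sample count per iteration is
    dim_a * dim_ab * dim_b and the clock runs at clock_mhz MHz.
    """
    work_per_iteration = dim_a * dim_ab * dim_b
    # Integer arithmetic avoids floating-point rounding before truncation.
    return work_per_iteration * clock_mhz // cycles


# cfloat x cfloat row (8 x 64 x 4): cycleCountAvg = 858, initiationInterval = 1015
print(throughput_msa_per_s(8, 64, 4, 858))   # table lists 2386 Msa/s
print(throughput_msa_per_s(8, 64, 4, 1015))  # table lists 2017 Msa/s
```

Applying the same function to the initiationInterval column shows why configurations with tiling widgets report a much lower throughputInitIntAvg than throughputAvg: the work per iteration is unchanged, but the interval in the denominator is larger.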