Matrix Multiply - 2023.2 English

Vitis Libraries

Release Date: 2023-12-20
Version: 2023.2 English

The following table gives results for the Matrix Multiply function across a wide variety of supported parameters, which are defined in Matrix Multiply Configuration Parameters.

matrix_mult_benchmark.csv

Table 75 Matrix Multiply benchmark
| Library Element | AIE_VARIANT | T_DATA_A | T_DATA_B | P_DIM_A | P_DIM_AB | P_DIM_B | P_ADD_TILING_A | P_ADD_TILING_B | P_ADD_DETILING_OUT | P_INPUT_WINDOW_VSIZE_A | P_INPUT_WINDOW_VSIZE_B | P_CASC_LEN | NITER | Latency | Throughput | NUM_BANKS | NUM_AIE | DATA_MEMORY | PROGRAM_MEMORY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| matrix_mult | 1 | cfloat | cfloat | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 4381 ns | 43 MSa/s | 29 | 9 | 48308 | 3020 3060 3020 3316 3020 1360 3316 3020 3388 |
| matrix_mult | 1 | cfloat | float | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 1599 ns | 125 MSa/s | 28 | 9 | 43703 | 1952 2282 1952 2298 1952 2298 1952 1264 2398 |
| matrix_mult | 1 | cint16 | cint16 | 16 | 16 | 16 | 0 | 0 | 0 | 256 | 256 | 1 | 8 | 1617 ns | 425 MSa/s | 7 | 1 | 8329 | 3260 |
| matrix_mult | 1 | cint16 | cint16 | 16 | 16 | 16 | 1 | 1 | 1 | 256 | 256 | 1 | 8 | 8188 ns | 118 MSa/s | 13 | 3 | 18836 | 1776 3244 4460 |
| matrix_mult | 1 | cint16 | cint16 | 16 | 256 | 16 | 0 | 0 | 1 | 4096 | 4096 | 1 | 8 | 17934 ns | 35 MSa/s | 14 | 2 | 73998 | 1776 3276 |
| matrix_mult | 1 | cint16 | cint16 | 20 | 60 | 4 | 1 | 1 | 1 | 1200 | 240 | 1 | 8 | 7963 ns | 33 MSa/s | 13 | 3 | 30868 | 4108 2096 1328 |
| matrix_mult | 1 | cint16 | cint16 | 24 | 4 | 4 | 1 | 1 | 1 | 96 | 16 | 1 | 8 | 3241 ns | 90 MSa/s | 12 | 3 | 9107 | 3086 1864 1344 |
| matrix_mult | 1 | cint16 | cint16 | 32 | 32 | 32 | 0 | 0 | 0 | 1024 | 1024 | 1 | 8 | 10526 ns | 267 MSa/s | 7 | 1 | 26761 | 4994 |
| matrix_mult | 1 | cint16 | cint16 | 32 | 32 | 32 | 1 | 0 | 0 | 1024 | 1024 | 1 | 8 | 17194 ns | 267 MSa/s | 8 | 2 | 37134 | 3432 4994 |
| matrix_mult | 1 | cint16 | cint16 | 32 | 32 | 64 | 0 | 0 | 0 | 1024 | 2048 | 1 | 8 | 21634 ns | 271 MSa/s | 7 | 1 | 43145 | 4994 |
| matrix_mult | 1 | cint16 | cint16 | 32 | 64 | 32 | 0 | 0 | 0 | 2048 | 2048 | 1 | 8 | 20191 ns | 138 MSa/s | 7 | 1 | 43145 | 4994 |
| matrix_mult | 1 | cint16 | cint16 | 512 | 4 | 4 | 1 | 1 | 1 | 2048 | 16 | 1 | 8 | 4640 ns | 1000 MSa/s | 13 | 3 | 55955 | 3086 1328 1880 |
| matrix_mult | 1 | cint16 | cint16 | 64 | 64 | 64 | 0 | 0 | 0 | 4096 | 4096 | 1 | 8 | 83656 ns | 140 MSa/s | 13 | 1 | 100489 | 4962 |
| matrix_mult | 1 | cint16 | cint16 | 8 | 4 | 4 | 1 | 1 | 1 | 32 | 16 | 1 | 8 | 4276 ns | 30 MSa/s | 12 | 3 | 7571 | 3086 1640 1296 |
| matrix_mult | 1 | cint16 | cint16 | 8 | 4 | 512 | 1 | 0 | 1 | 32 | 2048 | 1 | 12 | 25906 ns | 1000 MSa/s | 13 | 2 | 86542 | 1862 1494 |
| matrix_mult | 1 | cint16 | cint16 | 8 | 4 | 512 | 1 | 0 | 1 | 32 | 2048 | 1 | 8 | 17341 ns | 1066 MSa/s | 13 | 2 | 86542 | 1862 1494 |
| matrix_mult | 1 | cint16 | cint16 | 8 | 4 | 512 | 1 | 1 | 1 | 32 | 2048 | 1 | 16 | 25912 ns | 1000 MSa/s | 17 | 3 | 105107 | 3246 1862 1494 |
| matrix_mult | 1 | cint16 | cint16 | 8 | 4 | 64 | 1 | 1 | 1 | 32 | 256 | 1 | 8 | 4631 ns | 460 MSa/s | 13 | 3 | 19091 | 3246 1862 1478 |
| matrix_mult | 1 | cint16 | cint16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 1 | 8 | 7708 ns | 14 MSa/s | 12 | 3 | 19348 | 3916 2032 1280 |
| matrix_mult | 1 | cint16 | cint16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 7686 ns | 15 MSa/s | 33 | 9 | 32824 | 3916 2002 3916 2086 3916 2086 3916 1296 2112 |
| matrix_mult | 1 | cint16 | cint16 | 8 | 8 | 8 | 1 | 1 | 1 | 64 | 64 | 1 | 8 | 7690 ns | 30 MSa/s | 12 | 3 | 9620 | 4012 2382 1408 |
| matrix_mult | 1 | cint16 | cint16 | 8 | 8 | 8 | 1 | 1 | 1 | 64 | 64 | 1 | 8 | 7690 ns | 30 MSa/s | 12 | 3 | 9620 | 4012 2382 1408 |
| matrix_mult | 1 | cint16 | cint32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 1252 ns | 135 MSa/s | 30 | 9 | 37304 | 1680 1790 1680 1818 1680 1818 1680 1248 1920 |
| matrix_mult | 1 | cint16 | int16 | 8 | 64 | 4 | 1 | 0 | 1 | 512 | 256 | 4 | 12 | 5851 ns | 19 MSa/s | 30 | 7 | 25132 | 1296 1828 3704 1732 1748 3720 1716 |
| matrix_mult | 1 | cint16 | int32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 8112 ns | 15 MSa/s | 29 | 9 | 33208 | 3916 2824 3916 2952 3916 1360 2952 3916 3030 |
| matrix_mult | 1 | cint32 | cint16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 878 ns | 125 MSa/s | 32 | 9 | 41400 | 1824 1724 1824 1816 1824 1816 1824 1264 1930 |
| matrix_mult | 1 | cint32 | cint32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 2076 ns | 86 MSa/s | 27 | 9 | 46002 | 1840 2050 1840 2146 1840 2146 1840 1264 2240 |
| matrix_mult | 1 | cint32 | int16 | 8 | 64 | 4 | 1 | 0 | 1 | 512 | 256 | 4 | 12 | 750 ns | 125 MSa/s | 26 | 7 | 34092 | 1264 1922 2224 1820 1836 2224 1744 |
| matrix_mult | 1 | cint32 | int32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 878 ns | 125 MSa/s | 32 | 9 | 41400 | 1824 1724 1824 1816 1824 1816 1824 1264 1930 |
| matrix_mult | 1 | float | cfloat | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 2287 ns | 81 MSa/s | 28 | 9 | 39607 | 2400 2388 2400 2530 2400 2530 2400 1248 2650 |
| matrix_mult | 1 | float | float | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 7841 ns | 15 MSa/s | 34 | 9 | 35121 | 3916 2192 3916 2224 3916 2224 3916 1296 2300 |
| matrix_mult | 1 | int16 | cint16 | 16 | 16 | 16 | 1 | 1 | 1 | 256 | 256 | 1 | 8 | 1680 ns | 731 MSa/s | 13 | 3 | 16787 | 2110 2370 1776 |
| matrix_mult | 1 | int16 | cint16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 705 ns | 244 MSa/s | 26 | 9 | 28471 | 1758 1646 1758 1694 1758 1694 1758 1296 1774 |
| matrix_mult | 1 | int16 | cint32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 1119 ns | 167 MSa/s | 28 | 9 | 33464 | 2030 1868 2030 1884 2030 1248 1884 2030 2002 |
| matrix_mult | 1 | int16 | int16 | 16 | 16 | 16 | 1 | 1 | 1 | 256 | 256 | 1 | 8 | 6991 ns | 134 MSa/s | 13 | 3 | 12689 | 3880 1918 1424 |
| matrix_mult | 1 | int16 | int16 | 16 | 8 | 8 | 1 | 0 | 0 | 128 | 64 | 2 | 8 | 427 ns | 1267 MSa/s | 11 | 2 | 5643 | 1376 1832 |
| matrix_mult | 1 | int16 | int16 | 8 | 64 | 4 | 1 | 0 | 1 | 512 | 256 | 4 | 12 | 6464 ns | 17 MSa/s | 24 | 6 | 18529 | 1408 3880 1408 1408 3880 1536 |
| matrix_mult | 1 | int16 | int32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 705 ns | 244 MSa/s | 26 | 9 | 28471 | 1758 1646 1758 1694 1758 1694 1758 1296 1774 |
| matrix_mult | 1 | int32 | cint16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 8112 ns | 15 MSa/s | 29 | 9 | 33208 | 3916 2824 3916 2952 3916 1360 2952 3916 3030 |
| matrix_mult | 1 | int32 | cint32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 1252 ns | 135 MSa/s | 30 | 9 | 37304 | 1680 1790 1680 1818 1680 1818 1680 1248 1920 |
| matrix_mult | 1 | int32 | int16 | 8 | 64 | 4 | 1 | 0 | 1 | 512 | 256 | 4 | 12 | 5851 ns | 19 MSa/s | 30 | 7 | 25132 | 1296 1828 3704 1732 1748 3720 1716 |
| matrix_mult | 1 | int32 | int32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 7686 ns | 15 MSa/s | 33 | 9 | 32824 | 3916 2002 3916 2086 3916 2086 3916 1296 2112 |
| matrix_mult | 2 | cint16 | cint16 | 1024 | 4 | 8 | 0 | 0 | 0 | 4096 | 32 | 1 | 8 | 25174 ns | 1000 MSa/s | 15 | 1 | 100750 | 1456 |
| matrix_mult | 2 | cint16 | cint16 | 1024 | 4 | 8 | 1 | 1 | 1 | 4096 | 32 | 1 | 8 | 25173 ns | 1000 MSa/s | 16 | 2 | 103219 | 992 1456 |
| matrix_mult | 2 | cint16 | cint16 | 16 | 16 | 16 | 0 | 0 | 0 | 256 | 256 | 1 | 8 | 878 ns | 723 MSa/s | 6 | 1 | 8334 | 2112 |
| matrix_mult | 2 | cint16 | cint16 | 16 | 16 | 16 | 1 | 1 | 1 | 256 | 256 | 1 | 8 | 1727 ns | 625 MSa/s | 8 | 3 | 18873 | 1008 2112 1856 |
| matrix_mult | 2 | cint16 | cint16 | 16 | 256 | 16 | 0 | 0 | 1 | 4096 | 4096 | 1 | 8 | 7953 ns | 62 MSa/s | 12 | 2 | 74003 | 2128 1008 |
| matrix_mult | 2 | cint16 | cint16 | 20 | 60 | 8 | 0 | 1 | 1 | 1200 | 480 | 1 | 8 | 2387 ns | 133 MSa/s | 9 | 2 | 22963 | 1952 1392 |
| matrix_mult | 2 | cint16 | cint16 | 24 | 4 | 8 | 1 | 1 | 1 | 96 | 32 | 1 | 8 | 654 ns | 1000 MSa/s | 9 | 2 | 7219 | 1440 992 |
| matrix_mult | 2 | cint16 | cint16 | 32 | 32 | 32 | 0 | 0 | 0 | 1024 | 1024 | 1 | 8 | 6581 ns | 407 MSa/s | 6 | 1 | 26766 | 2208 |
| matrix_mult | 2 | cint16 | cint16 | 32 | 32 | 32 | 1 | 0 | 0 | 1024 | 1024 | 1 | 8 | 10817 ns | 407 MSa/s | 9 | 2 | 37139 | 2208 1088 |
| matrix_mult | 2 | cint16 | cint16 | 32 | 64 | 32 | 0 | 0 | 0 | 2048 | 2048 | 1 | 8 | 10902 ns | 238 MSa/s | 7 | 1 | 43150 | 2224 |
| matrix_mult | 2 | cint16 | cint16 | 4 | 64 | 8 | 1 | 1 | 1 | 256 | 512 | 4 | 8 | 816 ns | 191 MSa/s | 29 | 8 | 30773 | 1744 1744 1536 1504 1744 1536 1744 1632 |
| matrix_mult | 2 | cint16 | cint16 | 8 | 4 | 512 | 1 | 1 | 1 | 32 | 2048 | 1 | 8 | 16932 ns | 939 MSa/s | 16 | 3 | 105144 | 1888 1152 1520 |
| matrix_mult | 2 | cint16 | cint16 | 8 | 4 | 64 | 1 | 1 | 1 | 32 | 256 | 1 | 8 | 2228 ns | 901 MSa/s | 12 | 3 | 19128 | 1872 1136 1520 |
| matrix_mult | 2 | cint16 | cint16 | 8 | 4 | 8 | 1 | 1 | 1 | 32 | 32 | 1 | 8 | 350 ns | 761 MSa/s | 9 | 2 | 5683 | 1440 992 |
| matrix_mult | 2 | cint16 | cint16 | 8 | 64 | 8 | 1 | 1 | 1 | 512 | 512 | 1 | 8 | 2262 ns | 125 MSa/s | 7 | 2 | 21300 | 2128 1936 |
| matrix_mult | 2 | cint16 | cint16 | 8 | 8 | 8 | 1 | 1 | 1 | 64 | 64 | 1 | 8 | 606 ns | 457 MSa/s | 7 | 2 | 6964 | 1792 1504 |
| matrix_mult | 2 | cint16 | cint16 | 8 | 8 | 8 | 1 | 1 | 1 | 64 | 64 | 1 | 8 | 606 ns | 457 MSa/s | 7 | 2 | 6964 | 1792 1504 |
| matrix_mult | 2 | cint16 | int16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 8 | 827 ns | 170 MSa/s | 24 | 6 | 22697 | 1648 1488 1632 1696 1632 1648 |
| matrix_mult | 2 | int16 | int16 | 16 | 16 | 16 | 1 | 1 | 1 | 256 | 256 | 1 | 8 | 1330 ns | 876 MSa/s | 8 | 3 | 12697 | 1424 1504 1536 |
| matrix_mult | 2 | int16 | int16 | 16 | 8 | 8 | 1 | 0 | 0 | 128 | 64 | 2 | 8 | 313 ns | 1802 MSa/s | 11 | 2 | 5651 | 1024 1424 |
| matrix_mult | 2 | int16 | int32 | 8 | 64 | 4 | 0 | 1 | 1 | 512 | 256 | 4 | 8 | 481 ns | 320 MSa/s | 25 | 6 | 19753 | 1120 1360 1408 1504 1408 1360 |
| matrix_mult | 2 | int16 | int32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 8 | 611 ns | 242 MSa/s | 31 | 8 | 26293 | 1456 1456 1408 1120 1456 1408 1456 1504 |
| matrix_mult | 2 | int32 | int16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 8 | 742 ns | 166 MSa/s | 24 | 6 | 22697 | 1648 1136 1440 1520 1440 1648 |
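
Two regularities can be read off Table 75, and the sketch below checks them on four rows copied from the table. In every row, P_INPUT_WINDOW_VSIZE_A equals P_DIM_A x P_DIM_AB and P_INPUT_WINDOW_VSIZE_B equals P_DIM_AB x P_DIM_B, which is consistent with each input window holding one full matrix. The number of PROGRAM_MEMORY entries also matches NUM_AIE in every row. These are observations about this benchmark, not documented guarantees. The `Row` container and the helpers `check_row` and `latency_ratio` are hypothetical names introduced only for illustration and are not part of the Vitis Libraries API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """Subset of one Table 75 row (hypothetical container, not a library type)."""
    aie_variant: int       # AIE_VARIANT
    dim_a: int             # P_DIM_A
    dim_ab: int            # P_DIM_AB
    dim_b: int             # P_DIM_B
    tiling: tuple          # (P_ADD_TILING_A, P_ADD_TILING_B, P_ADD_DETILING_OUT)
    vsize_a: int           # P_INPUT_WINDOW_VSIZE_A
    vsize_b: int           # P_INPUT_WINDOW_VSIZE_B
    latency_ns: int        # Latency
    num_aie: int           # NUM_AIE
    program_memory: tuple  # PROGRAM_MEMORY entries


# Four cint16 x cint16, 16 x 16 x 16 rows copied from Table 75.
ROWS = [
    Row(1, 16, 16, 16, (0, 0, 0), 256, 256, 1617, 1, (3260,)),
    Row(1, 16, 16, 16, (1, 1, 1), 256, 256, 8188, 3, (1776, 3244, 4460)),
    Row(2, 16, 16, 16, (0, 0, 0), 256, 256, 878, 1, (2112,)),
    Row(2, 16, 16, 16, (1, 1, 1), 256, 256, 1727, 3, (1008, 2112, 1856)),
]


def check_row(row):
    """Return a list of inconsistencies found in one row (empty when none)."""
    problems = []
    if row.vsize_a != row.dim_a * row.dim_ab:
        problems.append("window A size differs from P_DIM_A * P_DIM_AB")
    if row.vsize_b != row.dim_ab * row.dim_b:
        problems.append("window B size differs from P_DIM_AB * P_DIM_B")
    if len(row.program_memory) != row.num_aie:
        problems.append("PROGRAM_MEMORY entry count differs from NUM_AIE")
    return problems


def latency_ratio(rows, dims, tiling):
    """AIE_VARIANT 1 latency divided by AIE_VARIANT 2 latency for one configuration."""
    latency = {
        r.aie_variant: r.latency_ns
        for r in rows
        if (r.dim_a, r.dim_ab, r.dim_b) == dims and r.tiling == tiling
    }
    return latency[1] / latency[2]


for row in ROWS:
    print(row.aie_variant, row.tiling, check_row(row) or "consistent")

for tiling in [(0, 0, 0), (1, 1, 1)]:
    print(tiling, round(latency_ratio(ROWS, (16, 16, 16), tiling), 2))
```

For the 16 x 16 x 16 cint16 case, the latency ratio between AIE_VARIANT 1 and AIE_VARIANT 2 comes out at roughly 1.84 without tiling kernels and roughly 4.74 with all three tiling and detiling options enabled. The same checks can be extended to the remaining rows by adding them to `ROWS`.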