The following table gives results for the Matrix Multiply function across a wide variety of supported parameters, which are defined in Matrix Multiply Configuration Parameters.
Library Element | AIE_VARIANT | T_DATA_A | T_DATA_B | P_DIM_A | P_DIM_AB | P_DIM_B | P_ADD_TILING_A | P_ADD_TILING_B | P_ADD_DETILING_OUT | P_INPUT_WINDOW_VSIZE_A | P_INPUT_WINDOW_VSIZE_B | P_CASC_LEN | NITER | Latency | Throughput | NUM_BANKS | NUM_AIE | DATA_MEMORY | PROGRAM_MEMORY |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
matrix_mult | 1 | cfloat | cfloat | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 4381 ns | 43 MSa/s | 29 | 9 | 48308 | 3020 3060 3020 3316 3020 1360 3316 3020 3388 |
matrix_mult | 1 | cfloat | float | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 1599 ns | 125 MSa/s | 28 | 9 | 43703 | 1952 2282 1952 2298 1952 2298 1952 1264 2398 |
matrix_mult | 1 | cint16 | cint16 | 16 | 16 | 16 | 0 | 0 | 0 | 256 | 256 | 1 | 8 | 1617 ns | 425 MSa/s | 7 | 1 | 8329 | 3260 |
matrix_mult | 1 | cint16 | cint16 | 16 | 16 | 16 | 1 | 1 | 1 | 256 | 256 | 1 | 8 | 8188 ns | 118 MSa/s | 13 | 3 | 18836 | 1776 3244 4460 |
matrix_mult | 1 | cint16 | cint16 | 16 | 256 | 16 | 0 | 0 | 1 | 4096 | 4096 | 1 | 8 | 17934 ns | 35 MSa/s | 14 | 2 | 73998 | 1776 3276 |
matrix_mult | 1 | cint16 | cint16 | 20 | 60 | 4 | 1 | 1 | 1 | 1200 | 240 | 1 | 8 | 7963 ns | 33 MSa/s | 13 | 3 | 30868 | 4108 2096 1328 |
matrix_mult | 1 | cint16 | cint16 | 24 | 4 | 4 | 1 | 1 | 1 | 96 | 16 | 1 | 8 | 3241 ns | 90 MSa/s | 12 | 3 | 9107 | 3086 1864 1344 |
matrix_mult | 1 | cint16 | cint16 | 32 | 32 | 32 | 0 | 0 | 0 | 1024 | 1024 | 1 | 8 | 10526 ns | 267 MSa/s | 7 | 1 | 26761 | 4994 |
matrix_mult | 1 | cint16 | cint16 | 32 | 32 | 32 | 1 | 0 | 0 | 1024 | 1024 | 1 | 8 | 17194 ns | 267 MSa/s | 8 | 2 | 37134 | 3432 4994 |
matrix_mult | 1 | cint16 | cint16 | 32 | 32 | 64 | 0 | 0 | 0 | 1024 | 2048 | 1 | 8 | 21634 ns | 271 MSa/s | 7 | 1 | 43145 | 4994 |
matrix_mult | 1 | cint16 | cint16 | 32 | 64 | 32 | 0 | 0 | 0 | 2048 | 2048 | 1 | 8 | 20191 ns | 138 MSa/s | 7 | 1 | 43145 | 4994 |
matrix_mult | 1 | cint16 | cint16 | 512 | 4 | 4 | 1 | 1 | 1 | 2048 | 16 | 1 | 8 | 4640 ns | 1000 MSa/s | 13 | 3 | 55955 | 3086 1328 1880 |
matrix_mult | 1 | cint16 | cint16 | 64 | 64 | 64 | 0 | 0 | 0 | 4096 | 4096 | 1 | 8 | 83656 ns | 140 MSa/s | 13 | 1 | 100489 | 4962 |
matrix_mult | 1 | cint16 | cint16 | 8 | 4 | 4 | 1 | 1 | 1 | 32 | 16 | 1 | 8 | 4276 ns | 30 MSa/s | 12 | 3 | 7571 | 3086 1640 1296 |
matrix_mult | 1 | cint16 | cint16 | 8 | 4 | 512 | 1 | 0 | 1 | 32 | 2048 | 1 | 12 | 25906 ns | 1000 MSa/s | 13 | 2 | 86542 | 1862 1494 |
matrix_mult | 1 | cint16 | cint16 | 8 | 4 | 512 | 1 | 0 | 1 | 32 | 2048 | 1 | 8 | 17341 ns | 1066 MSa/s | 13 | 2 | 86542 | 1862 1494 |
matrix_mult | 1 | cint16 | cint16 | 8 | 4 | 512 | 1 | 1 | 1 | 32 | 2048 | 1 | 16 | 25912 ns | 1000 MSa/s | 17 | 3 | 105107 | 3246 1862 1494 |
matrix_mult | 1 | cint16 | cint16 | 8 | 4 | 64 | 1 | 1 | 1 | 32 | 256 | 1 | 8 | 4631 ns | 460 MSa/s | 13 | 3 | 19091 | 3246 1862 1478 |
matrix_mult | 1 | cint16 | cint16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 1 | 8 | 7708 ns | 14 MSa/s | 12 | 3 | 19348 | 3916 2032 1280 |
matrix_mult | 1 | cint16 | cint16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 7686 ns | 15 MSa/s | 33 | 9 | 32824 | 3916 2002 3916 2086 3916 2086 3916 1296 2112 |
matrix_mult | 1 | cint16 | cint16 | 8 | 8 | 8 | 1 | 1 | 1 | 64 | 64 | 1 | 8 | 7690 ns | 30 MSa/s | 12 | 3 | 9620 | 4012 2382 1408 |
matrix_mult | 1 | cint16 | cint32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 1252 ns | 135 MSa/s | 30 | 9 | 37304 | 1680 1790 1680 1818 1680 1818 1680 1248 1920 |
matrix_mult | 1 | cint16 | int16 | 8 | 64 | 4 | 1 | 0 | 1 | 512 | 256 | 4 | 12 | 5851 ns | 19 MSa/s | 30 | 7 | 25132 | 1296 1828 3704 1732 1748 3720 1716 |
matrix_mult | 1 | cint16 | int32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 8112 ns | 15 MSa/s | 29 | 9 | 33208 | 3916 2824 3916 2952 3916 1360 2952 3916 3030 |
matrix_mult | 1 | cint32 | cint16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 878 ns | 125 MSa/s | 32 | 9 | 41400 | 1824 1724 1824 1816 1824 1816 1824 1264 1930 |
matrix_mult | 1 | cint32 | cint32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 2076 ns | 86 MSa/s | 27 | 9 | 46002 | 1840 2050 1840 2146 1840 2146 1840 1264 2240 |
matrix_mult | 1 | cint32 | int16 | 8 | 64 | 4 | 1 | 0 | 1 | 512 | 256 | 4 | 12 | 750 ns | 125 MSa/s | 26 | 7 | 34092 | 1264 1922 2224 1820 1836 2224 1744 |
matrix_mult | 1 | cint32 | int32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 878 ns | 125 MSa/s | 32 | 9 | 41400 | 1824 1724 1824 1816 1824 1816 1824 1264 1930 |
matrix_mult | 1 | float | cfloat | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 2287 ns | 81 MSa/s | 28 | 9 | 39607 | 2400 2388 2400 2530 2400 2530 2400 1248 2650 |
matrix_mult | 1 | float | float | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 7841 ns | 15 MSa/s | 34 | 9 | 35121 | 3916 2192 3916 2224 3916 2224 3916 1296 2300 |
matrix_mult | 1 | int16 | cint16 | 16 | 16 | 16 | 1 | 1 | 1 | 256 | 256 | 1 | 8 | 1680 ns | 731 MSa/s | 13 | 3 | 16787 | 2110 2370 1776 |
matrix_mult | 1 | int16 | cint16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 705 ns | 244 MSa/s | 26 | 9 | 28471 | 1758 1646 1758 1694 1758 1694 1758 1296 1774 |
matrix_mult | 1 | int16 | cint32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 1119 ns | 167 MSa/s | 28 | 9 | 33464 | 2030 1868 2030 1884 2030 1248 1884 2030 2002 |
matrix_mult | 1 | int16 | int16 | 16 | 16 | 16 | 1 | 1 | 1 | 256 | 256 | 1 | 8 | 6991 ns | 134 MSa/s | 13 | 3 | 12689 | 3880 1918 1424 |
matrix_mult | 1 | int16 | int16 | 16 | 8 | 8 | 1 | 0 | 0 | 128 | 64 | 2 | 8 | 427 ns | 1267 MSa/s | 11 | 2 | 5643 | 1376 1832 |
matrix_mult | 1 | int16 | int16 | 8 | 64 | 4 | 1 | 0 | 1 | 512 | 256 | 4 | 12 | 6464 ns | 17 MSa/s | 24 | 6 | 18529 | 1408 3880 1408 1408 3880 1536 |
matrix_mult | 1 | int16 | int32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 705 ns | 244 MSa/s | 26 | 9 | 28471 | 1758 1646 1758 1694 1758 1694 1758 1296 1774 |
matrix_mult | 1 | int32 | cint16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 8112 ns | 15 MSa/s | 29 | 9 | 33208 | 3916 2824 3916 2952 3916 1360 2952 3916 3030 |
matrix_mult | 1 | int32 | cint32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 1252 ns | 135 MSa/s | 30 | 9 | 37304 | 1680 1790 1680 1818 1680 1818 1680 1248 1920 |
matrix_mult | 1 | int32 | int16 | 8 | 64 | 4 | 1 | 0 | 1 | 512 | 256 | 4 | 12 | 5851 ns | 19 MSa/s | 30 | 7 | 25132 | 1296 1828 3704 1732 1748 3720 1716 |
matrix_mult | 1 | int32 | int32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 7686 ns | 15 MSa/s | 33 | 9 | 32824 | 3916 2002 3916 2086 3916 2086 3916 1296 2112 |
matrix_mult | 2 | cint16 | cint16 | 1024 | 4 | 8 | 0 | 0 | 0 | 4096 | 32 | 1 | 8 | 25174 ns | 1000 MSa/s | 15 | 1 | 100750 | 1456 |
matrix_mult | 2 | cint16 | cint16 | 1024 | 4 | 8 | 1 | 1 | 1 | 4096 | 32 | 1 | 8 | 25173 ns | 1000 MSa/s | 16 | 2 | 103219 | 992 1456 |
matrix_mult | 2 | cint16 | cint16 | 16 | 16 | 16 | 0 | 0 | 0 | 256 | 256 | 1 | 8 | 878 ns | 723 MSa/s | 6 | 1 | 8334 | 2112 |
matrix_mult | 2 | cint16 | cint16 | 16 | 16 | 16 | 1 | 1 | 1 | 256 | 256 | 1 | 8 | 1727 ns | 625 MSa/s | 8 | 3 | 18873 | 1008 2112 1856 |
matrix_mult | 2 | cint16 | cint16 | 16 | 256 | 16 | 0 | 0 | 1 | 4096 | 4096 | 1 | 8 | 7953 ns | 62 MSa/s | 12 | 2 | 74003 | 2128 1008 |
matrix_mult | 2 | cint16 | cint16 | 20 | 60 | 8 | 0 | 1 | 1 | 1200 | 480 | 1 | 8 | 2387 ns | 133 MSa/s | 9 | 2 | 22963 | 1952 1392 |
matrix_mult | 2 | cint16 | cint16 | 24 | 4 | 8 | 1 | 1 | 1 | 96 | 32 | 1 | 8 | 654 ns | 1000 MSa/s | 9 | 2 | 7219 | 1440 992 |
matrix_mult | 2 | cint16 | cint16 | 32 | 32 | 32 | 0 | 0 | 0 | 1024 | 1024 | 1 | 8 | 6581 ns | 407 MSa/s | 6 | 1 | 26766 | 2208 |
matrix_mult | 2 | cint16 | cint16 | 32 | 32 | 32 | 1 | 0 | 0 | 1024 | 1024 | 1 | 8 | 10817 ns | 407 MSa/s | 9 | 2 | 37139 | 2208 1088 |
matrix_mult | 2 | cint16 | cint16 | 32 | 64 | 32 | 0 | 0 | 0 | 2048 | 2048 | 1 | 8 | 10902 ns | 238 MSa/s | 7 | 1 | 43150 | 2224 |
matrix_mult | 2 | cint16 | cint16 | 4 | 64 | 8 | 1 | 1 | 1 | 256 | 512 | 4 | 8 | 816 ns | 191 MSa/s | 29 | 8 | 30773 | 1744 1744 1536 1504 1744 1536 1744 1632 |
matrix_mult | 2 | cint16 | cint16 | 8 | 4 | 512 | 1 | 1 | 1 | 32 | 2048 | 1 | 8 | 16932 ns | 939 MSa/s | 16 | 3 | 105144 | 1888 1152 1520 |
matrix_mult | 2 | cint16 | cint16 | 8 | 4 | 64 | 1 | 1 | 1 | 32 | 256 | 1 | 8 | 2228 ns | 901 MSa/s | 12 | 3 | 19128 | 1872 1136 1520 |
matrix_mult | 2 | cint16 | cint16 | 8 | 4 | 8 | 1 | 1 | 1 | 32 | 32 | 1 | 8 | 350 ns | 761 MSa/s | 9 | 2 | 5683 | 1440 992 |
matrix_mult | 2 | cint16 | cint16 | 8 | 64 | 8 | 1 | 1 | 1 | 512 | 512 | 1 | 8 | 2262 ns | 125 MSa/s | 7 | 2 | 21300 | 2128 1936 |
matrix_mult | 2 | cint16 | cint16 | 8 | 8 | 8 | 1 | 1 | 1 | 64 | 64 | 1 | 8 | 606 ns | 457 MSa/s | 7 | 2 | 6964 | 1792 1504 |
matrix_mult | 2 | cint16 | int16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 8 | 827 ns | 170 MSa/s | 24 | 6 | 22697 | 1648 1488 1632 1696 1632 1648 |
matrix_mult | 2 | int16 | int16 | 16 | 16 | 16 | 1 | 1 | 1 | 256 | 256 | 1 | 8 | 1330 ns | 876 MSa/s | 8 | 3 | 12697 | 1424 1504 1536 |
matrix_mult | 2 | int16 | int16 | 16 | 8 | 8 | 1 | 0 | 0 | 128 | 64 | 2 | 8 | 313 ns | 1802 MSa/s | 11 | 2 | 5651 | 1024 1424 |
matrix_mult | 2 | int16 | int32 | 8 | 64 | 4 | 0 | 1 | 1 | 512 | 256 | 4 | 8 | 481 ns | 320 MSa/s | 25 | 6 | 19753 | 1120 1360 1408 1504 1408 1360 |
matrix_mult | 2 | int16 | int32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 8 | 611 ns | 242 MSa/s | 31 | 8 | 26293 | 1456 1456 1408 1120 1456 1408 1456 1504 |
matrix_mult | 2 | int32 | int16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 8 | 742 ns | 166 MSa/s | 24 | 6 | 22697 | 1648 1136 1440 1520 1440 1648 |
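
The rows above can be sanity-checked mechanically. The following is a minimal sketch, not part of the library: the helper names `parse_row` and `check_row` are hypothetical, and the two relationships it tests are observations about the rows listed in this table rather than rules stated by the source. In every listed row, `P_INPUT_WINDOW_VSIZE_A` equals `P_DIM_A * P_DIM_AB`, `P_INPUT_WINDOW_VSIZE_B` equals `P_DIM_AB * P_DIM_B`, and the number of space-separated `PROGRAM_MEMORY` entries equals `NUM_AIE`.

```python
# Column names copied from the table header, in order.
COLUMNS = [
    "Library Element", "AIE_VARIANT", "T_DATA_A", "T_DATA_B",
    "P_DIM_A", "P_DIM_AB", "P_DIM_B",
    "P_ADD_TILING_A", "P_ADD_TILING_B", "P_ADD_DETILING_OUT",
    "P_INPUT_WINDOW_VSIZE_A", "P_INPUT_WINDOW_VSIZE_B",
    "P_CASC_LEN", "NITER", "Latency", "Throughput",
    "NUM_BANKS", "NUM_AIE", "DATA_MEMORY", "PROGRAM_MEMORY",
]


def parse_row(line):
    """Split one pipe-delimited table row into a {column: text} dict."""
    cells = [c.strip() for c in line.strip().rstrip("|").split("|")]
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    return dict(zip(COLUMNS, cells))


def check_row(row):
    """Test the relationships observed in this table (assumptions, see text)."""
    dim_a = int(row["P_DIM_A"])
    dim_ab = int(row["P_DIM_AB"])
    dim_b = int(row["P_DIM_B"])
    return {
        # Window A holds one P_DIM_A x P_DIM_AB matrix in the listed rows.
        "vsize_a": int(row["P_INPUT_WINDOW_VSIZE_A"]) == dim_a * dim_ab,
        # Window B holds one P_DIM_AB x P_DIM_B matrix in the listed rows.
        "vsize_b": int(row["P_INPUT_WINDOW_VSIZE_B"]) == dim_ab * dim_b,
        # One PROGRAM_MEMORY figure is reported per AIE tile used.
        "program_memory_entries":
            len(row["PROGRAM_MEMORY"].split()) == int(row["NUM_AIE"]),
    }


if __name__ == "__main__":
    example = ("matrix_mult | 1 | int16 | int16 | 16 | 8 | 8 | 1 | 0 | 0 | "
               "128 | 64 | 2 | 8 | 427 ns | 1267 MSa/s | 11 | 2 | 5643 | "
               "1376 1832 |")
    print(check_row(parse_row(example)))
```

Running the same check over every row is a quick way to catch transcription errors when the table is regenerated or copied.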