The following table gives results for the Matrix Multiply function across a wide variety of supported parameters, which are defined in L2 Matrix Multiply Configuration Parameters.

Library Element | AIE_VARIANT | T_DATA_A | T_DATA_B | P_DIM_A | P_DIM_AB | P_DIM_B | P_ADD_TILING_A | P_ADD_TILING_B | P_ADD_DETILING_OUT | P_INPUT_WINDOW_VSIZE_A | P_INPUT_WINDOW_VSIZE_B | P_CASC_LEN | NITER | Latency | Throughput | NUM_BANKS | NUM_AIE | DATA_MEMORY | PROGRAM_MEMORY |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
matrix_mult | 1 | cfloat | cfloat | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 5017 ns | 43 MSa/s | 29 | 9 | 48308 | 2930 2908 2930 3164 2930 3176 3164 2930 3262 |
matrix_mult | 1 | cint16 | cint16 | 8 | 8 | 8 | 1 | 1 | 1 | 64 | 64 | 1 | 12 | 8518 ns | 30 MSa/s | 12 | 3 | 9620 | 3932 2270 3260 |
matrix_mult | 1 | cint16 | cint32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 1218 ns | 140 MSa/s | 30 | 9 | 37304 | 1638 1646 1638 1668 1638 1668 1638 1206 1798 |
matrix_mult | 1 | cint16 | int16 | 8 | 64 | 4 | 1 | 0 | 1 | 512 | 256 | 4 | 12 | 6719 ns | 19 MSa/s | 30 | 7 | 25132 | 3012 1652 3618 1588 1588 3618 1524 |
matrix_mult | 1 | cint16 | int32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 8627 ns | 15 MSa/s | 29 | 9 | 33208 | 3852 2648 3852 2760 3852 3176 2760 3836 2860 |
matrix_mult | 1 | cint32 | cint16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 890 ns | 141 MSa/s | 32 | 9 | 41400 | 1798 1596 1798 1658 1798 1658 1798 1222 1770 |
matrix_mult | 1 | cint32 | cint32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 2095 ns | 87 MSa/s | 27 | 9 | 46002 | 1782 1910 1782 2008 1782 2008 1782 1222 2112 |
matrix_mult | 1 | cint32 | int16 | 8 | 64 | 4 | 1 | 0 | 1 | 512 | 256 | 4 | 12 | 727 ns | 141 MSa/s | 26 | 7 | 34092 | 1222 1762 2150 1666 1650 2150 1644 |
matrix_mult | 1 | cint32 | int32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 890 ns | 141 MSa/s | 32 | 9 | 41400 | 1798 1596 1798 1658 1798 1658 1798 1222 1770 |
matrix_mult | 1 | float | cfloat | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 2200 ns | 83 MSa/s | 28 | 9 | 39607 | 2342 2356 2342 2362 2342 2362 2342 1206 2458 |
matrix_mult | 1 | float | float | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 8652 ns | 15 MSa/s | 34 | 9 | 35121 | 3852 2032 3852 2048 3852 2048 3852 3012 2174 |
matrix_mult | 1 | int16 | cint16 | 16 | 16 | 16 | 1 | 1 | 1 | 256 | 256 | 1 | 12 | 6472 ns | 257 MSa/s | 13 | 3 | 16787 | 2084 2228 3554 |
matrix_mult | 1 | int16 | cint16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 8188 ns | 34 MSa/s | 26 | 9 | 28471 | 1732 1366 1732 1518 1732 1518 1732 3012 1602 |
matrix_mult | 1 | int16 | cint32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 1053 ns | 189 MSa/s | 28 | 9 | 33464 | 1988 1700 1988 1722 1972 1206 1722 1988 1848 |
matrix_mult | 1 | int16 | int16 | 16 | 16 | 16 | 1 | 1 | 1 | 256 | 256 | 1 | 12 | 7814 ns | 135 MSa/s | 13 | 3 | 12689 | 3762 1840 3284 |
matrix_mult | 1 | int16 | int16 | 16 | 8 | 8 | 1 | 0 | 0 | 128 | 64 | 2 | 12 | 372 ns | 1438 MSa/s | 11 | 2 | 5643 | 1334 1678 |
matrix_mult | 1 | int16 | int16 | 8 | 64 | 4 | 1 | 0 | 1 | 512 | 256 | 4 | 12 | 6400 ns | 17 MSa/s | 27 | 6 | 18529 | 1366 3778 1366 1366 3794 1566 |
matrix_mult | 1 | int16 | int32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 8188 ns | 34 MSa/s | 26 | 9 | 28471 | 1732 1366 1732 1518 1732 1518 1732 3012 1602 |
matrix_mult | 1 | int32 | cint16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 8627 ns | 15 MSa/s | 29 | 9 | 33208 | 3852 2648 3852 2760 3852 3176 2760 3836 2860 |
matrix_mult | 1 | int32 | cint32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 1218 ns | 140 MSa/s | 30 | 9 | 37304 | 1638 1646 1638 1668 1638 1668 1638 1206 1798 |
matrix_mult | 1 | int32 | int16 | 8 | 64 | 4 | 1 | 0 | 1 | 512 | 256 | 4 | 12 | 6719 ns | 19 MSa/s | 30 | 7 | 25132 | 3012 1652 3618 1588 1588 3618 1524 |
matrix_mult | 1 | cint16 | cint16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 8496 ns | 15 MSa/s | 33 | 9 | 32824 | 3852 1810 3852 1910 3836 1910 3852 3012 1952 |
matrix_mult | 1 | cint16 | cint16 | 8 | 4 | 64 | 1 | 1 | 1 | 32 | 256 | 1 | 12 | 5492 ns | 462 MSa/s | 13 | 3 | 19091 | 3142 1670 3292 |
matrix_mult | 1 | cfloat | float | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 1666 ns | 106 MSa/s | 28 | 9 | 43703 | 1894 2176 1894 2182 1894 2182 1894 1222 2296 |
matrix_mult | 1 | cint16 | cint16 | 1024 | 4 | 4 | 0 | 0 | 0 | 4096 | 16 | 1 | 12 | 6204 ns | 1000 MSa/s | 11 | 1 | 67849 | 1688 |
matrix_mult | 1 | cint16 | cint16 | 1024 | 4 | 4 | 1 | 1 | 1 | 4096 | 16 | 1 | 12 | 10006 ns | 1000 MSa/s | 17 | 3 | 105107 | 2950 3160 1688 |
matrix_mult | 1 | cint16 | cint16 | 16 | 16 | 16 | 0 | 0 | 0 | 256 | 256 | 1 | 12 | 1596 ns | 428 MSa/s | 7 | 1 | 8329 | 3136 |
matrix_mult | 1 | cint16 | cint16 | 16 | 16 | 16 | 1 | 1 | 1 | 256 | 256 | 1 | 12 | 9020 ns | 118 MSa/s | 13 | 3 | 18836 | 3554 3136 4348 |
matrix_mult | 1 | cint16 | cint16 | 16 | 256 | 16 | 0 | 0 | 1 | 4096 | 4096 | 1 | 12 | 18791 ns | 35 MSa/s | 14 | 2 | 73998 | 3554 3152 |
matrix_mult | 1 | cint16 | cint16 | 20 | 60 | 4 | 1 | 1 | 1 | 1200 | 240 | 1 | 12 | 8781 ns | 33 MSa/s | 13 | 3 | 30868 | 4012 1936 3160 |
matrix_mult | 1 | cint16 | cint16 | 24 | 4 | 4 | 1 | 1 | 1 | 96 | 16 | 1 | 12 | 4089 ns | 91 MSa/s | 12 | 3 | 9107 | 2950 1688 3160 |
matrix_mult | 1 | cint16 | cint16 | 32 | 32 | 32 | 0 | 0 | 0 | 1024 | 1024 | 1 | 12 | 10486 ns | 268 MSa/s | 7 | 1 | 26761 | 4824 |
matrix_mult | 1 | cint16 | cint16 | 32 | 32 | 32 | 1 | 0 | 0 | 1024 | 1024 | 1 | 12 | 17135 ns | 267 MSa/s | 8 | 2 | 37134 | 3314 4824 |
matrix_mult | 1 | cint16 | cint16 | 32 | 32 | 64 | 0 | 0 | 0 | 1024 | 2048 | 1 | 12 | 21593 ns | 272 MSa/s | 7 | 1 | 43145 | 4824 |
matrix_mult | 1 | cint16 | cint16 | 32 | 64 | 32 | 0 | 0 | 0 | 2048 | 2048 | 1 | 12 | 20154 ns | 138 MSa/s | 7 | 1 | 43145 | 4824 |
matrix_mult | 1 | cint16 | cint16 | 8 | 4 | 4 | 1 | 1 | 1 | 32 | 16 | 1 | 12 | 5111 ns | 30 MSa/s | 12 | 3 | 7571 | 2950 1466 3012 |
matrix_mult | 1 | cint16 | cint16 | 8 | 4 | 512 | 1 | 0 | 1 | 32 | 2048 | 1 | 12 | 25021 ns | 999 MSa/s | 13 | 2 | 86542 | 1686 3324 |
matrix_mult | 1 | cint16 | cint16 | 8 | 4 | 512 | 1 | 1 | 1 | 32 | 2048 | 1 | 12 | 25025 ns | 999 MSa/s | 17 | 3 | 105107 | 3142 1686 3324 |
matrix_mult | 1 | cint16 | cint16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 1 | 12 | 8523 ns | 14 MSa/s | 12 | 3 | 19348 | 3852 1866 2996 |
matrix_mult | 1 | int32 | int32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 8496 ns | 15 MSa/s | 33 | 9 | 32824 | 3852 1810 3852 1910 3836 1910 3852 3012 1952 |
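
The rows above follow two patterns that can be checked mechanically. In every listed configuration, `P_INPUT_WINDOW_VSIZE_A` equals `P_DIM_A * P_DIM_AB` and `P_INPUT_WINDOW_VSIZE_B` equals `P_DIM_AB * P_DIM_B`, which suggests that each input window holds exactly one matrix. The `PROGRAM_MEMORY` column also appears to list one value per AIE counted in `NUM_AIE`. The following is a minimal Python sketch of such a check; the helper names (`parse_row`, `windows_hold_one_matrix`, `program_memory_entries`) are hypothetical and not part of the library, and the two relationships are observations drawn from this table rather than documented guarantees.

```python
# Hypothetical helpers for sanity-checking rows of the benchmark table.
# Column names are copied from the table header; nothing here is library API.
COLUMNS = [
    "Library Element", "AIE_VARIANT", "T_DATA_A", "T_DATA_B",
    "P_DIM_A", "P_DIM_AB", "P_DIM_B",
    "P_ADD_TILING_A", "P_ADD_TILING_B", "P_ADD_DETILING_OUT",
    "P_INPUT_WINDOW_VSIZE_A", "P_INPUT_WINDOW_VSIZE_B",
    "P_CASC_LEN", "NITER", "Latency", "Throughput",
    "NUM_BANKS", "NUM_AIE", "DATA_MEMORY", "PROGRAM_MEMORY",
]


def parse_row(line):
    """Split one pipe-delimited table row into a dict keyed by column name."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    return dict(zip(COLUMNS, cells))


def windows_hold_one_matrix(row):
    """Check the observed pattern: each window size equals one full matrix."""
    dim_a = int(row["P_DIM_A"])
    dim_ab = int(row["P_DIM_AB"])
    dim_b = int(row["P_DIM_B"])
    return (
        int(row["P_INPUT_WINDOW_VSIZE_A"]) == dim_a * dim_ab
        and int(row["P_INPUT_WINDOW_VSIZE_B"]) == dim_ab * dim_b
    )


def program_memory_entries(row):
    """Return the PROGRAM_MEMORY values as a list of integers."""
    return [int(value) for value in row["PROGRAM_MEMORY"].split()]


# One row copied verbatim from the table above.
SAMPLE_ROW = (
    "matrix_mult | 1 | int16 | int16 | 16 | 8 | 8 | 1 | 0 | 0 | 128 | 64 "
    "| 2 | 12 | 372 ns | 1438 MSa/s | 11 | 2 | 5643 | 1334 1678 |"
)

row = parse_row(SAMPLE_ROW)
print(windows_hold_one_matrix(row))
print(len(program_memory_entries(row)) == int(row["NUM_AIE"]))
```

Running the same checks over every row is a quick way to catch transcription errors when the table is copied or regenerated.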