Matrix Multiply - 2023.1 English

Vitis Libraries

Release Date: 2023-12-20
Version: 2023.1 English

The following table gives results for the Matrix Multiply function across a wide variety of supported parameter values. The parameters are defined in L2 Matrix Multiply Configuration Parameters.

matrix_mult_benchmark.csv

Table 70 Matrix Multiply benchmark
| Library Element | AIE_VARIANT | T_DATA_A | T_DATA_B | P_DIM_A | P_DIM_AB | P_DIM_B | P_ADD_TILING_A | P_ADD_TILING_B | P_ADD_DETILING_OUT | P_INPUT_WINDOW_VSIZE_A | P_INPUT_WINDOW_VSIZE_B | P_CASC_LEN | NITER | Latency | Throughput | NUM_BANKS | NUM_AIE | DATA_MEMORY | PROGRAM_MEMORY |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| matrix_mult | 1 | cfloat | cfloat | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 5017 ns | 43 MSa/s | 29 | 9 | 48308 | 2930 2908 2930 3164 2930 3176 3164 2930 3262 |
| matrix_mult | 1 | cint16 | cint16 | 8 | 8 | 8 | 1 | 1 | 1 | 64 | 64 | 1 | 12 | 8518 ns | 30 MSa/s | 12 | 3 | 9620 | 3932 2270 3260 |
| matrix_mult | 1 | cint16 | cint32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 1218 ns | 140 MSa/s | 30 | 9 | 37304 | 1638 1646 1638 1668 1638 1668 1638 1206 1798 |
| matrix_mult | 1 | cint16 | int16 | 8 | 64 | 4 | 1 | 0 | 1 | 512 | 256 | 4 | 12 | 6719 ns | 19 MSa/s | 30 | 7 | 25132 | 3012 1652 3618 1588 1588 3618 1524 |
| matrix_mult | 1 | cint16 | int32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 8627 ns | 15 MSa/s | 29 | 9 | 33208 | 3852 2648 3852 2760 3852 3176 2760 3836 2860 |
| matrix_mult | 1 | cint32 | cint16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 890 ns | 141 MSa/s | 32 | 9 | 41400 | 1798 1596 1798 1658 1798 1658 1798 1222 1770 |
| matrix_mult | 1 | cint32 | cint32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 2095 ns | 87 MSa/s | 27 | 9 | 46002 | 1782 1910 1782 2008 1782 2008 1782 1222 2112 |
| matrix_mult | 1 | cint32 | int16 | 8 | 64 | 4 | 1 | 0 | 1 | 512 | 256 | 4 | 12 | 727 ns | 141 MSa/s | 26 | 7 | 34092 | 1222 1762 2150 1666 1650 2150 1644 |
| matrix_mult | 1 | cint32 | int32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 890 ns | 141 MSa/s | 32 | 9 | 41400 | 1798 1596 1798 1658 1798 1658 1798 1222 1770 |
| matrix_mult | 1 | float | cfloat | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 2200 ns | 83 MSa/s | 28 | 9 | 39607 | 2342 2356 2342 2362 2342 2362 2342 1206 2458 |
| matrix_mult | 1 | float | float | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 8652 ns | 15 MSa/s | 34 | 9 | 35121 | 3852 2032 3852 2048 3852 2048 3852 3012 2174 |
| matrix_mult | 1 | int16 | cint16 | 16 | 16 | 16 | 1 | 1 | 1 | 256 | 256 | 1 | 12 | 6472 ns | 257 MSa/s | 13 | 3 | 16787 | 2084 2228 3554 |
| matrix_mult | 1 | int16 | cint16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 8188 ns | 34 MSa/s | 26 | 9 | 28471 | 1732 1366 1732 1518 1732 1518 1732 3012 1602 |
| matrix_mult | 1 | int16 | cint32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 1053 ns | 189 MSa/s | 28 | 9 | 33464 | 1988 1700 1988 1722 1972 1206 1722 1988 1848 |
| matrix_mult | 1 | int16 | int16 | 16 | 16 | 16 | 1 | 1 | 1 | 256 | 256 | 1 | 12 | 7814 ns | 135 MSa/s | 13 | 3 | 12689 | 3762 1840 3284 |
| matrix_mult | 1 | int16 | int16 | 16 | 8 | 8 | 1 | 0 | 0 | 128 | 64 | 2 | 12 | 372 ns | 1438 MSa/s | 11 | 2 | 5643 | 1334 1678 |
| matrix_mult | 1 | int16 | int16 | 8 | 64 | 4 | 1 | 0 | 1 | 512 | 256 | 4 | 12 | 6400 ns | 17 MSa/s | 27 | 6 | 18529 | 1366 3778 1366 1366 3794 1566 |
| matrix_mult | 1 | int16 | int32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 8188 ns | 34 MSa/s | 26 | 9 | 28471 | 1732 1366 1732 1518 1732 1518 1732 3012 1602 |
| matrix_mult | 1 | int32 | cint16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 8627 ns | 15 MSa/s | 29 | 9 | 33208 | 3852 2648 3852 2760 3852 3176 2760 3836 2860 |
| matrix_mult | 1 | int32 | cint32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 1218 ns | 140 MSa/s | 30 | 9 | 37304 | 1638 1646 1638 1668 1638 1668 1638 1206 1798 |
| matrix_mult | 1 | cint16 | cint16 | 8 | 8 | 8 | 1 | 1 | 1 | 64 | 64 | 1 | 12 | 8518 ns | 30 MSa/s | 12 | 3 | 9620 | 3932 2270 3260 |
| matrix_mult | 1 | int32 | int16 | 8 | 64 | 4 | 1 | 0 | 1 | 512 | 256 | 4 | 12 | 6719 ns | 19 MSa/s | 30 | 7 | 25132 | 3012 1652 3618 1588 1588 3618 1524 |
| matrix_mult | 1 | cint16 | cint16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 8496 ns | 15 MSa/s | 33 | 9 | 32824 | 3852 1810 3852 1910 3836 1910 3852 3012 1952 |
| matrix_mult | 1 | cint16 | cint16 | 8 | 4 | 64 | 1 | 1 | 1 | 32 | 256 | 1 | 12 | 5492 ns | 462 MSa/s | 13 | 3 | 19091 | 3142 1670 3292 |
| matrix_mult | 1 | cfloat | float | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 1666 ns | 106 MSa/s | 28 | 9 | 43703 | 1894 2176 1894 2182 1894 2182 1894 1222 2296 |
| matrix_mult | 1 | cint16 | cint16 | 1024 | 4 | 4 | 0 | 0 | 0 | 4096 | 16 | 1 | 12 | 6204 ns | 1000 MSa/s | 11 | 1 | 67849 | 1688 |
| matrix_mult | 1 | cint16 | cint16 | 1024 | 4 | 4 | 1 | 1 | 1 | 4096 | 16 | 1 | 12 | 10006 ns | 1000 MSa/s | 17 | 3 | 105107 | 2950 3160 1688 |
| matrix_mult | 1 | cint16 | cint16 | 16 | 16 | 16 | 0 | 0 | 0 | 256 | 256 | 1 | 12 | 1596 ns | 428 MSa/s | 7 | 1 | 8329 | 3136 |
| matrix_mult | 1 | cint16 | cint16 | 16 | 16 | 16 | 1 | 1 | 1 | 256 | 256 | 1 | 12 | 9020 ns | 118 MSa/s | 13 | 3 | 18836 | 3554 3136 4348 |
| matrix_mult | 1 | cint16 | cint16 | 16 | 256 | 16 | 0 | 0 | 1 | 4096 | 4096 | 1 | 12 | 18791 ns | 35 MSa/s | 14 | 2 | 73998 | 3554 3152 |
| matrix_mult | 1 | cint16 | cint16 | 20 | 60 | 4 | 1 | 1 | 1 | 1200 | 240 | 1 | 12 | 8781 ns | 33 MSa/s | 13 | 3 | 30868 | 4012 1936 3160 |
| matrix_mult | 1 | cint16 | cint16 | 24 | 4 | 4 | 1 | 1 | 1 | 96 | 16 | 1 | 12 | 4089 ns | 91 MSa/s | 12 | 3 | 9107 | 2950 1688 3160 |
| matrix_mult | 1 | cint16 | cint16 | 32 | 32 | 32 | 0 | 0 | 0 | 1024 | 1024 | 1 | 12 | 10486 ns | 268 MSa/s | 7 | 1 | 26761 | 4824 |
| matrix_mult | 1 | cint16 | cint16 | 32 | 32 | 32 | 1 | 0 | 0 | 1024 | 1024 | 1 | 12 | 17135 ns | 267 MSa/s | 8 | 2 | 37134 | 3314 4824 |
| matrix_mult | 1 | cint16 | cint16 | 32 | 32 | 64 | 0 | 0 | 0 | 1024 | 2048 | 1 | 12 | 21593 ns | 272 MSa/s | 7 | 1 | 43145 | 4824 |
| matrix_mult | 1 | cint16 | cint16 | 32 | 64 | 32 | 0 | 0 | 0 | 2048 | 2048 | 1 | 12 | 20154 ns | 138 MSa/s | 7 | 1 | 43145 | 4824 |
| matrix_mult | 1 | cint16 | cint16 | 8 | 4 | 4 | 1 | 1 | 1 | 32 | 16 | 1 | 12 | 5111 ns | 30 MSa/s | 12 | 3 | 7571 | 2950 1466 3012 |
| matrix_mult | 1 | cint16 | cint16 | 8 | 4 | 512 | 1 | 0 | 1 | 32 | 2048 | 1 | 12 | 25021 ns | 999 MSa/s | 13 | 2 | 86542 | 1686 3324 |
| matrix_mult | 1 | cint16 | cint16 | 8 | 4 | 512 | 1 | 1 | 1 | 32 | 2048 | 1 | 12 | 25025 ns | 999 MSa/s | 17 | 3 | 105107 | 3142 1686 3324 |
| matrix_mult | 1 | cint16 | cint16 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 1 | 12 | 8523 ns | 14 MSa/s | 12 | 3 | 19348 | 3852 1866 2996 |
| matrix_mult | 1 | int32 | int32 | 8 | 64 | 4 | 1 | 1 | 1 | 512 | 256 | 4 | 12 | 8496 ns | 15 MSa/s | 33 | 9 | 32824 | 3852 1810 3852 1910 3836 1910 3852 3012 1952 |
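
The PROGRAM_MEMORY column holds a variable number of values per row, which can make the rows awkward to process. The sketch below shows one way to read a row in Python, assuming it is available as whitespace-separated fields in the column order of Table 70. The names `parse_row` and `window_sizes_match` and the keys `Latency_ns` and `Throughput_MSa_s` are hypothetical and not part of the Vitis Libraries. The script also checks two patterns that hold for every row in this table: the input window sizes equal P_DIM_A × P_DIM_AB and P_DIM_AB × P_DIM_B, and the number of PROGRAM_MEMORY values equals NUM_AIE.

```python
# Hypothetical helper for reading one benchmark row; not part of Vitis Libraries.
INT_COLUMNS = [
    "P_DIM_A", "P_DIM_AB", "P_DIM_B",
    "P_ADD_TILING_A", "P_ADD_TILING_B", "P_ADD_DETILING_OUT",
    "P_INPUT_WINDOW_VSIZE_A", "P_INPUT_WINDOW_VSIZE_B",
    "P_CASC_LEN", "NITER",
]


def parse_row(line):
    """Split one whitespace-separated benchmark row into a dict."""
    tok = line.split()
    # Assumed layout: 21 fixed tokens (units included), then PROGRAM_MEMORY values.
    if len(tok) < 22 or tok[15] != "ns" or tok[17] != "MSa/s":
        raise ValueError(f"unexpected row layout: {line!r}")
    row = {
        "Library Element": tok[0],
        "AIE_VARIANT": int(tok[1]),
        "T_DATA_A": tok[2],
        "T_DATA_B": tok[3],
    }
    row.update(zip(INT_COLUMNS, map(int, tok[4:14])))
    row["Latency_ns"] = int(tok[14])        # unit token "ns" follows the value
    row["Throughput_MSa_s"] = int(tok[16])  # unit token "MSa/s" follows the value
    row["NUM_BANKS"] = int(tok[18])
    row["NUM_AIE"] = int(tok[19])
    row["DATA_MEMORY"] = int(tok[20])
    # Everything left over belongs to the variable-length PROGRAM_MEMORY column.
    row["PROGRAM_MEMORY"] = [int(t) for t in tok[21:]]
    return row


def window_sizes_match(row):
    """Check the window sizes against the matrix dimensions of the same row."""
    return (
        row["P_INPUT_WINDOW_VSIZE_A"] == row["P_DIM_A"] * row["P_DIM_AB"]
        and row["P_INPUT_WINDOW_VSIZE_B"] == row["P_DIM_AB"] * row["P_DIM_B"]
    )


# Row taken from the table: int16 x int16, 16x8 times 8x8, P_CASC_LEN = 2.
sample = ("matrix_mult 1 int16 int16 16 8 8 1 0 0 128 64 2 12 "
          "372 ns 1438 MSa/s 11 2 5643 1334 1678")
row = parse_row(sample)
print(row["PROGRAM_MEMORY"])    # [1334, 1678]
print(window_sizes_match(row))  # True
```

For the sample row, 16 × 8 = 128 and 8 × 8 = 64 match the two window-size columns, and the two PROGRAM_MEMORY values match NUM_AIE = 2.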