The xclbin can be built at 250 MHz. The hardware resource utilization and benchmark results are shown in the following two tables.

Table 1 Hardware Resources

Name | LUT | BRAM | URAM | DSP | REG |
---|---|---|---|---|---|
gemmAddsKernel | 101988 | 0 | 0 | 384 | 192516 |
gemmCPlusXKernel | 8529 | 24 | 0 | 66 | 20358 |
gemmLoadStoreKernel | 7126 | 23 | 0 | 16 | 19457 |
gemmMergeKernel | 8342 | 0 | 0 | 0 | 25219 |
gemmMulsKernel | 50640 | 0 | 0 | 768 | 98013 |
gemmSystolicArrayKernel | 2541 | 0 | 0 | 0 | 240 |
gemmTagsKernel | 20203 | 15 | 0 | 8 | 34678 |
gemmTimerKernel | 32 | 0 | 0 | 0 | 115 |

Table 2 Benchmark Results

M | N | K | API execution time [ms] | API efficiency [%] | PerfApiTops [TOPS] |
---|---|---|---|---|---|
256 | 256 | 256 | 1.370527 | 19.127241 | 0.024626 |
512 | 512 | 512 | 4.517989 | 46.417820 | 0.059589 |
1024 | 1024 | 1024 | 29.500145 | 56.871639 | 0.072902 |
2048 | 2048 | 2048 | 217.555482 | 61.693563 | 0.079026 |
4096 | 4096 | 4096 | 1685.337895 | 63.710774 | 0.081580 |
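
The efficiency and TOPS columns of Table 2 can be reproduced from the measured execution time. The source does not state the formulas, so the sketch below is an inference from the numbers. It assumes a peak of 256 multiply-accumulates per cycle (consistent with, for example, a 16 x 16 systolic array) at the 250 MHz clock, and an operation count of `2*M*N*K + 3*M*N` (presumably the matrix product plus scaling and accumulation of the output). The names `MACS_PER_CYCLE`, `efficiency_percent`, and `perf_tops` are hypothetical and are not part of the library.

```python
# Sketch: recompute the efficiency and TOPS columns of Table 2.
# Assumptions (inferred from the table values, not stated in the source):
#   - the kernel sustains at most MACS_PER_CYCLE multiply-accumulates per cycle
#   - the operation count is 2*M*N*K + 3*M*N
CLOCK_HZ = 250e6        # kernel clock frequency from the text above
MACS_PER_CYCLE = 256    # hypothetical peak, e.g. a 16 x 16 systolic array


def theoretical_time_ms(m, n, k):
    """Ideal execution time if every cycle performed MACS_PER_CYCLE MACs."""
    cycles = m * n * k / MACS_PER_CYCLE
    return cycles / CLOCK_HZ * 1e3


def efficiency_percent(m, n, k, measured_ms):
    """Ratio of ideal to measured execution time, in percent."""
    return 100.0 * theoretical_time_ms(m, n, k) / measured_ms


def perf_tops(m, n, k, measured_ms):
    """Throughput in tera-operations per second for the assumed op count."""
    ops = 2 * m * n * k + 3 * m * n
    return ops / (measured_ms * 1e-3) / 1e12


# Rows copied from Table 2: M, N, K, time [ms], efficiency [%], TOPS
TABLE_2 = [
    (256, 256, 256, 1.370527, 19.127241, 0.024626),
    (512, 512, 512, 4.517989, 46.417820, 0.059589),
    (1024, 1024, 1024, 29.500145, 56.871639, 0.072902),
    (2048, 2048, 2048, 217.555482, 61.693563, 0.079026),
    (4096, 4096, 4096, 1685.337895, 63.710774, 0.081580),
]

for m, n, k, ms, eff, tops in TABLE_2:
    print(f"{m:5d} eff={efficiency_percent(m, n, k, ms):.6f}% "
          f"(table {eff}) tops={perf_tops(m, n, k, ms):.6f} (table {tops})")
```

Under these assumptions, efficiency rises with matrix size because the ideal time grows with `M*N*K` while fixed per-call overhead stays roughly constant, which matches the trend in the table.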