Beginning with AOCL 5.2, additional build options are available to disable the optimized code paths that AOCL-BLAS selects for particular matrix sizes in GEMM and TRSM. These options may be useful for:
- Testing and benchmarking.
- Reducing numerical rounding differences when repeating calculations with different parallelism at a higher level in the application. Different parallelism can pass different sub-problem sizes to the BLAS APIs, which can cause AOCL-BLAS to select different code paths.

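Here is a simplified illustration of why the selected code path matters. Different kernels can accumulate the same products in a different order, and floating-point addition is not associative. The C sketch below is not AOCL-BLAS code. The functions `dot_forward` and `dot_reverse` are hypothetical stand-ins for two code paths that sum the same dot product in opposite orders.

```c
#include <stddef.h>

/* Hypothetical "code path A": accumulate x[i]*y[i] from first to last. */
double dot_forward(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}

/* Hypothetical "code path B": accumulate the same products from last to
   first, as a differently blocked or vectorized kernel might. */
double dot_reverse(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    for (size_t i = n; i > 0; --i)
        sum += x[i - 1] * y[i - 1];
    return sum;
}
```

With `x = {0.1, 0.2, 0.3}` and `y = {1.0, 1.0, 1.0}`, the two functions return IEEE 754 double-precision results that differ in the last bit. Real GEMM kernels differ in blocking and vectorization rather than in loop direction, but the resulting rounding differences are of the same kind.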

| Build characteristic | CMake options | configure options | Usage |
|---|---|---|---|
| M, N or K = 1 (GEMM and TRSM) | `-DENABLE_MNK1_MATRIX=OFF`<br>`-DENABLE_MNK1_MATRIX=ON` | `--disable-mnk1-matrix`<br>`--enable-mnk1-matrix` | Disable calls to GEMV or other optimizations for these special cases. |
| Tiny matrix (GEMM) | `-DENABLE_TINY_MATRIX=OFF`<br>`-DENABLE_TINY_MATRIX=ON` | `--disable-tiny-matrix`<br>`--enable-tiny-matrix` | Code path for tiny matrices that minimizes all framework overheads. |
| Small matrix (GEMM) | `-DENABLE_SMALL_MATRIX=OFF`<br>`-DENABLE_SMALL_MATRIX=ON` | `--disable-small-matrix`<br>`--enable-small-matrix` | Code path for problems smaller than optimal for SUP. |
| SUP (GEMM) | `-DENABLE_SUP_HANDLING=OFF`<br>`-DENABLE_SUP_HANDLING=ON` | `--disable-sup-handling`<br>`--enable-sup-handling` | Small/Skinny UnPacked (SUP) code path. |
| Small matrix (TRSM) | `-DENABLE_SMALL_MATRIX_TRSM=OFF`<br>`-DENABLE_SMALL_MATRIX_TRSM=ON` | `--disable-small-matrix-trsm`<br>`--enable-small-matrix-trsm` | Alternative to the native code path for small TRSM problems. |

Note that the problem sizes that qualify as tiny, small or SUP depend on thresholds that vary with the BLIS sub-configuration.
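To see how application-level parallelism changes the sizes that reach AOCL-BLAS, consider splitting the N columns of C evenly across threads and issuing one GEMM call per block. The C sketch below uses two hypothetical helpers, `partition_columns` and `is_mnk1`, which are not part of AOCL. It only detects the M, N or K = 1 case from the table. It does not model the tiny, small or SUP thresholds, which depend on the sub-configuration.

```c
#include <stddef.h>

/* Hypothetical helper: split n columns into nparts contiguous blocks whose
   widths differ by at most one, as an application might before issuing one
   GEMM call per thread. nparts must be greater than zero.
   Widths are written to widths[0..nparts-1]. */
void partition_columns(size_t n, size_t nparts, size_t *widths)
{
    size_t base = n / nparts;   /* every block gets at least this many */
    size_t extra = n % nparts;  /* the first `extra` blocks get one more */
    for (size_t p = 0; p < nparts; ++p)
        widths[p] = base + (p < extra ? 1 : 0);
}

/* Hypothetical helper: nonzero if an m x n x k GEMM sub-problem falls in
   the "M, N or K = 1" category from the table. */
int is_mnk1(size_t m, size_t n, size_t k)
{
    return m == 1 || n == 1 || k == 1;
}
```

For N = 10, four threads receive blocks of width 3, 3, 2 and 2, while ten threads each receive a single column. In the second case every sub-problem falls in the M, N or K = 1 category even though the overall problem does not, so AOCL-BLAS may take a different code path than it would for the four-thread split.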