AOCL-BLAS provides selective packing for GEMM when one or two dimensions of a matrix are exceedingly small. Selective packing applies only when sup is enabled. For optimal performance:
# C = beta*C + alpha*A*B
# Dimension (Dim) of A - m x k
# Dimension (Dim) of B - k x n
# Dimension (Dim) of C - m x n
# Assume all are stored in row-major format.
# If m >> n, packing A gives better performance:
$ BLIS_PACK_A=1 ./test_gemm_blis.x
# If m << n, packing B gives better performance:
$ BLIS_PACK_B=1 ./test_gemm_blis.x
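
The rule above can be wrapped in a small shell helper for scripts that run many problem sizes. This is only a sketch: the function name `pick_pack_var` is hypothetical, and the 4x ratio used to stand in for ">>" and "<<" is an illustrative assumption, not a threshold defined by AOCL-BLAS. Only the `BLIS_PACK_A` and `BLIS_PACK_B` variables come from the text above.

```shell
# Hypothetical helper: print the packing variable suggested by the GEMM shape.
# Arguments: m (rows of A and C), n (columns of B and C).
# ASSUMPTION: "much larger" is approximated as "at least 4x"; tune for your case.
pick_pack_var() {
    ppv_m=$1
    ppv_n=$2
    if [ "$ppv_m" -ge $((ppv_n * 4)) ]; then
        echo "BLIS_PACK_A=1"    # m >> n: pack A
    elif [ "$ppv_n" -ge $((ppv_m * 4)) ]; then
        echo "BLIS_PACK_B=1"    # m << n: pack B
    fi
    # Otherwise print nothing, leaving the library defaults untouched.
}
```

One possible use is `env $(pick_pack_var 4096 16) ./test_gemm_blis.x`, which expands to `env BLIS_PACK_A=1 ./test_gemm_blis.x`. When the shape is not skewed, the helper prints nothing and the program runs with no packing variable set.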