Starting from AOCL-BLAS 3.1, if the number of threads is not specified, a multi-threaded build of AOCL-BLAS generally uses one thread per logical core on the system. A higher thread count improves performance for the medium to large matrices found in practical use cases.
However, a higher thread count degrades performance for the very small problem sizes used by the test and check features. Hence, you must specify the number of threads when running the test suite.
The recommended number of threads for running the test suite is 1 or 2.
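As a minimal sketch of this recommendation, the following shell snippet caps the thread count at 2 and falls back to 1 on a single-core system. The helper name pick_test_threads is hypothetical and not part of AOCL-BLAS; the snippet also assumes that getconf _NPROCESSORS_ONLN reports the number of online logical cores, which holds on typical Linux systems.

```shell
#!/bin/sh
# Hypothetical helper (not part of AOCL-BLAS): choose a thread count
# for the test suite. Prints 2 when at least two logical cores are
# online, otherwise prints 1.
pick_test_threads() {
    # Assumption: getconf reports online logical cores; fall back to 1.
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    if [ "$cores" -ge 2 ] 2>/dev/null; then
        echo 2
    else
        echo 1
    fi
}

# Export the value so that commands started from this shell inherit it.
OMP_NUM_THREADS=$(pick_test_threads)
export OMP_NUM_THREADS
echo "Test suite will use OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

Exporting the variable affects every later command in the same shell. Setting it inline on the command line instead limits the setting to that single command.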
Running Test Suite
Execute the following command to invoke the test suite (using configure+make on Linux as an example):
$ OMP_NUM_THREADS=2 make test
The sample output from the execution of the command is as follows:
$ OMP_NUM_THREADS=2 make test
Compiling obj/zen3/testsuite/test_addm.o
Compiling obj/zen3/testsuite/test_addv.o
<<< More compilation output >>>
Compiling obj/zen3/testsuite/test_xpbym.o
Compiling obj/zen3/testsuite/test_xpbyv.o
Linking test_libblis-mt.x against 'lib/zen3/libblis-mt.a -lm -lpthread -fopenmp -lrt'
Running test_libblis-mt.x with output redirected to 'output.testsuite'
check-blistest.sh: All BLIS tests passed!
Compiling obj/zen3/blastest/cblat1.o
Compiling obj/zen3/blastest/abs.o
<<< More compilation output >>>
Compiling obj/zen3/blastest/wsfe.o
Compiling obj/zen3/blastest/wsle.o
Archiving obj/zen3/blastest/libf2c.a
Linking cblat1.x against 'libf2c.a lib/zen3/libblis-mt.a -lm -lpthread -fopenmp -lrt'
Running cblat1.x > 'out.cblat1'
<<< More compilation output >>>
Linking zblat3.x against 'libf2c.a lib/zen3/libblis-mt.a -lm -lpthread -fopenmp -lrt'
Running zblat3.x < './blastest/input/zblat3.in' (output to 'out.zblat3')
check-blastest.sh: All BLAS tests passed!
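The two summary lines in the sample output ("All BLIS tests passed!" and "All BLAS tests passed!") can be checked automatically. The following is a minimal sketch, assuming the output of make test has been saved to a log file; the helper name check_test_log and the log file name make-test.log are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of AOCL-BLAS): succeed only if a saved
# "make test" log contains both summary lines from the sample output.
check_test_log() {
    log_file=$1
    grep -q 'All BLIS tests passed!' "$log_file" &&
        grep -q 'All BLAS tests passed!' "$log_file"
}

# Example usage (the log file name is an assumption):
#   OMP_NUM_THREADS=2 make test 2>&1 | tee make-test.log
#   check_test_log make-test.log && echo "test suite OK"
```

The function returns a non-zero status if either summary line is missing, so it can gate later steps in a script.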