AOCL-DLP supports multiple threading models that can be configured at build time and controlled at runtime.
Threading Model Configuration:
AOCL-DLP supports two threading models that can be configured at build time:
- none (default): Single-threaded execution, no parallel processing
- openmp: OpenMP-based threading for parallel GEMM operations
# Configure with OpenMP threading (recommended for performance)
$ cmake -B build -DDLP_THREADING_MODEL=openmp
# Configure single-threaded (default)
$ cmake -B build -DDLP_THREADING_MODEL=none
Threading Model Details:
- Setting DLP_THREADING_MODEL=openmp automatically enables OpenMP support
- For custom OpenMP installations, use -DDLP_OPENMP_ROOT=/path/to/openmp
- The DLP_ENABLE_OPENMP option provides additional control over OpenMP
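The two OpenMP-related options can be combined in a single configure step. A minimal sketch, assuming an OpenMP toolchain installed under a placeholder prefix `/opt/openmp`:

```shell
# Enable OpenMP threading and point the build at a custom OpenMP install.
# /opt/openmp is a placeholder path; substitute your actual prefix.
cmake -B build -DDLP_THREADING_MODEL=openmp -DDLP_OPENMP_ROOT=/opt/openmp
```

If the compiler's default OpenMP runtime is sufficient, the `-DDLP_OPENMP_ROOT` option can be omitted.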
Thread Control Precedence:
AOCL-DLP follows a specific order of precedence when determining the number of threads to use for GEMM operations:
1. API calls - dlp_thread_set_num_threads() or dlp_thread_set_ways()
2. DLP_NUM_THREADS - Library-specific environment variable
3. OpenMP API - omp_set_num_threads() (when OpenMP is enabled)
4. OMP_NUM_THREADS - OpenMP environment variable
5. System default - Number of available CPU cores
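The practical consequence of this precedence is that the library-specific variable wins over the generic OpenMP one. A minimal sketch (the application binary `./my_app` is a placeholder for any program linked against AOCL-DLP):

```shell
# OMP_NUM_THREADS sets the general OpenMP default...
export OMP_NUM_THREADS=8
# ...but DLP_NUM_THREADS sits higher in the precedence order,
# so AOCL-DLP GEMM operations will use 4 threads, not 8.
export DLP_NUM_THREADS=4
# ./my_app    # placeholder application binary
```

A call to dlp_thread_set_num_threads() inside the application would, in turn, override both environment variables.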
Performance Recommendations:
For Multi-Socket Systems:
- Use OMP_PROC_BIND=close with numactl --cpunodebind to bind to specific NUMA nodes
- Set OMP_PLACES=cores for fine-grained thread control
- Use numactl --interleave for memory interleaving across NUMA nodes
- Set OMP_WAIT_POLICY=active for optimal performance
- Consider binding to the second socket for better performance (machine dependent)
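Putting these settings together, a run pinned to the second socket might look like the following sketch. The node number and the `./dlp_benchmark` binary are placeholders; NUMA node numbering is machine dependent, so check `numactl --hardware` first.

```shell
# Pin threads close together on cores, with active waiting.
export OMP_PROC_BIND=close
export OMP_PLACES=cores
export OMP_WAIT_POLICY=active
# Bind both CPUs and memory to NUMA node 1 (the second socket on many
# two-socket machines); ./dlp_benchmark is an assumed application binary.
# numactl --cpunodebind=1 --membind=1 ./dlp_benchmark
```

For bandwidth-limited runs spanning both sockets, `numactl --interleave=all` spreads memory pages across nodes instead.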
For Single-Socket Systems:
- Use OMP_PROC_BIND=close to keep threads near each other
- Set OMP_PLACES=cores for better cache locality
- Set OMP_WAIT_POLICY=active for reduced thread wake-up overhead
For Memory-Bound Workloads:
- Consider using fewer threads than physical cores
- Set OMP_WAIT_POLICY=active for consistent thread responsiveness
- Ensure proper memory alignment and layout
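As a starting point for a memory-bound run, half the available CPUs is a common heuristic to tune from. A sketch, assuming a Linux system where `nproc` reports the logical CPU count:

```shell
# Start a memory-bound run with half the logical CPUs (at least 1),
# then adjust up or down based on measured throughput.
CORES=$(nproc)
export DLP_NUM_THREADS=$(( CORES / 2 > 0 ? CORES / 2 : 1 ))
export OMP_WAIT_POLICY=active
```

Because memory bandwidth saturates before the cores do, adding threads past that point often yields no speedup and can increase contention.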
For Compute-Bound Workloads:
- Use all available cores
- Set OMP_WAIT_POLICY=active to reduce thread wake-up overhead
- Consider hyperthreading benefits for your specific workload
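One way to evaluate hyperthreading is to compare a run over all logical CPUs against a run restricted to one thread per physical core. A sketch, assuming Linux with `nproc`:

```shell
# Compute-bound run: use every logical CPU (includes SMT siblings).
export OMP_WAIT_POLICY=active
export DLP_NUM_THREADS=$(nproc)
# Alternative to benchmark against: one OpenMP place per physical core,
# which effectively disables SMT sibling use for pinned threads.
# export OMP_PLACES=cores
```

If the cores-only run matches or beats the all-logical-CPU run, hyperthreading is not helping that workload.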