AOCL-DLP automatically selects the best kernel for your CPU based on available instruction sets. However, you can override this behavior for testing or specific optimization scenarios.
Architecture Control:
The AOCL_ENABLE_INSTRUCTIONS environment variable forces specific instruction sets, overriding auto-detection:
# Force AVX512 instructions on Zen4 processors
export AOCL_ENABLE_INSTRUCTIONS=avx512
./your_application
# Use Zen3-optimized kernels
export AOCL_ENABLE_INSTRUCTIONS=zen3
./your_application
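Because export keeps the override active for the rest of the shell session, it can be useful to scope it to a single run instead. The following is a minimal sketch that relies only on standard shell behavior. Here sh -c stands in for ./your_application, and the printed strings are illustrative rather than AOCL-DLP output. It also assumes that an unset variable leaves auto-detection in effect.

```shell
# Start from a clean state so auto-detection would apply by default.
unset AOCL_ENABLE_INSTRUCTIONS

# Prefix assignment: the variable is set for this one command only.
# (sh -c is a stand-in for ./your_application and just echoes what it sees.)
one_shot=$(AOCL_ENABLE_INSTRUCTIONS=avx2 sh -c 'echo "${AOCL_ENABLE_INSTRUCTIONS:-unset}"')

# The next command does not inherit the override.
after=$(sh -c 'echo "${AOCL_ENABLE_INSTRUCTIONS:-unset}"')

echo "during: $one_shot, after: $after"
# prints: during: avx2, after: unset
```

For a real run, the equivalent form would be AOCL_ENABLE_INSTRUCTIONS=avx2 ./your_application, which leaves later commands in the same shell unaffected.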
Supported Values:
zen5: Zen 5 architecture optimizations
zen4: Zen 4 architecture optimizations
zen3: Zen 3 architecture optimizations
zen2: Zen 2 architecture optimizations
avx512: AVX-512 instruction set
avx2: AVX2 instruction set
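Forcing an instruction set that the CPU does not implement may fail at run time, so a guard before the export can help when scripting tests. The sketch below assumes Linux, where /proc/cpuinfo lists CPU feature flags. The helper names isa_supported and cpu_flags are hypothetical and not part of AOCL-DLP, and only the avx512 and avx2 values are covered.

```shell
# Hypothetical helper (not part of AOCL-DLP): succeed if the given flag list
# contains the CPU flag matching an AOCL_ENABLE_INSTRUCTIONS value.
# Usage: isa_supported <value> "<space-separated cpu flags>"
isa_supported() {
    case "$1" in
        avx512) want=avx512f ;;  # Linux reports AVX-512 Foundation as avx512f
        avx2)   want=avx2 ;;
        *)      return 2 ;;      # value not covered by this sketch (e.g. zen4)
    esac
    # Pad with spaces so only whole flag names match.
    case " $2 " in
        *" $want "*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Print the feature flags of the first CPU (empty if /proc/cpuinfo is absent).
cpu_flags() {
    [ -r /proc/cpuinfo ] || return 0
    sed -n 's/^flags[[:space:]]*:[[:space:]]*//p' /proc/cpuinfo | head -n 1
}

# Only force AVX-512 kernels when the CPU reports the flag.
if isa_supported avx512 "$(cpu_flags)"; then
    export AOCL_ENABLE_INSTRUCTIONS=avx512
fi
```

Passing the flag list as an argument keeps the check testable on any machine. On systems without /proc/cpuinfo the guard simply leaves the variable untouched.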
Optimization Strategies:
Choose Appropriate Data Types: Use lower precision (bf16, int8) when accuracy permits
Enable Matrix Reordering: Reorder frequently used matrices for better cache performance
Utilize Post-Operations: Fuse follow-up operations into the matrix computation to reduce memory traffic
Minimize Operations: Reorder matrices once, ahead of time, so that AOCL-DLP has fewer operations to perform
Align Memory: Ensure proper memory alignment for vector instructions