Use Shared Library for Development: Faster iteration during development
Test with Static Library Before Deployment: Verify that performance matches the shared-library build
Set Threading Explicitly: Don’t rely on defaults
dlp_thread_set_num_threads(8);
Reuse Reordered Matrices: When repeated GEMM calls use the same weights, reorder the weight matrix once and reuse it
Check CPU Features at Runtime: AOCL-DLP automatically selects the best kernel for your CPU
Use Post-Operations for Fused Kernels: Fuse follow-up steps such as bias addition or activation into the GEMM call rather than running them as separate passes over the output
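Because kernel selection happens automatically, it can be useful to log which instruction-set level the host actually reports, for example when comparing performance across machines. The following is a minimal sketch of such a check and is not part of the AOCL-DLP API: `cpu_features_t`, `detect_cpu_features`, and `widest_isa` are hypothetical helper names, and the code assumes GCC or Clang on x86, where the `__builtin_cpu_supports` builtin is available. On other compilers or architectures it reports "baseline".

```c
#include <stdio.h>

/* Hypothetical helper type (not an AOCL-DLP type): flags for the
   instruction-set extensions this sketch cares about. */
typedef struct {
    int avx2;
    int avx512f;
} cpu_features_t;

/* Query the host CPU. Assumes GCC/Clang on x86; elsewhere both flags
   stay 0 because the builtins are not available. */
static cpu_features_t detect_cpu_features(void)
{
    cpu_features_t f = {0, 0};
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    f.avx2    = __builtin_cpu_supports("avx2")    ? 1 : 0;
    f.avx512f = __builtin_cpu_supports("avx512f") ? 1 : 0;
#endif
    return f;
}

/* Name the widest extension found, for logging only. The library makes
   its own kernel choice independently of this value. */
static const char *widest_isa(cpu_features_t f)
{
    if (f.avx512f) return "avx512f";
    if (f.avx2)    return "avx2";
    return "baseline";
}

/* Print one diagnostic line, e.g. at application start-up. */
static void log_cpu_features(void)
{
    cpu_features_t f = detect_cpu_features();
    printf("widest ISA detected: %s\n", widest_isa(f));
}
```

Calling `log_cpu_features()` once at start-up, before the first GEMM call, records the host capability alongside any timing data. The check is purely diagnostic and does not influence which kernel the library picks.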
For comprehensive integration documentation including troubleshooting and advanced topics, see the Integration Guide.