Optimize parallel execution:

// Set thread count (typically the number of CPU cores)
dlp_thread_set_num_threads(8);

// Configure workload distribution
dlp_thread_set_ways(2, 2, 2);  // 3D parallelization
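
The hard-coded values above (8 threads, split 2 x 2 x 2) could instead be derived from the machine at run time. The following is a minimal sketch, not part of the library: `online_cores` and `split_three_ways` are hypothetical helpers, and the sketch assumes a POSIX system and that the three "ways" should multiply out to the thread count, which the snippet above suggests (2 * 2 * 2 = 8) but does not state.

```c
#include <assert.h>
#include <unistd.h>  /* sysconf (POSIX) */

/* Hypothetical helper: number of online CPU cores, never less than 1. */
static int online_cores(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return (n < 1) ? 1 : (int)n;
}

/* Hypothetical helper: factor n into ways[0] <= ways[1] <= ways[2] with
 * ways[0] * ways[1] * ways[2] == n, choosing the factorization whose
 * largest factor is as small as possible (the most balanced split).
 * Assumption: the product of the three ways should equal the thread count. */
static void split_three_ways(int n, int ways[3])
{
    assert(n > 0);

    /* Fallback: everything in the last dimension. */
    ways[0] = 1;
    ways[1] = 1;
    ways[2] = n;
    int best_largest = n;

    /* Brute-force search over a <= b <= c; n is a core count, so it is small. */
    for (int a = 1; a * a * a <= n; ++a) {
        if (n % a != 0)
            continue;
        int m = n / a;
        for (int b = a; b * b <= m; ++b) {
            if (m % b != 0)
                continue;
            int c = m / b;  /* c >= b >= a by construction */
            if (c < best_largest) {
                best_largest = c;
                ways[0] = a;
                ways[1] = b;
                ways[2] = c;
            }
        }
    }
}
```

With these helpers, the result of `online_cores()` would be passed to `dlp_thread_set_num_threads`, and the three entries filled in by `split_three_ways` to `dlp_thread_set_ways`. For 8 threads the split is 2, 2, 2, matching the values shown above. A prime thread count such as 7 cannot be balanced and falls back to 1, 1, 7.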