For processing multiple GEMM operations efficiently:

    // Batch processing
    aocl_batch_gemm_f32f32f32of32(...)
    aocl_batch_gemm_bf16bf16f32of32(...)
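
As a rough illustration of what a batched GEMM computes, the sketch below applies `C = alpha * A * B + beta * C` to each problem in a list using a plain reference loop. It is not the AOCL API: the `gemm_problem_f32` struct and the `ref_batch_gemm_f32` function are hypothetical names introduced only for this example, and row-major, tightly packed storage is assumed. The real `aocl_batch_gemm_*` argument lists are elided above and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for one GEMM in a batch (not an AOCL type). */
typedef struct {
    size_t m, n, k;     /* A is m x k, B is k x n, C is m x n */
    float alpha, beta;  /* scaling factors for A*B and the existing C */
    const float *a;     /* row-major, leading dimension k (assumed) */
    const float *b;     /* row-major, leading dimension n (assumed) */
    float *c;           /* row-major, leading dimension n (assumed) */
} gemm_problem_f32;

/* Reference semantics only: for each problem, C = alpha * A * B + beta * C.
 * Each problem may have its own shape and scaling factors. */
void ref_batch_gemm_f32(const gemm_problem_f32 *problems, size_t count)
{
    assert(problems != NULL || count == 0);

    for (size_t p = 0; p < count; ++p) {
        const gemm_problem_f32 *g = &problems[p];

        for (size_t i = 0; i < g->m; ++i) {
            for (size_t j = 0; j < g->n; ++j) {
                float acc = 0.0f; /* f32 accumulator for the dot product */

                for (size_t l = 0; l < g->k; ++l) {
                    acc += g->a[i * g->k + l] * g->b[l * g->n + j];
                }

                g->c[i * g->n + j] =
                    g->alpha * acc + g->beta * g->c[i * g->n + j];
            }
        }
    }
}
```

An optimized batch routine performs the same arithmetic, but submitting all problems in one call gives the library room to amortize setup and schedule work across the whole batch instead of one GEMM at a time.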