Edit the
src/host/host.cpp
host file to change line 73. Change this line to declare the command queue as an out-of-order command queue.Code before the change:
mQueue = clCreateCommandQueue(Context, Device, CL_QUEUE_PROFILING_ENABLE, &mErr);
Code after the change:
mQueue = clCreateCommandQueue(Context, Device, CL_QUEUE_PROFILING_ENABLE | CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, &mErr);
(Optional) Run the hardware emulation with the changed host code.
If you choose to run the Hardware Emulation feature, use the Timeline Trace to observe that using the out-of-order queue enables the kernels requested to be executed at almost the same time as one another (the blue bars represent kernel enqueue requests scheduled by the host).
However, though the host scheduled all these executions concurrently, second and third execution requests are delayed as there is only one CU on the FPGA (the FPGA still executes the kernels sequentially).
In the next step, increase the number of CU on the FPGA to allow three host kernel executions concurrently.