Sometimes the compute-intensive task required by the host application can be
broken into multiple, different kernels, each designed to perform a different task on
the FPGA in parallel. For example, by issuing multiple clEnqueueTask commands in an
out-of-order command queue, you can have multiple kernels performing different tasks
at the same time. This enables task parallelism on the FPGA.