In this tutorial, you work with a single, simple, generic C++ kernel implementation. Keeping the kernel fixed removes kernel code modifications, topological optimizations, and implementation choices from the analysis, so the focus stays on the host code implementation.
The sections below focus on these specific host code optimization concerns:
- Software Pipelining/Event Queue
- Kernel and Host Code Synchronization
- Buffer Size