Before examining the different implementation options for the host code, review the structure of the code. The host code file is designed to let you focus on the key aspects of host code optimization.
The following three classes are provided through header files in the common source directory (srcCommon):
srcCommon/AlignedAllocator.h: AlignedAllocator is a small struct with two methods. This struct is provided as a helper class to support memory-aligned allocation for the test vectors. On Alveo Data Center accelerator cards, memory-aligned blocks of data can be transferred much more rapidly, and the OpenCL™ API library will create warnings if the data transmitted is not memory-aligned.
srcCommon/ApiHandle.h: This class encapsulates the main OpenCL API objects:
context
program
device_id
execution kernel
command_queue
These structures are populated by the constructor, which steps through the default sequence of OpenCL API function calls. There are only two configuration parameters to the constructor:
A string containing the name of the bitstream (xclbin) to be used to program the FPGA.
A Boolean to determine if an out-of-order queue or a sequential execution queue should be created.
The class provides accessor functions for the queue, context, and kernel, which are required for generating buffers and scheduling tasks on the accelerator. The class also automatically releases the allocated OpenCL API objects when the ApiHandle destructor is called.
srcCommon/Task.h: An object of class Task represents a single instance of the workload to be executed on the accelerator. Whenever an object of this class is constructed, the input and output vectors are allocated and initialized based on the buffer size to be transferred per task invocation. Similarly, the destructor de-allocates any object generated during the task execution.
NOTE: This encapsulation of a single workload for the invocation of a module allows this class to also contain an output validator function (outputOk).
The constructor for this class takes two parameters:
bufferSize: Determines how many 512-bit values are transferred when this task is executed.
processDelay: Provides the similarly-named kernel parameter, and it is also used during validation.
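To make the relationship between aligned allocation and bufferSize concrete, the following is a minimal sketch and not the code from srcCommon/AlignedAllocator.h or srcCommon/Task.h. The names PageAlignedAllocator, AlignedVector, makeTestVector, and isAligned are hypothetical, the 4096-byte alignment is an assumption, and posix_memalign assumes a POSIX system. It shows an allocator with two methods and a test vector sized as bufferSize 512-bit values (16 32-bit words each).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

constexpr std::size_t kAlignment = 4096;          // assumed page-sized alignment
constexpr std::size_t kWordsPer512Bit = 512 / 32; // 16 x 32-bit words per value

// Hypothetical stand-in for the tutorial's helper: a minimal allocator
// whose two methods hand out and release 4096-byte-aligned memory.
template <typename T>
struct PageAlignedAllocator {
    using value_type = T;

    PageAlignedAllocator() = default;
    template <typename U>
    PageAlignedAllocator(const PageAlignedAllocator<U>&) {}

    T* allocate(std::size_t n) {
        void* ptr = nullptr;
        if (posix_memalign(&ptr, kAlignment, n * sizeof(T)) != 0) {
            throw std::bad_alloc();
        }
        return static_cast<T*>(ptr);
    }

    void deallocate(T* p, std::size_t) { std::free(p); }
};

// All instances are interchangeable, so they always compare equal.
template <typename T, typename U>
bool operator==(const PageAlignedAllocator<T>&, const PageAlignedAllocator<U>&) {
    return true;
}
template <typename T, typename U>
bool operator!=(const PageAlignedAllocator<T>&, const PageAlignedAllocator<U>&) {
    return false;
}

using AlignedVector =
    std::vector<std::uint32_t, PageAlignedAllocator<std::uint32_t>>;

// Allocates and initializes a vector holding bufferSize 512-bit values.
AlignedVector makeTestVector(std::size_t bufferSize) {
    AlignedVector v(bufferSize * kWordsPer512Bit);
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] = static_cast<std::uint32_t>(i);
    }
    return v;
}

// True when the pointer sits on a kAlignment boundary.
bool isAligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kAlignment == 0;
}
```

With this sketch, makeTestVector(4) would hold 64 words (256 bytes), and its data pointer would satisfy isAligned. Plugging the allocator into std::vector keeps the rest of the host code unchanged while still guaranteeing the alignment.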
The most important member function of this class is the run function. This function enqueues three different steps for executing the algorithm:
Writing data to the FPGA accelerator
Setting up the kernel and running the accelerator
Reading the data back from the FPGA accelerator
To perform these operations, buffers are allocated on the DDR for the communication. Additionally, events are used to establish a dependency between the different commands (write before execute before read).
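The write-before-execute-before-read ordering can be illustrated with a host-only toy model. This is a sketch and not OpenCL code: ToyEvent, ToyCommand, drain, and runToyTask are hypothetical names that only mimic how an out-of-order queue treats event dependencies. In the real API, enqueue calls such as clEnqueueNDRangeKernel accept an event wait list and return an event, which plays the role that waitList and done play here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical, not OpenCL API objects.
struct ToyEvent {
    bool complete = false;
};

struct ToyCommand {
    std::string name;
    std::vector<const ToyEvent*> waitList; // events that must complete first
    ToyEvent* done;                        // event signaled on completion
    bool finished;
};

// Models an out-of-order queue: any command whose wait list is fully
// complete may run, regardless of the order in which it was enqueued.
std::vector<std::string> drain(std::vector<ToyCommand>& queue) {
    std::vector<std::string> order;
    bool progress = true;
    while (progress) {
        progress = false;
        for (ToyCommand& cmd : queue) {
            if (cmd.finished) continue;
            bool ready = true;
            for (const ToyEvent* e : cmd.waitList) {
                ready = ready && e->complete;
            }
            if (!ready) continue;
            cmd.done->complete = true; // "run" the command and signal its event
            cmd.finished = true;
            order.push_back(cmd.name);
            progress = true;
        }
    }
    return order;
}

// Builds one task as three dependent commands and returns the run order.
std::vector<std::string> runToyTask() {
    ToyEvent writeDone, execDone, readDone;
    // Enqueued in reverse on purpose: only the events impose the order.
    std::vector<ToyCommand> queue = {
        {"read", {&execDone}, &readDone, false},
        {"execute", {&writeDone}, &execDone, false},
        {"write", {}, &writeDone, false},
    };
    return drain(queue);
}
```

Even though the commands are enqueued in reverse, the dependencies force them to run as write, execute, read. This is why the events matter when an out-of-order queue is used: without them, the queue would be free to start the commands in any order.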
In addition to the ApiHandle object, the run function has one conditional argument. This argument allows a task to be dependent on a previously-generated event. This allows the host code to establish task order dependencies, as illustrated later in this tutorial.
None of the code in any of these header files is modified during this tutorial. All key concepts will be shown in different host.cpp files, as found in:
src/pipeline_host.cpp
src/sync_host.cpp
src/buf_host.cpp
The main function in each host.cpp file follows a specific structure, described in the following section.