The source code for the host program is written in C/C++ and uses the native XRT APIs to interact with the hardware-accelerated vector-add kernel.
Open the host.cpp file located in the src directory of this tutorial.
There are four main steps in the source code for this simple example.
Step 1: The runtime environment is initialized. In this section, the host detects the attached AMD device, loads the FPGA binary (.xclbin file) from disk, and programs it into the first AMD device found. Then the kernel object is created. All Vitis applications have code very similar to the code in this section.
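The initialization described in Step 1 can be sketched as follows. This is a minimal illustration, not the tutorial's exact code: the helper names `xclbin_path` and `open_vadd_kernel`, the default file name `./vadd.xclbin`, and the kernel name `vadd` are assumptions made for the example. The `__has_include` guard is only there so the sketch still compiles on a machine without XRT installed.

```cpp
#include <string>

// The XRT headers exist only where XRT is installed; the guard lets this
// sketch compile elsewhere with the device-specific part left out.
#if __has_include(<xrt/xrt_device.h>) && __has_include(<xrt/xrt_kernel.h>)
#include <xrt/xrt_device.h>
#include <xrt/xrt_kernel.h>
#define HAVE_XRT 1
#endif

// Hypothetical helper: take the .xclbin path from the first command-line
// argument, falling back to an assumed default file name.
std::string xclbin_path(int argc, const char* const* argv) {
    return (argc > 1) ? std::string(argv[1]) : std::string("./vadd.xclbin");
}

#ifdef HAVE_XRT
// Open the first device, program it with the binary, and create the kernel.
// "vadd" is an assumed kernel name; use the name compiled into your .xclbin.
xrt::kernel open_vadd_kernel(xrt::device& device, const std::string& binary_file) {
    device = xrt::device(0);                      // first device found
    auto uuid = device.load_xclbin(binary_file);  // load and program the FPGA binary
    return xrt::kernel(device, uuid, "vadd");     // kernel object used in later steps
}
#endif
```

The device is returned through a reference parameter because the later steps need both the device (to create buffers) and the kernel (to query `group_id` and launch the run).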
Step 2: The application creates the three buffers needed to share data with the kernel: one for each input and one for the output. On data-center platforms.
std::cout << "Allocate Buffer in Global Memory\n";
auto boIn1 = xrt::bo(device, vector_size_bytes, krnl.group_id(0)); //Match kernel arguments to RTL kernel
auto boIn2 = xrt::bo(device, vector_size_bytes, krnl.group_id(1));
auto boOut = xrt::bo(device, vector_size_bytes, krnl.group_id(2));
// Map the contents of the buffer object into host memory
auto bo0_map = boIn1.map<int*>();
auto bo1_map = boIn2.map<int*>();
auto bo2_map = boOut.map<int*>();
std::fill(bo0_map, bo0_map + DATA_SIZE, 0);
std::fill(bo1_map, bo1_map + DATA_SIZE, 0);
std::fill(bo2_map, bo2_map + DATA_SIZE, 0);
NOTE: A common alternative is for the application to explicitly allocate host memory, and reuse the corresponding pointers when creating the buffers. The approach used in this example was chosen because it is the most portable and efficient across both data center and embedded platforms.
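For comparison, the alternative mentioned in the note could start from an explicitly allocated, page-aligned host buffer. The sketch below uses only standard C++17. The 4096-byte alignment and the names `kPageAlign`, `AlignedIntBuffer`, `allocate_aligned`, and `is_page_aligned` are assumptions for illustration; check the alignment your platform actually requires.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Assumed alignment: one 4 KiB page. Verify the requirement of your platform.
constexpr std::size_t kPageAlign = 4096;

// Deleter that releases memory obtained from the aligned operator new.
struct AlignedDelete {
    void operator()(int* p) const noexcept {
        ::operator delete(static_cast<void*>(p), std::align_val_t{kPageAlign});
    }
};

using AlignedIntBuffer = std::unique_ptr<int[], AlignedDelete>;

// Allocate `count` ints starting on a page boundary and zero them.
AlignedIntBuffer allocate_aligned(std::size_t count) {
    void* raw = ::operator new(count * sizeof(int), std::align_val_t{kPageAlign});
    int* data = static_cast<int*>(raw);
    std::fill_n(data, count, 0);
    return AlignedIntBuffer(data);
}

// True when the pointer lies on a kPageAlign boundary.
bool is_page_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kPageAlign == 0;
}
```

The raw pointer from `allocate_aligned(DATA_SIZE).get()` could then be handed to one of the xrt::bo constructor overloads that accept a user pointer. With this approach the application must keep the host allocation alive for as long as the buffer object uses it.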
Step 3: The host program sets the arguments of the kernel, then schedules three operations: the transfers of the two input vectors to device memory, the execution of the kernel, and lastly the transfer of the results back to host memory.
// Synchronize buffer content with device side
std::cout << "synchronize input buffer data to device global memory\n";
boIn1.sync(XCL_BO_SYNC_BO_TO_DEVICE);
boIn2.sync(XCL_BO_SYNC_BO_TO_DEVICE);
std::cout << "Execution of the kernel\n";
auto run = krnl(boIn1, boIn2, boOut, DATA_SIZE); //DATA_SIZE=size
run.wait();
// Get the output;
std::cout << "Get the output data from the device" << std::endl;
boOut.sync(XCL_BO_SYNC_BO_FROM_DEVICE);
Step 4: The call to run.wait() returns when the kernel has completed. At that point, the output buffer containing the results of the kernel is migrated back to host memory and can safely be used by the software application. Here the results are simply checked against expected values before the program finishes.
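The check against expected values in Step 4 might look like the sketch below. It assumes the kernel computes an element-wise sum of the two inputs; the function name `count_mismatches` is hypothetical and not taken from host.cpp.

```cpp
#include <cstddef>
#include <iostream>

// Count the elements where out[i] differs from in1[i] + in2[i].
// Reports the first mismatch on stderr to help with debugging.
std::size_t count_mismatches(const int* in1, const int* in2,
                             const int* out, std::size_t size) {
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < size; ++i) {
        const int expected = in1[i] + in2[i];
        if (out[i] != expected) {
            if (mismatches == 0) {
                std::cerr << "First mismatch at index " << i << ": expected "
                          << expected << ", got " << out[i] << "\n";
            }
            ++mismatches;
        }
    }
    return mismatches;
}
```

In the host program this could be called as `count_mismatches(bo0_map, bo1_map, bo2_map, DATA_SIZE)` after the output buffer has been synchronized from the device, with a non-zero result treated as a test failure.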
This example shows the simplest way of using the XRT native API to interact with the hardware accelerator.