The host code main() function includes OpenCL or Xilinx Runtime (XRT) APIs to control the execution of PL kernels, as well as ADF APIs to control the AI Engine graph (init(), update(), run(), wait()).
To load and control PL kernels from the host application, the execution model contains the following steps:
- Get the OpenCL platform and device, prepare a context and command queue. Program the XCLBIN file and get kernel objects from the program. A call to adf::registerXRT() is still needed, but the device handle can be converted from the XCL domain to the XRT domain.
- Prepare device buffers for the kernels. Transfer data from host memory to global memory in the device.
- Set up the kernel with its input parameters and trigger the execution of the kernel on the Versal™ device.
- Wait for kernel completion.
- Transfer data from global memory in the device back to host memory.
- Host code continues processing using the new data in the host memory.
Tip: Refer to Developing Applications in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416) for more information on coding the host application for controlling PL kernels.
The following code snippet from an example host.cpp illustrates the preceding steps:
//1. Get OpenCL platform and device, prepare context and command queue.
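std::vector<cl::Platform> platforms;
cl::Platform::get(&platforms);
//Assumption: the first platform is the Xilinx platform and its first accelerator is the target device.
std::vector<cl::Device> devices;
platforms[0].getDevices(CL_DEVICE_TYPE_ACCELERATOR, &devices);
cl::Device device = devices[0];
cl::Context context(device);
cl::CommandQueue q(context, device, CL_QUEUE_PROFILING_ENABLE | CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE);
//Program xclbin, and get kernel objects from the program.
cl::Program::Binaries bins; //must hold the XCLBIN file contents (loading code not shown)
cl::Program program(context, devices, bins);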
cl::Kernel krnl_s2mm(program,"s2mm"); //get kernel object
// Create XRT device handle for ADF API
void *dh;
device.getInfo(CL_DEVICE_HANDLE, &dh);
auto dhdl = xrtDeviceOpenFromXcl(dh);
xuid_t uuid;
xrtDeviceGetXclbinUUID(dhdl, uuid);
adf::registerXRT(dhdl, uuid);
//2. Prepare device buffers for kernels. Transfer data from host memory to global memory in device.
std::complex<short> *host_out; //host buffer
cl::Buffer buffer_out(context, CL_MEM_WRITE_ONLY, output_size_in_bytes);
host_out=(std::complex<short>*)q.enqueueMapBuffer(buffer_out,true,CL_MAP_READ,0,sizeof(int)*OUTPUT_SIZE,nullptr,nullptr,nullptr);
//3. Set up kernel input parameters
krnl_s2mm.setArg(0,buffer_out);
krnl_s2mm.setArg(2,OUTPUT_SIZE);
//Launch the Kernel
q.enqueueTask(krnl_s2mm);
// ADF API: Run the graph and update graph parameters (RTP)
gr.run(4);
gr.update(gr.trigger,10);
gr.update(gr.trigger,10);
gr.update(gr.trigger,100);
gr.update(gr.trigger,100);
gr.wait();
//4. Wait for kernel completion.
q.finish();//Wait for s2mm to complete
//5. Transfer data from global memory back to host memory.
q.enqueueMigrateMemObjects({buffer_out},CL_MIGRATE_MEM_OBJECT_HOST);
q.finish();//Wait for memory transfer to complete
//6. Continue processing on host memory
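The snippet above declares bins but does not show how the XCLBIN contents get into it. The following is a minimal sketch of that missing piece, using only the C++ standard library. The helper name read_binary_file is hypothetical and is not part of OpenCL, XRT, or the ADF API.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not an OpenCL/XRT API): read a whole file, such as
// an XCLBIN, into a byte vector. Throws std::runtime_error on failure.
std::vector<unsigned char> read_binary_file(const std::string& path) {
    // Open at the end so tellg() reports the file size.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) {
        throw std::runtime_error("cannot open " + path);
    }
    const std::streamsize size = file.tellg();
    if (size < 0) {
        throw std::runtime_error("cannot determine size of " + path);
    }
    file.seekg(0, std::ios::beg);

    std::vector<unsigned char> data(static_cast<std::size_t>(size));
    if (size > 0 &&
        !file.read(reinterpret_cast<char*>(data.data()), size)) {
        throw std::runtime_error("cannot read " + path);
    }
    return data;
}
```

How the bytes are handed to cl::Program::Binaries depends on the OpenCL C++ header in use. In some configurations Binaries is a vector of byte vectors, so the returned vector can be pushed directly. In others it is a vector of (pointer, size) pairs, so the pair is built from data() and size(), and the byte vector must stay alive until the program has been created.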
Important: The q.finish() function for the command queue is blocking. Before calling this function, start dependent tasks such as graph execution.
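The ordering rule can be illustrated without any device. The sketch below is an analogy built on std::async and std::promise, not XRT or OpenCL code. It assumes that the s2mm kernel cannot complete until the graph has produced data for it. The names EventLog and run_in_safe_order are hypothetical, and the comments mark which host call each step stands in for.

```cpp
#include <future>
#include <mutex>
#include <string>
#include <vector>

// Thread-safe event log, used only to make the ordering visible.
class EventLog {
public:
    void add(const std::string& event) {
        std::lock_guard<std::mutex> lock(mutex_);
        events_.push_back(event);
    }
    std::vector<std::string> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return events_;
    }
private:
    mutable std::mutex mutex_;
    std::vector<std::string> events_;
};

// Models the safe order: launch the consumer, start the producer,
// and only then block on the consumer.
std::vector<std::string> run_in_safe_order() {
    EventLog log;
    std::promise<void> data_ready;  // fulfilled when the "graph" has produced data
    std::future<void> data_ready_f = data_ready.get_future();

    // Stands in for q.enqueueTask(krnl_s2mm): asynchronous launch.
    // The consumer stalls until the producer has delivered its data.
    std::future<void> kernel = std::async(std::launch::async,
        [&log, &data_ready_f] {
            data_ready_f.wait();
            log.add("kernel done");
        });

    // Stands in for gr.run() ... gr.wait(): the dependent task is started
    // BEFORE the blocking wait below.
    log.add("graph started");
    data_ready.set_value();

    // Stands in for q.finish(): blocks until the consumer has completed.
    kernel.wait();
    log.add("finish returned");
    return log.snapshot();
}
```

If the blocking wait were moved above the producer step, the consumer would never receive its data and the wait would not return. In this model, that is the situation the note warns against.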