XRT provides C and C++ APIs to control PL kernels and AI Engine graphs.
The execution model for the XRT API controlling PL kernels and AI Engine graphs is as follows:
- Open the device and load the XCLBIN. Get the UUID as needed.
- Allocate buffer objects and map them to host memory. Process the data and transfer it from host memory to device memory.
- Get PL kernel handles, set arguments for kernels, and launch kernels.
- Get AI Engine graphs, and run graphs.
- Wait for the completion of the graphs.
- Wait for the completion of the kernels.
- Transfer data from the global memory in the device back to the host memory.
- The host code continues processing using the new data in the host memory.
Note: There are two ways to start the AI Engine graph. The AI Engine graph can be auto-started when the board is booted and runs forever after download. The package setting defer_aie_run determines this behavior. PL kernels and AI Engine graphs can also be started in the host application, and the application code can determine whether to wait for a specific kernel's completion or a specific graph's completion. This behavior can vary depending on the design.
XRT provides the class graph in the namespace xrt, along with its member functions, to control the AI Engine graph. Example code to control the AI Engine graph and PL kernels using the XRT C++ API is as follows:
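// Including the XRT header files below is mandatory
#include "xrt/xrt_graph.h"
#include "xrt/xrt_kernel.h"
// Standard headers for std::complex and std::cout, which are used below
#include <complex>
#include <iostream>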
size_t output_size_in_bytes = OUTPUT_SIZE * sizeof(int);
// Open the device
auto device = xrt::device(0); // device index = 0
// Load the xclbin, which may contain PL kernels and AI Engine graphs
auto uuid = device.load_xclbin(xclbinFilename);
// PL control
// Get handles to the s2mm and random_noise PL kernels
auto s2mm = xrt::kernel(device, uuid, "s2mm");
auto random_noise = xrt::kernel(device, uuid, "random_noise");
// allocate output memory for data from s2mm kernel
auto out_bo = xrt::bo(device, output_size_in_bytes,s2mm.group_id(0));
auto host_out=out_bo.map<std::complex<short>*>();
//run the s2mm and random_noise PL kernels
auto s2mm_run = s2mm(out_bo, nullptr, OUTPUT_SIZE);//start s2mm
auto random_noise_run = random_noise(nullptr, OUTPUT_SIZE);
//AI Engine Graph Control
//Initialize run time parameter data
int coeffs_readback[12];
int narrow_filter[12] = {180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504};
int wide_filter[12] = {-21, -249, 319, -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539};
//get the handle to the graph called "gr"
auto ghdl=xrt::graph(device,uuid,"gr");
// update run time parameter in the graph
ghdl.update("gr.fir24.in[1]",narrow_filter);
//run the graph for 16 iterations
ghdl.run(16);
// wait for graph to complete running 16 iterations
ghdl.wait();
//read value from a run time parameter
ghdl.read("gr.fir24.inout[0]",coeffs_readback); // Read after graph::wait(), so the RTP update has taken effect
// update run time parameter in the graph
ghdl.update("gr.fir24.in[1]",wide_filter);
//run the graph for 16 iterations
ghdl.run(16);
ghdl.read("gr.fir24.inout[0]", coeffs_readback); // Asynchronous read, issued before waiting for the graph
ghdl.wait();
// wait for the s2mm PL kernel to be done
auto state = s2mm_run.wait();
std::cout << "s2mm completed with status(" << state << ")\n";
out_bo.sync(XCL_BO_SYNC_BO_FROM_DEVICE);
//Post-processing...
...
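The example above sizes the output buffer as OUTPUT_SIZE * sizeof(int) but maps it as std::complex<short>*. That only works if one complex sample of two 16-bit values occupies the same 32 bits as an int. The following standalone sketch needs no XRT installation or device and makes that assumption explicit at compile time. The OUTPUT_SIZE value, the sample_t alias, and the buffer_bytes helper are hypothetical names introduced for illustration. The layout of std::complex<short> is not guaranteed by the C++ standard, so the check is an assumption about common toolchains.

```cpp
#include <complex>
#include <cstddef>
#include <cstdint>

// Hypothetical sample count; the original example does not show how OUTPUT_SIZE is defined.
constexpr std::size_t OUTPUT_SIZE = 1024;

// Host-side view of one output sample: 16-bit real part plus 16-bit imaginary part.
using sample_t = std::complex<short>;

// Fails the build if a sample is not exactly 32 bits wide, which is the
// assumption behind sizing the buffer with sizeof(int).
static_assert(sizeof(sample_t) == sizeof(std::int32_t),
              "std::complex<short> is expected to occupy 32 bits");

// Hypothetical helper: number of bytes needed to hold `count` samples.
constexpr std::size_t buffer_bytes(std::size_t count) {
    return count * sizeof(sample_t);
}
```

Sizing the buffer from the mapped type, for example with buffer_bytes(OUTPUT_SIZE), keeps the allocation and the host pointer type in agreement. If sample_t were changed to a wider type, the static_assert would stop the build instead of leaving an undersized buffer.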