Controlling the PL Kernel with the XRT API - 2020.2 English

Versal ACAP AI Engine Programming Environment User Guide (UG1076)

Document ID: UG1076
Release Date: 2020-11-24
Version: 2020.2 English

Xilinx provides an open-source XRT API for controlling the execution of PL kernels outside of the graph when programming host code for Linux.

The execution model for controlling PL kernels with the XRT API is as follows:

  1. Get device handle, load the XCLBIN. Get the uuid as needed.
  2. Allocate buffer objects and map to host memory. Process and transfer data from host memory to device memory.
  3. Get kernel and run handles, set arguments for kernels, and launch kernels.
  4. Wait for kernel completion.
  5. Transfer data from global memory in the device back to host memory.
  6. Host code continues processing using the new data in the host memory.

When using the native XRT API, the host application looks like the following:

// 1. Open device, load xclbin, and get uuid
auto dhdl = xrtDeviceOpen(0); // device index=0

xrtDeviceLoadXclbinFile(dhdl,xclbinFilename);
xuid_t uuid;
xrtDeviceGetXclbinUUID(dhdl, uuid);

// 2. Allocate output buffer objects and map to host memory

xrtBufferHandle out_bohdl = xrtBOAlloc(dhdl, output_size_in_bytes, 0, /*BANK=*/0);
std::complex<short> *host_out = (std::complex<short>*)xrtBOMap(out_bohdl);

// 3. Get kernel and run handles, set arguments for kernel, and launch kernel
xrtKernelHandle s2mm_khdl = xrtPLKernelOpen(dhdl, uuid, "s2mm"); // Open kernel handle using the uuid obtained in step 1
xrtRunHandle s2mm_rhdl = xrtRunOpen(s2mm_khdl); 
xrtRunSetArg(s2mm_rhdl, 0, out_bohdl); // set kernel arg
xrtRunSetArg(s2mm_rhdl, 2, OUTPUT_SIZE); // set kernel arg
xrtRunStart(s2mm_rhdl); //launch s2mm kernel

// ADF API: run the graph, update run-time parameters (RTP), and so on
……

// 4. Wait for kernel completion
auto state = xrtRunWait(s2mm_rhdl);

// 5. Sync output device buffer objects to host memory

xrtBOSync(out_bohdl, XCL_BO_SYNC_BO_FROM_DEVICE, output_size_in_bytes, /*OFFSET=*/0);

// 6. Post-processing on host memory - "host_out"
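
As an illustration of what step 6 might look like, the following minimal sketch compares the samples read back from the device against a reference array and counts the differences. The function name count_mismatches and the golden reference array are hypothetical and are not part of the XRT or ADF APIs. The sketch assumes that both arrays hold the same number of std::complex<short> samples.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Hypothetical helper (not part of XRT): counts the samples in `received`
// that differ from the corresponding samples in `golden`.
// Both arrays are assumed to hold at least `count` elements.
std::size_t count_mismatches(const std::complex<short>* received,
                             const std::complex<short>* golden,
                             std::size_t count) {
    // Null pointers are only acceptable when there is nothing to compare.
    assert(count == 0 || (received != nullptr && golden != nullptr));

    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < count; ++i) {
        // Compare real and imaginary parts explicitly.
        if (received[i].real() != golden[i].real() ||
            received[i].imag() != golden[i].imag()) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

In the host code above, host_out would be passed as the received array once the buffer has been synced back from the device. The element count depends on how the output buffer was sized, so it is left as a parameter here.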

After post-processing the data, release the allocated objects:

graph.end();
xrtRunClose(s2mm_rhdl);
xrtKernelClose(s2mm_khdl);

xrtBOFree(out_bohdl);
xrtDeviceClose(dhdl);
Tip: Call graph.end() before xrtDeviceClose() to release the objects that are allocated implicitly for PL kernels inside the graph. After graph.end(), the AI Engine kernels cannot be restarted. To run the host program multiple times, either comment out graph.end(), provided the host does not depend on it for synchronization, or replace graph.end() with graph.wait() to keep the synchronization.
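
The release order shown above (run, kernel, buffer, device) is the reverse of the order in which the handles were acquired. One possible way to make that order harder to get wrong is a small scope guard. This is only a sketch under stated assumptions: ScopedHandle and the fake_* functions are hypothetical stand-ins that log their calls so that the example compiles without XRT or hardware. In real host code, the close functions would be xrtRunClose, xrtKernelClose, xrtBOFree, and xrtDeviceClose.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Handle = void*;            // stand-in for an opaque handle type
using CloseFn = int (*)(Handle); // stand-in for a close/free function

// Records the order of release calls so the sketch can be checked.
static std::vector<std::string> release_log;

// Hypothetical stand-ins for the XRT release functions; they only log.
int fake_run_close(Handle)    { release_log.push_back("run");    return 0; }
int fake_kernel_close(Handle) { release_log.push_back("kernel"); return 0; }
int fake_bo_free(Handle)      { release_log.push_back("buffer"); return 0; }
int fake_device_close(Handle) { release_log.push_back("device"); return 0; }

// Owns one handle and calls its close function exactly once on destruction.
class ScopedHandle {
public:
    ScopedHandle(Handle handle, CloseFn close_fn)
        : handle_(handle), close_fn_(close_fn) {
        assert(close_fn_ != nullptr);
    }
    ~ScopedHandle() { close_fn_(handle_); }

    // Non-copyable, so a handle is never released twice.
    ScopedHandle(const ScopedHandle&) = delete;
    ScopedHandle& operator=(const ScopedHandle&) = delete;

    Handle get() const { return handle_; }

private:
    Handle handle_;
    CloseFn close_fn_;
};

// Acquires four stand-in handles, lets scope exit release them,
// and returns the recorded release order.
std::vector<std::string> run_and_release() {
    release_log.clear();
    int dummy = 0; // placeholder object that gives the handles a valid address
    {
        ScopedHandle device(&dummy, fake_device_close);
        ScopedHandle buffer(&dummy, fake_bo_free);
        ScopedHandle kernel(&dummy, fake_kernel_close);
        ScopedHandle run(&dummy, fake_run_close);
        // Set arguments, start the run, wait, sync, and post-process here.
        // graph.end() would also be called here, before the scope ends.
    } // Destructors run in reverse declaration order.
    return release_log;
}
```

Because C++ destroys local objects in reverse order of declaration, declaring the guards in acquisition order gives the same release sequence as the explicit calls above. graph.end() still has to be called explicitly before the guards go out of scope.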