User-managed kernels require the use of the XRT native API for the software
application, and are specified as an IP object of the xrt::ip
class. The following is a high-level overview of how to structure
your host application to access user-managed kernels from an .xclbin file.
- Add the following header files to include the XRT native API:
#include "experimental/xrt_ip.h"
#include "xrt/xrt_bo.h"
- experimental/xrt_ip.h: Defines the IP as an object of xrt::ip.
- xrt/xrt_bo.h: Lets you create buffer objects in the XRT native API.
- Set up the application environment as described in Specifying the Device ID and Loading the XCLBIN.
- The IP object (xrt::ip) is constructed from the xrt::device object, the uuid of the loaded .xclbin, and the name of the user-managed kernel:
// User Managed Kernel = IP
auto ip = xrt::ip(device, uuid, "Vadd_A_B");
- Optionally, for Versal AI Core devices with
AI Engine graph applications, you can also
specify the graph application to load at run time. This process requires a few
sub-tasks as shown:
- Add the required header with an #include statement:
#include <experimental/xrt_aie.h>
- Identify the AI Engine graph from the xrt::device object, the uuid of the loaded .xclbin, and the name of the graph application:
auto my_graph = xrt::graph(device, uuid, "mygraph_top");
- Reset and run the graph application from the software program as needed:
my_graph.reset();
std::cout << STR_PASSED << "my_graph.reset()" << std::endl;
my_graph.run(0);
std::cout << STR_PASSED << "my_graph.run()" << std::endl;
Tip: For more information on building and running AI Engine applications, refer to Versal ACAP AI Engine Programming Environment User Guide (UG1076).
- Create buffers for the IP arguments:
auto <buf_name> = xrt::bo(<device>,<DATA_SIZE>,<flag>,<bank_id>);
Where the buffer object constructor uses the following fields:
- <device>: xrt::device object of the accelerator card.
- <DATA_SIZE>: Size of the buffer in bytes, as defined by the width and quantity of data.
- <flag>: Flag for creating the buffer objects.
- <bank_id>: Defines the memory bank on the device where the buffer should be allocated for IP access. The memory bank specified must match the corresponding IP port's connection inside the .xclbin file. Otherwise, you will get bad_alloc when running the application. You can specify the assignment of the kernel argument using the --connectivity.sp option as explained in Mapping Kernel Ports to Memory.
For example:
auto buf_in_a = xrt::bo(device, DATA_SIZE, xrt::bo::flags::normal, 0);
auto buf_in_b = xrt::bo(device, DATA_SIZE, xrt::bo::flags::normal, 0);
Tip: Verify the IP connectivity to determine the specific memory bank, or get this information from the Vitis generated .xclbin.info file.
For example, the following information for a user-managed kernel from the .xclbin could guide the construction of buffer objects in your host code:
Instance: Vadd_A_B_1
Base Address: 0x1c00000
Argument: scalar00
Register Offset: 0x10
Port: s_axi_control
Memory: <not applicable>
Argument: A
Register Offset: 0x18
Port: m00_axi
Memory: bank0 (MEM_DDR4)
Argument: B
Register Offset: 0x24
Port: m01_axi
Memory: bank0 (MEM_DDR4)
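The buffer size and register offsets used in this step can be written down explicitly in the host code. The following is a minimal sketch and is not part of the XRT API: buffer_bytes is a hypothetical helper, and the two offsets are copied from the example .xclbin.info listing above, so they apply only to that example kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper (not an XRT function): number of bytes needed to hold
// `count` elements of type T. The xrt::bo constructor takes its size in bytes.
template <typename T>
std::size_t buffer_bytes(std::size_t count) {
    // Guard against overflow of the byte count.
    assert(count <= std::numeric_limits<std::size_t>::max() / sizeof(T));
    return count * sizeof(T);
}

// Register offsets taken from the example .xclbin.info listing above.
// They are specific to that example and must be re-read for your own IP.
constexpr std::uint32_t REG_OFFSET_A = 0x18;
constexpr std::uint32_t REG_OFFSET_B = 0x24;

// Each buffer address is written as two 32-bit words (offset and offset + 4),
// so argument A must end before argument B begins.
static_assert(REG_OFFSET_A + 8 <= REG_OFFSET_B,
              "address registers of A and B must not overlap");
```

With a helper like this, DATA_SIZE for a buffer of 4096 int values could be computed as buffer_bytes<int>(4096), where the element count is an arbitrary illustration.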
- Get the buffer addresses and transfer data between host and device:
auto a_data = buf_in_a.map<int*>();
auto b_data = buf_in_b.map<int*>();
// Get the buffer physical address
long long a_addr = buf_in_a.address();
long long b_addr = buf_in_b.address();
// Sync Buffers
buf_in_a.sync(XCL_BO_SYNC_BO_TO_DEVICE);
buf_in_b.sync(XCL_BO_SYNC_BO_TO_DEVICE);
xrt::bo::map() maps the host-side backing storage of the buffer to a user pointer. However, after writing to the mapped pointer or before reading from it, you should use xrt::bo::sync() with the appropriate direction flag for the DMA operation.
- After preparing the buffers (creating and syncing them as shown above), you can pass all the necessary information to the IP with direct register write operations.
Important: The xrt::ip differs from the standard xrt::kernel, and indicates that XRT does not manage the IP but does provide access to read or write the registers.
For example, the code below passes the buffer base addresses to the IP through the xrt::ip::write_register() command.
ip.write_register(REG_OFFSET_A, a_addr);
ip.write_register(REG_OFFSET_A + 4, a_addr >> 32);
ip.write_register(REG_OFFSET_B, b_addr);
ip.write_register(REG_OFFSET_B + 4, b_addr >> 32);
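The register writes above split each 64-bit buffer address into a low word and a high word. One way to make that split explicit is sketched below. write_address is a hypothetical helper, not an XRT function, and FakeIp is a hypothetical in-memory stand-in that only exists so the sketch can run without hardware. The helper assumes an object with a write_register(offset, value) member taking 32-bit values, as used on xrt::ip above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical helper: write a 64-bit address as two 32-bit registers,
// low word at `offset` and high word at `offset + 4`.
// `Ip` is any type with write_register(uint32_t offset, uint32_t value).
template <typename Ip>
void write_address(Ip& ip, std::uint32_t offset, std::uint64_t addr) {
    assert(offset % 4 == 0);  // 32-bit registers are word aligned
    ip.write_register(offset, static_cast<std::uint32_t>(addr & 0xFFFFFFFFu));
    ip.write_register(offset + 4, static_cast<std::uint32_t>(addr >> 32));
}

// Hypothetical stand-in for the IP: stores register values in a map.
struct FakeIp {
    std::map<std::uint32_t, std::uint32_t> regs;

    void write_register(std::uint32_t offset, std::uint32_t value) {
        regs[offset] = value;
    }

    // Registers that were never written read back as 0.
    std::uint32_t read_register(std::uint32_t offset) const {
        auto it = regs.find(offset);
        return it == regs.end() ? 0u : it->second;
    }
};
```

Keeping the address in an unsigned 64-bit type makes the right shift well defined for every address value, and the explicit casts document the intended truncation to 32 bits.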
- Start the IP execution. Because the IP is user-managed, you can use any number of register reads and writes to start the IP, check its status, and restart it. The following example uses an s_axilite interface to access control signals in the control register:
uint32_t axi_ctrl = 0;
std::cout << "INFO:IP Start" << std::endl;
axi_ctrl = IP_START;
ip.write_register(CSR_OFFSET, axi_ctrl);
// Wait until the IP is DONE
axi_ctrl = 0;
while ((axi_ctrl & IP_IDLE) != IP_IDLE) {
    axi_ctrl = ip.read_register(CSR_OFFSET);
}
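The polling loop above spins forever if the IP never reports idle. A bounded variant is sketched below. The values of CSR_OFFSET, IP_START, and IP_IDLE are assumptions based on the control register layout that Vitis HLS generates for ap_ctrl_hs kernels, and a hand-written RTL IP may use a different layout. start_and_wait and FakeIp are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed control register layout (Vitis HLS ap_ctrl_hs style).
// Verify these values against the register map of your own IP.
constexpr std::uint32_t CSR_OFFSET = 0x00;
constexpr std::uint32_t IP_START   = 0x1;  // bit 0: ap_start
constexpr std::uint32_t IP_IDLE    = 0x4;  // bit 2: ap_idle

// Hypothetical helper: start the IP, then poll the control register until it
// reports idle, giving up after `max_polls` reads.
// Returns true if idle was observed, false on timeout.
template <typename Ip>
bool start_and_wait(Ip& ip, unsigned max_polls) {
    assert(max_polls > 0);
    ip.write_register(CSR_OFFSET, IP_START);
    for (unsigned i = 0; i < max_polls; ++i) {
        if ((ip.read_register(CSR_OFFSET) & IP_IDLE) == IP_IDLE) {
            return true;
        }
    }
    return false;
}

// Hypothetical stand-in for the IP: once started, it reports busy for
// `busy_reads` reads of the control register and idle afterwards.
struct FakeIp {
    unsigned busy_reads;
    bool started;

    void write_register(std::uint32_t offset, std::uint32_t value) {
        if (offset == CSR_OFFSET && (value & IP_START) != 0) {
            started = true;
        }
    }

    std::uint32_t read_register(std::uint32_t offset) {
        if (offset != CSR_OFFSET) return 0;
        if (started && busy_reads > 0) {
            --busy_reads;
            return 0;  // still running
        }
        return IP_IDLE;
    }
};
```

A poll count is the simplest bound. A wall-clock timeout would work the same way, and the caller decides how to react when the helper returns false.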
- After IP execution is finished, you can transfer the data back to the host with the xrt::bo::sync command, using the appropriate flag to dictate the buffer transfer direction.
buf_in_b.sync(XCL_BO_SYNC_BO_FROM_DEVICE);
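The role of the two sync directions can be illustrated with a toy model that keeps a separate host copy and device copy of one buffer. This is a simplified sketch and not how XRT is implemented: ToyBuffer, SyncDir, and toy_ip_run are hypothetical names, and on a real platform the sync may be a DMA transfer or a cache operation depending on the device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only, not XRT: transfer direction of a sync.
enum class SyncDir { ToDevice, FromDevice };

// Toy model of a buffer object with two independent copies of the data.
struct ToyBuffer {
    std::vector<int> host;    // what the mapped host pointer would see
    std::vector<int> device;  // what the IP would see

    explicit ToyBuffer(std::size_t n) : host(n, 0), device(n, 0) {
        assert(n > 0);
    }

    // Copy one side over the other, in the requested direction.
    void sync(SyncDir dir) {
        if (dir == SyncDir::ToDevice) {
            device = host;
        } else {
            host = device;
        }
    }
};

// Stand-in for the IP: adds 1 to every element of the device copy.
inline void toy_ip_run(ToyBuffer& b) {
    for (std::size_t i = 0; i < b.device.size(); ++i) {
        b.device[i] += 1;
    }
}
```

In this model a host write is invisible to the IP until a sync to the device, and the result is invisible to the host until a sync from the device. That is the ordering the steps above follow.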
- Optionally profile the application.
Because XRT is not in charge of starting or stopping the kernel, you cannot directly profile the operation of user_managed kernels as you would XRT managed kernels. However, you can use the user_range and user_event objects as discussed in Custom Profiling of the Host Application to profile elements of the host application. For example, the following code captures the time it takes to write the registers from the host application:
// Write Registers
range.start("Phase 4a", "Write A Register");
ip.write_register(REG_OFFSET_A, a_addr);
ip.write_register(REG_OFFSET_A + 4, a_addr >> 32);
range.end();
range.start("Phase 4b", "Write B Register");
ip.write_register(REG_OFFSET_B, b_addr);
ip.write_register(REG_OFFSET_B + 4, b_addr >> 32);
range.end();
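Because every range.start() must be paired with a range.end(), a small scope guard can keep the pair together even when the code between them returns early or throws. The sketch below is an illustration and not part of XRT: ScopedRange is a hypothetical wrapper that assumes only the start(label, tooltip) and end() calls shown above, and FakeRange is a hypothetical stand-in that records the calls so the sketch can run without the profiling library.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical RAII guard: calls start() on construction and end() on
// destruction. `Range` is any type with start(const char*, const char*)
// and end() members.
template <typename Range>
class ScopedRange {
public:
    ScopedRange(Range& range, const char* label, const char* tooltip)
        : range_(range) {
        assert(label != nullptr);
        range_.start(label, tooltip);
    }

    ~ScopedRange() { range_.end(); }

    // Copying would end the same range twice, so it is disabled.
    ScopedRange(const ScopedRange&) = delete;
    ScopedRange& operator=(const ScopedRange&) = delete;

private:
    Range& range_;
};

// Hypothetical stand-in for a profiling range: records each call in a log.
struct FakeRange {
    std::vector<std::string> log;

    void start(const char* label, const char* /*tooltip*/) {
        log.push_back(std::string("start:") + label);
    }

    void end() { log.push_back("end"); }
};
```

With such a guard, each profiled phase becomes a block scope: the guard is declared at the top of the block, the register writes follow, and the range ends when the block closes.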
You can observe some aspects of the application and kernel operation in the Vitis analyzer as shown in the following figure.