Setting Up User-Managed Kernels and Argument Buffers - 2022.1 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID: UG1393
Release Date: 2022-05-25
Version: 2022.1 English
Tip: For an example of a host application working with a user-managed RTL kernel, refer to Vitis-Tutorials/Hardware_Acceleration/Feature_Tutorials/01-rtl-kernel-workflow.

User-managed kernels require the use of the XRT native API for the host application, and are specified as an IP object of the xrt::ip class. The following is a high-level overview of how to structure your host application to access user-managed kernels from an .xclbin file.

  1. Add the following header files to include the XRT native API:
    #include "experimental/xrt_ip.h"
    #include "xrt/xrt_bo.h"
    
    • experimental/xrt_ip.h: Defines the IP as an object of xrt::ip.
    • xrt/xrt_bo.h: Lets you create buffer objects in the XRT native API.
  2. Set up the application environment as described in Specifying the Device ID and Loading the XCLBIN.
  3. The IP object (xrt::ip) is constructed from the xrt::device object, the uuid of the .xclbin, and the name of the user-managed kernel. The xrt::ip differs from the standard xrt::kernel, and indicates that XRT does not manage the IP but does provide access to registers:
    //User Managed Kernel = IP
    auto ip = xrt::ip(device, uuid, "Vadd_A_B");
  4. Create buffers for the IP arguments:
    auto <buf_name> = xrt::bo(<device>,<DATA_SIZE>,<flag>,<bank_id>);

    Where the buffer object constructor uses the following fields:

    • <device>: xrt::device object of the accelerator card.
    • <DATA_SIZE>: Size of the buffer as defined by the width and quantity of data.
    • <flag>: Flag for creating the buffer objects.
    • <bank_id>: Defines the memory bank on the device where the buffer should be allocated for IP access. The memory bank specified must match the corresponding IP port's connection inside the .xclbin file; otherwise, the application fails with a bad_alloc error at run time. You can specify the memory assignment of each kernel argument using the --connectivity.sp option as explained in Mapping Kernel Ports to Memory.

    For example:

    auto buf_in_a = xrt::bo(device,DATA_SIZE,xrt::bo::flags::normal,0);
    auto buf_in_b = xrt::bo(device,DATA_SIZE,xrt::bo::flags::normal,0);
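The DATA_SIZE value passed to the buffer constructor is a size in bytes, that is, the element width multiplied by the number of elements. The following is a minimal sketch of that calculation. The helper buffer_bytes and the element count NUM_ELEMENTS are hypothetical names chosen for illustration and are not part of XRT; the int element type is assumed because the buffers are later mapped as int*.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helper (not part of XRT): size in bytes of a buffer that
// holds `count` elements of type T.
template <typename T>
std::size_t buffer_bytes(std::size_t count) {
    // Guard against overflow when multiplying width by quantity.
    assert(count <= std::numeric_limits<std::size_t>::max() / sizeof(T));
    return count * sizeof(T);
}

// Assumed workload: 4096 int elements per buffer.
const std::size_t NUM_ELEMENTS = 4096;
const std::size_t DATA_SIZE = buffer_bytes<int>(NUM_ELEMENTS);
```

With these definitions, DATA_SIZE can be passed directly as the second argument of the xrt::bo constructor.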
    
    Tip: Verify the IP connectivity to determine the specific memory bank, or you can get this information from the Vitis generated .xclbin.info file.

    For example, the following information for a user-managed kernel from the .xclbin could guide the construction of buffer objects in your host code:

    Instance:        Vadd_A_B_1
       Base Address: 0x1c00000
    
       Argument:          scalar00
       Register Offset:   0x10
       Port:              s_axi_control
       Memory:            <not applicable>
    
       Argument:          A
       Register Offset:   0x18
       Port:              m00_axi
       Memory:            bank0 (MEM_DDR4)
    
       Argument:          B
       Register Offset:   0x24
       Port:              m01_axi
       Memory:            bank0 (MEM_DDR4)
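One way to carry this report into the host code is to record the register offsets and memory banks as named constants. The sketch below copies the values from the example listing above. The constant names other than REG_OFFSET_A and REG_OFFSET_B, and the helper pointer_regs_overlap, are hypothetical. It also assumes that each pointer argument occupies two consecutive 32-bit registers, and that bank0 corresponds to bank index 0 as in the buffer example above.

```cpp
#include <cassert>
#include <cstdint>

// Register offsets copied from the example .xclbin.info listing.
constexpr std::uint32_t REG_OFFSET_SCALAR = 0x10;  // scalar00, s_axi_control
constexpr std::uint32_t REG_OFFSET_A      = 0x18;  // A, m00_axi
constexpr std::uint32_t REG_OFFSET_B      = 0x24;  // B, m01_axi

// Memory banks from the listing (assumption: bank0 maps to index 0).
constexpr int BANK_ID_A = 0;
constexpr int BANK_ID_B = 0;

// Assumption: a pointer argument is 64 bits wide, so it spans the
// registers at [offset, offset + 8).
constexpr std::uint32_t POINTER_ARG_BYTES = 8;

// Hypothetical sanity check: true if two pointer arguments would share
// any register bytes.
constexpr bool pointer_regs_overlap(std::uint32_t first, std::uint32_t second) {
    return first < second + POINTER_ARG_BYTES &&
           second < first + POINTER_ARG_BYTES;
}

static_assert(!pointer_regs_overlap(REG_OFFSET_A, REG_OFFSET_B),
              "A and B address registers must not overlap");
```

The static_assert catches a mistyped offset at compile time, before the host application writes an address over a neighboring argument.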
    
  5. Get the buffer addresses and transfer data between host and device:
        auto a_data = buf_in_a.map<int*>();
        auto b_data = buf_in_b.map<int*>();
    
        // Get the buffer physical address (xrt::bo::address() returns an unsigned 64-bit value)
        uint64_t a_addr = buf_in_a.address();
        uint64_t b_addr = buf_in_b.address();
    
        // Sync Buffers
        buf_in_a.sync(XCL_BO_SYNC_BO_TO_DEVICE);
        buf_in_b.sync(XCL_BO_SYNC_BO_TO_DEVICE);
    

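    xrt::bo::map() returns a user pointer to the host-side backing storage of the buffer. Data written through the mapped pointer does not reach the device until you call xrt::bo::sync() with XCL_BO_SYNC_BO_TO_DEVICE, and results produced on the device are not visible through the mapped pointer until you call xrt::bo::sync() with XCL_BO_SYNC_BO_FROM_DEVICE. The direction flag selects the DMA operation.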
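The ordering of map and sync operations can be illustrated with a toy model. The following sketch is not the XRT implementation: ToyBuffer, SyncDirection, and device_at are hypothetical names, and the device memory is simulated with a second host-side vector. It only shows why a write through the mapped pointer must be followed by a sync to the device, and why a read must be preceded by a sync from the device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the sync direction flags.
enum class SyncDirection { ToDevice, FromDevice };

// Toy buffer with separate host and (simulated) device copies.
class ToyBuffer {
public:
    explicit ToyBuffer(std::size_t count) : host_(count, 0), device_(count, 0) {}

    // Returns a pointer to the host-side copy, like a mapped pointer.
    int* map() { return host_.data(); }

    // Copies the whole buffer in the requested direction.
    void sync(SyncDirection dir) {
        if (dir == SyncDirection::ToDevice) {
            device_ = host_;
        } else {
            host_ = device_;
        }
    }

    // Simulates what the IP sees and modifies in device memory.
    int& device_at(std::size_t i) {
        assert(i < device_.size());
        return device_[i];
    }

private:
    std::vector<int> host_;
    std::vector<int> device_;
};
```

In this model, a value stored through map() is only visible through device_at() after sync(SyncDirection::ToDevice), which mirrors the order of operations in the code above.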

  6. After creating and synchronizing the buffers as shown above, pass the required information to the IP through direct register writes. The following code passes the base address of each buffer to the IP with xrt::ip::write_register(). Each 64-bit address is written as two 32-bit words: the lower word at the register offset of the argument, and the upper word at that offset plus 4:
        ip.write_register(REG_OFFSET_A,a_addr);
        ip.write_register(REG_OFFSET_A+4,a_addr>>32);
    
        ip.write_register(REG_OFFSET_B,b_addr);
        ip.write_register(REG_OFFSET_B+4,b_addr>>32);
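The low-word and high-word writes above can be factored into a small helper. The following is a minimal sketch: low_word, high_word, write_address, and the RegisterWriter callback type are hypothetical names, and the callback stands in for xrt::ip::write_register() so that the splitting logic can be checked without a device.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical callback type that stands in for a 32-bit register write.
using RegisterWriter =
    std::function<void(std::uint32_t offset, std::uint32_t value)>;

// Lower 32 bits of a 64-bit device address.
inline std::uint32_t low_word(std::uint64_t addr) {
    return static_cast<std::uint32_t>(addr & 0xFFFFFFFFu);
}

// Upper 32 bits of a 64-bit device address.
inline std::uint32_t high_word(std::uint64_t addr) {
    return static_cast<std::uint32_t>(addr >> 32);
}

// Writes a 64-bit address as two consecutive 32-bit registers.
inline void write_address(const RegisterWriter& write_register,
                          std::uint32_t offset, std::uint64_t addr) {
    assert(offset % 4 == 0);  // registers are 32-bit aligned
    write_register(offset, low_word(addr));
    write_register(offset + 4, high_word(addr));
}
```

With a real IP object, the callback could be a lambda that forwards to ip.write_register(offset, value), and the call would pass REG_OFFSET_A together with the value returned by buf_in_a.address().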
    
  7. Start the IP. Because the IP is user-managed, the host application is responsible for its control protocol: use whatever sequence of register reads and writes the IP requires to start it, check its status, and restart it. The following example accesses the control signals in the control register through an s_axilite interface:
        uint32_t axi_ctrl = 0;
        std::cout << "INFO:IP Start" << std::endl;
        axi_ctrl = IP_START;
        ip.write_register(CSR_OFFSET, axi_ctrl);
    
        // Poll the control register until the IP reports idle (execution finished)
        axi_ctrl =0;
        while((axi_ctrl & IP_IDLE) != IP_IDLE) {
            axi_ctrl = ip.read_register(CSR_OFFSET);
        }
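The polling loop above spins forever if the IP never returns to idle. The following sketch adds a timeout. The bit values of IP_START and IP_IDLE are assumptions based on an ap_ctrl_hs-style control register (start in bit 0, idle in bit 2), so verify them against the register map of your IP. wait_for_idle and the RegisterReader callback type are hypothetical names, and the callback stands in for ip.read_register(CSR_OFFSET).

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>

// Assumed control register bits; check the register map of your IP.
constexpr std::uint32_t IP_START = 0x1;
constexpr std::uint32_t IP_IDLE  = 0x4;

// Hypothetical callback type that returns the current control register value.
using RegisterReader = std::function<std::uint32_t()>;

// Polls until the idle bit is set or the timeout expires.
// Returns true if the IP reported idle, false on timeout.
inline bool wait_for_idle(const RegisterReader& read_csr,
                          std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    do {
        if ((read_csr() & IP_IDLE) == IP_IDLE) {
            return true;
        }
    } while (std::chrono::steady_clock::now() < deadline);
    return false;
}
```

The do-while form reads the register at least once, even with a zero timeout. A false return value lets the host application report a hung IP instead of blocking indefinitely.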
     
  8. After the IP finishes executing, transfer the data back to the host with xrt::bo::sync(), using the flag that sets the transfer direction:
        buf_in_b.sync(XCL_BO_SYNC_BO_FROM_DEVICE);
    
  9. Optionally profile the application.

    Because XRT is not in charge of starting or stopping the kernel, you cannot directly profile the operation of user-managed kernels as you would XRT-managed kernels. However, you can use the user_range and user_event objects, as discussed in Custom Profiling of the Host Application, to profile elements of the host application. For example, the following code captures the time it takes to write the registers from the host application:

        // Write Registers
        range.start("Phase 4a", "Write A Register");
        ip.write_register(REG_OFFSET_A,a_addr);
        ip.write_register(REG_OFFSET_A+4,a_addr>>32);
        range.end();
        range.start("Phase 4b", "Write B Register");
        ip.write_register(REG_OFFSET_B,b_addr);
        ip.write_register(REG_OFFSET_B+4,b_addr>>32);
        range.end();
    You can observe some aspects of the application and kernel operation in the Vitis analyzer as shown in the following figure.