Buffer Creation and Data Transfer - 2023.2 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)


Interactions between the host program and hardware kernels rely on creating buffers and transferring data to and from the memory in the device. This process makes use of functions like clCreateBuffer and clEnqueueMigrateMemObjects.

Important: A single buffer cannot be larger than 4 GB. To maximize throughput from the host to global memory, AMD also recommends keeping each buffer at least 2 MB in size where possible.
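The size guidance above can be checked on the host before any buffer is created. The following is a minimal sketch, not part of the XRT or OpenCL API: the names `check_buffer_size`, `buffers_needed`, and the constants are hypothetical, and it assumes that "4 GB" and "2 MB" mean binary units (4 × 2^30 and 2 × 2^20 bytes).

```cpp
#include <cassert>
#include <cstdint>

// Assumption: binary units. These names are illustrative, not an XRT API.
constexpr std::uint64_t kMaxBufferBytes = 4ULL * 1024 * 1024 * 1024;  // 4 GB upper limit
constexpr std::uint64_t kMinRecommendedBytes = 2ULL * 1024 * 1024;    // 2 MB recommended minimum

enum class BufferSizeCheck { TooLarge, BelowRecommended, Ok };

// Classify a requested buffer size against the limit and the recommendation.
constexpr BufferSizeCheck check_buffer_size(std::uint64_t bytes) {
    if (bytes > kMaxBufferBytes) return BufferSizeCheck::TooLarge;
    if (bytes < kMinRecommendedBytes) return BufferSizeCheck::BelowRecommended;
    return BufferSizeCheck::Ok;
}

// Number of buffers needed to carry total_bytes when each buffer is
// capped at kMaxBufferBytes (ceiling division).
inline std::uint64_t buffers_needed(std::uint64_t total_bytes) {
    assert(total_bytes > 0 && "nothing to transfer");
    return (total_bytes + kMaxBufferBytes - 1) / kMaxBufferBytes;
}
```

A host program could call `check_buffer_size` before `clCreateBuffer` and either split an oversized payload into `buffers_needed` pieces or batch small transfers together until they reach the recommended size.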

There are two methods for allocating memory buffers and transferring data:

  1. Letting XRT Allocate Buffers
  2. Using Host Pointer Buffers

When XRT allocates the buffer, use enqueueMapBuffer to capture the buffer handle. When you use a host pointer, you allocate the buffer directly with CL_MEM_USE_HOST_PTR, so you do not need to capture the handle.

Tip: Do not use CL_MEM_USE_HOST_PTR for embedded platforms. Embedded platforms require contiguous memory allocation and should use the CL_MEM_ALLOC_HOST_PTR method, as described in Letting XRT Allocate Buffers.
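For the host-pointer method, on platforms where CL_MEM_USE_HOST_PTR is appropriate, the host memory is allocated by the application before the buffer is created. The sketch below shows one way to do that allocation. It assumes a 4096-byte (page) alignment for the host pointer, which is not stated in this section and should be checked against your platform documentation. The names `alloc_host_buffer`, `round_up`, `AlignedHostPtr`, and `kHostAlignment` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Assumption: page (4096-byte) alignment for host pointers.
constexpr std::size_t kHostAlignment = 4096;

// Releases memory obtained from std::aligned_alloc.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};
using AlignedHostPtr = std::unique_ptr<void, FreeDeleter>;

// Round bytes up to a multiple of alignment; std::aligned_alloc
// expects the size to be a multiple of the alignment.
constexpr std::size_t round_up(std::size_t bytes, std::size_t alignment) {
    return (bytes + alignment - 1) / alignment * alignment;
}

// Allocate aligned host memory that can later back a host-pointer buffer.
// Returns an empty pointer if the allocation fails.
inline AlignedHostPtr alloc_host_buffer(std::size_t bytes) {
    assert(bytes > 0 && "buffer size must be non-zero");
    void* p = std::aligned_alloc(kHostAlignment, round_up(bytes, kHostAlignment));
    return AlignedHostPtr(p);
}
```

The raw pointer from `get()` would then be passed as the `host_ptr` argument of `clCreateBuffer` together with the CL_MEM_USE_HOST_PTR flag. The allocation has to stay alive for as long as the buffer object that uses it, which is why the sketch returns an owning smart pointer rather than a raw one.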

There are a number of coding practices you can adopt to maximize performance and gain fine-grained control. The OpenCL API supports additional commands for reading and writing buffers. For example, you can use the clEnqueueWriteBuffer and clEnqueueReadBuffer commands in place of clEnqueueMigrateMemObjects. However, some of these commands behave differently, and you must understand those differences before using them. For example, clEnqueueReadBufferRect can read a rectangular region of a buffer object to the host application, but it does not transfer the data from the device global memory to the host. You must first use clEnqueueReadBuffer to transfer the data from the device global memory, and then use clEnqueueReadBufferRect to read the desired rectangular portion into the host application.
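To make "rectangular region" concrete, the sketch below reproduces on the host the offset arithmetic that a 2D rectangular read uses: in the OpenCL specification, the byte offset of a row is `origin[1] * row_pitch + origin[0]`. This is only an illustration of the addressing on data that is already in host memory. It is not a replacement for clEnqueueReadBufferRect, and the function name `copy_rect` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Copy a rectangle of width_bytes x rows out of a row-major byte buffer.
// row_pitch is the length of one full source row in bytes;
// (x_bytes, y) is the top-left corner of the rectangle.
inline std::vector<unsigned char> copy_rect(const std::vector<unsigned char>& src,
                                            std::size_t row_pitch,
                                            std::size_t x_bytes, std::size_t y,
                                            std::size_t width_bytes, std::size_t rows) {
    assert(x_bytes + width_bytes <= row_pitch && "rectangle exceeds row length");
    assert((y + rows) * row_pitch <= src.size() && "rectangle exceeds buffer");

    std::vector<unsigned char> out(width_bytes * rows);
    for (std::size_t r = 0; r < rows; ++r) {
        // Same offset rule as the 2D case of a rectangular buffer read.
        const std::size_t offset = (y + r) * row_pitch + x_bytes;
        std::copy_n(src.begin() + offset, width_bytes, out.begin() + r * width_bytes);
    }
    return out;
}
```

For a 4 × 4 byte buffer holding the values 0 through 15 with a row pitch of 4, the 2 × 2 rectangle at (1, 1) contains the bytes 5, 6, 9, and 10. The output is tightly packed, which corresponds to a host row pitch equal to the region width.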