Reading a Specific Portion from the Device Buffer - 2020.2 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID
UG1393
Release Date
2021-03-22
Version
2020.2 English

Consider a kernel that produces different amounts of data depending on the input to the kernel. For example, a compression engine where the output size varies depending on the input data pattern and similarity. The host can still read the whole output buffer by using clEnqueueMigrateMemObjects, but that is a suboptimal approach as more than the required memory transfer would occur. Ideally the host program should only read the exact amount of data that the kernel has written.

One technique is to have the kernel write the amount of the output data at the start of writing the output data. The host application can use clEnqueueReadBuffer two times, first to read the amount of data being returned, and second to read exact amount of data returned by the kernel based on the information from the first read.
clEnqueueReadBuffer(command_queue,device_write_ptr, CL_FALSE, 0, sizeof(int) * 1, 
                    &kernel_write_size, 0, nullptr, &size_read_event);
clEnqueueReadBuffer(command_queue,device_write_ptr, CL_FALSE, DATA_READ_OFFSET, 
                    kernel_write_size, host_ptr, 1, &size_read_event, &data_read_event);
With clEnqueueMigrateMemObject, which is recommended over clEnqueueReadBuffer or clEnqueueWriteBuffer, you can adopt a similar approach by using sub-buffers. This is shown in the following code sample.
Tip: The code sample shows only partial commands to demonstrate the concept.
//Create a small sub-buffer to read the quantity of data
cl_buffer_region buffer_info_1={0,1*sizeof(int)}; 
cl_mem size_info = clCreateSubBuffer (device_write_ptr, CL_MEM_WRITE_ONLY, 
      CL_BUFFER_CREATE_TYPE_REGION, &buffer_info_1, &err);

// Map the sub-buffer into the host space
auto size_info_host_ptr = clEnqueueMapBuffer(queue, size_info,,,, );

// Read only the sub-buffer portion
clEnqueueMigrateMemObjects(queue, 1, &size_info, CL_MIGRATE_MEM_OBJECT_HOST,,,);
                          
// Retrive size information from the already mapped size_info_host_ptr
kernel_write_size = ........... 

// Create sub-buffer to read the required amount of data     
cl_buffer_region buffer_info_2={DATA_READ_OFFSET, kernel_write_size};
cl_mem  buffer_seg = clCreateSubBuffer (device_write_ptr, CL_MEM_WRITE_ONLY, 
      CL_BUFFER_CREATE_TYPE_REGION, &buffer_info_2,&err);

// Map the subbuffer into the host space
auto read_mem_host_ptr = clEnqueueMapBuffer(queue, buffer_seg,,,);

// Migrate the subbuffer
clEnqueueMigrateMemObjects(queue, 1, &buffer_seg, CL_MIGRATE_MEM_OBJECT_HOST,,,);

// Now use the read data from already mapped read_mem_host_ptr