Consider a kernel that produces a different amount of data depending on its
input. For example, the output size of a compression engine varies with the input
data pattern and similarity. The host can still read the whole output buffer by using
clEnqueueMigrateMemObjects, but that approach is suboptimal because it transfers more
data than required. Ideally, the host program should read only the exact amount of
data that the kernel has written.
One technique is to have the kernel write the size of its output at the start of
the output buffer, ahead of the data itself. The host application can then call
clEnqueueReadBuffer twice: first to read the size of the data being returned, and
second to read exactly that amount of data, based on the value obtained from the
first read.

// Blocking read (CL_TRUE): kernel_write_size is passed by value to the second call,
// so it must already hold the value written by the kernel.
clEnqueueReadBuffer(command_queue, device_write_ptr, CL_TRUE, 0, sizeof(int) * 1,
                    &kernel_write_size, 0, nullptr, &size_read_event);
// Read exactly kernel_write_size bytes, starting at DATA_READ_OFFSET.
clEnqueueReadBuffer(command_queue, device_write_ptr, CL_FALSE, DATA_READ_OFFSET,
                    kernel_write_size, host_ptr, 1, &size_read_event, &data_read_event);
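
The two-step read can be tried out without a device. The following is a minimal sketch only: fake_kernel and fake_read are hypothetical stand-ins for the kernel and for a blocking clEnqueueReadBuffer, the device buffer is an ordinary std::vector, and the layout (an int header at offset 0, payload at DATA_READ_OFFSET = sizeof(int)) is an assumption made for illustration. It also clamps the size reported by the kernel to the buffer capacity before the second read, which is a defensive choice and not something the text above prescribes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Assumed layout: int size header at offset 0, payload right after it.
constexpr std::size_t DATA_READ_OFFSET = sizeof(int);

// Hypothetical stand-in for the kernel: stores n in the header, then n payload bytes.
void fake_kernel(std::vector<unsigned char>& device_buf,
                 const unsigned char* payload, int n) {
    assert(n >= 0 && DATA_READ_OFFSET + static_cast<std::size_t>(n) <= device_buf.size());
    std::memcpy(device_buf.data(), &n, sizeof(int));
    std::memcpy(device_buf.data() + DATA_READ_OFFSET, payload, static_cast<std::size_t>(n));
}

// Hypothetical stand-in for a blocking clEnqueueReadBuffer.
// Copies `size` bytes starting at `offset` and returns the number of bytes moved.
std::size_t fake_read(const std::vector<unsigned char>& device_buf,
                      std::size_t offset, std::size_t size, void* dst) {
    assert(offset + size <= device_buf.size());
    std::memcpy(dst, device_buf.data() + offset, size);
    return size;
}

struct ReadResult {
    std::vector<unsigned char> data;   // payload returned to the host
    std::size_t bytes_transferred;     // header bytes + payload bytes
};

// Two-step read: fetch the size header, then fetch only that many payload bytes.
ReadResult read_variable_output(const std::vector<unsigned char>& device_buf) {
    assert(device_buf.size() >= DATA_READ_OFFSET);
    int kernel_write_size = 0;
    std::size_t moved = fake_read(device_buf, 0, sizeof(int), &kernel_write_size);

    // Clamp the reported size so a corrupt header cannot cause an out-of-range read.
    const std::size_t capacity = device_buf.size() - DATA_READ_OFFSET;
    const std::size_t n = kernel_write_size < 0
        ? 0
        : std::min(static_cast<std::size_t>(kernel_write_size), capacity);

    ReadResult r;
    r.data.resize(n);
    if (n > 0) {
        moved += fake_read(device_buf, DATA_READ_OFFSET, n, r.data.data());
    }
    r.bytes_transferred = moved;
    return r;
}
```

With a 1024-byte buffer in which the stand-in kernel wrote 5 bytes, read_variable_output moves sizeof(int) + 5 bytes instead of all 1024.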
With clEnqueueMigrateMemObjects, which is recommended over clEnqueueReadBuffer or
clEnqueueWriteBuffer, you can adopt a similar approach by using sub-buffers. This is
shown in the following code sample.

Tip: The code sample shows only partial commands to demonstrate the concept.

// Create a small sub-buffer to read the quantity of data
cl_buffer_region buffer_info_1 = {0, 1 * sizeof(int)};
cl_mem size_info = clCreateSubBuffer(device_write_ptr, CL_MEM_WRITE_ONLY,
                                     CL_BUFFER_CREATE_TYPE_REGION, &buffer_info_1, &err);
// Map the sub-buffer into the host space
auto size_info_host_ptr = clEnqueueMapBuffer(queue, size_info,,,, );
// Read only the sub-buffer portion
clEnqueueMigrateMemObjects(queue, 1, &size_info, CL_MIGRATE_MEM_OBJECT_HOST,,,);
// Retrieve the size information from the already mapped size_info_host_ptr
kernel_write_size = ...........
// Create a sub-buffer to read the required amount of data
cl_buffer_region buffer_info_2 = {DATA_READ_OFFSET, kernel_write_size};
cl_mem buffer_seg = clCreateSubBuffer(device_write_ptr, CL_MEM_WRITE_ONLY,
                                      CL_BUFFER_CREATE_TYPE_REGION, &buffer_info_2, &err);
// Map the sub-buffer into the host space
auto read_mem_host_ptr = clEnqueueMapBuffer(queue, buffer_seg,,,);
// Migrate the sub-buffer
clEnqueueMigrateMemObjects(queue, 1, &buffer_seg, CL_MIGRATE_MEM_OBJECT_HOST,,,);
// Now use the data through the already mapped read_mem_host_ptr
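
One detail to watch when choosing DATA_READ_OFFSET for the second sub-buffer: the OpenCL specification requires the origin of a sub-buffer region to be aligned to the device's CL_DEVICE_MEM_BASE_ADDR_ALIGN value, which is reported in bits, and clCreateSubBuffer returns CL_MISALIGNED_SUB_BUFFER_OFFSET otherwise. The following sketch shows one way the offset could be derived. The helper names are hypothetical, the alignment is assumed to be a power of two, and the numeric values in the usage note are examples rather than properties of any particular device. The kernel would have to write its payload at the same offset.

```cpp
#include <cassert>
#include <cstddef>

// Round `offset` up to the next multiple of `align_bytes`.
// Assumes align_bytes is a non-zero power of two.
constexpr std::size_t align_up(std::size_t offset, std::size_t align_bytes) {
    return (offset + align_bytes - 1) & ~(align_bytes - 1);
}

// CL_DEVICE_MEM_BASE_ADDR_ALIGN is reported in bits; sub-buffer origins are in bytes.
constexpr std::size_t align_bits_to_bytes(std::size_t align_bits) {
    return align_bits / 8;
}

// Hypothetical helper: smallest aligned payload offset that leaves room
// for a header of `header_bytes` at the start of the buffer.
std::size_t payload_offset(std::size_t header_bytes, std::size_t align_bits) {
    const std::size_t align_bytes = align_bits_to_bytes(align_bits);
    // Guard the power-of-two assumption used by align_up.
    assert(align_bytes != 0 && (align_bytes & (align_bytes - 1)) == 0);
    return align_up(header_bytes, align_bytes);
}
```

For example, with a 4-byte header and a reported alignment of 1024 bits, payload_offset returns 128, so the data sub-buffer would start 128 bytes into the parent buffer.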