Consider a kernel that produces different amounts of data depending on its
input. For example, the output size of a compression engine varies with the
pattern and similarity of the input data. The host can still read the whole
output buffer by using clEnqueueMigrateMemObjects, but
that approach is suboptimal because it transfers more data than required.
Ideally, the host program reads only the exact amount of data that the kernel has
written.
One technique is to have the kernel write the size of its output at
the start of the output buffer. The host application can then call
clEnqueueReadBuffer twice: first to read the size of the
data being returned, and then to read exactly that amount of data,
based on the value obtained from the first read.
// Blocking read (CL_TRUE): kernel_write_size must hold a valid value on the
// host before it is passed as the size argument of the second read
clEnqueueReadBuffer(command_queue, device_write_ptr, CL_TRUE, 0, sizeof(int) * 1,
&kernel_write_size, 0, nullptr, &size_read_event);
// Read only the number of bytes reported by the kernel
clEnqueueReadBuffer(command_queue, device_write_ptr, CL_FALSE, DATA_READ_OFFSET,
kernel_write_size, host_ptr, 1, &size_read_event, &data_read_event);
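The two-step read can be tried out on the host alone. The following is a minimal sketch that does not call OpenCL at all: FakeDeviceBuffer, fake_kernel, and read_exact are hypothetical stand-ins invented for illustration, and the layout (an int byte count at offset 0, payload starting at DATA_READ_OFFSET = sizeof(int)) is an assumption rather than something the kernel interface prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Assumed layout: int byte count at offset 0, payload right after it.
constexpr std::size_t DATA_READ_OFFSET = sizeof(int);

// Hypothetical stand-in for device memory; counts bytes copied to the host.
struct FakeDeviceBuffer {
    std::vector<unsigned char> mem;
    std::size_t bytes_transferred = 0;

    // Plays the role of clEnqueueReadBuffer: copy `size` bytes from `offset`.
    void read(std::size_t offset, std::size_t size, void* dst) {
        assert(offset + size <= mem.size());
        if (size == 0) return;  // nothing to copy
        std::memcpy(dst, mem.data() + offset, size);
        bytes_transferred += size;
    }
};

// Hypothetical stand-in for the kernel: stores the payload size at the
// start of the buffer and the payload itself at DATA_READ_OFFSET.
void fake_kernel(FakeDeviceBuffer& dev, const std::vector<unsigned char>& payload) {
    assert(DATA_READ_OFFSET + payload.size() <= dev.mem.size());
    int n = static_cast<int>(payload.size());
    std::memcpy(dev.mem.data(), &n, sizeof(n));
    if (!payload.empty()) {
        std::memcpy(dev.mem.data() + DATA_READ_OFFSET, payload.data(), payload.size());
    }
}

// Host side: read the size header first, then exactly that many bytes.
std::vector<unsigned char> read_exact(FakeDeviceBuffer& dev) {
    int kernel_write_size = 0;
    dev.read(0, sizeof(int), &kernel_write_size);  // first read: size only
    assert(kernel_write_size >= 0);
    std::vector<unsigned char> host(static_cast<std::size_t>(kernel_write_size));
    dev.read(DATA_READ_OFFSET, host.size(), host.data());  // second read: payload only
    return host;
}
```

With a 1024-byte buffer and a 5-byte payload, read_exact copies sizeof(int) + 5 bytes in total instead of the full 1024, which is the saving the technique is after.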
With clEnqueueMigrateMemObjects, which is
recommended over clEnqueueReadBuffer or clEnqueueWriteBuffer, you can adopt a similar approach by
using sub-buffers, as shown in the following code sample.
Tip: The code sample shows only partial commands to demonstrate the concept.
// Create a small sub-buffer to read the quantity of data
cl_buffer_region buffer_info_1={0,1*sizeof(int)};
cl_mem size_info = clCreateSubBuffer (device_write_ptr, CL_MEM_WRITE_ONLY,
CL_BUFFER_CREATE_TYPE_REGION, &buffer_info_1, &err);
// Map the sub-buffer into the host space
auto size_info_host_ptr = clEnqueueMapBuffer(queue, size_info,,,, );
// Read only the sub-buffer portion
clEnqueueMigrateMemObjects(queue, 1, &size_info, CL_MIGRATE_MEM_OBJECT_HOST,,,);
// Retrieve size information from the already mapped size_info_host_ptr
kernel_write_size = ...........
// Create sub-buffer to read the required amount of data
cl_buffer_region buffer_info_2={DATA_READ_OFFSET, kernel_write_size};
cl_mem buffer_seg = clCreateSubBuffer (device_write_ptr, CL_MEM_WRITE_ONLY,
CL_BUFFER_CREATE_TYPE_REGION, &buffer_info_2,&err);
// Map the sub-buffer into the host space
auto read_mem_host_ptr = clEnqueueMapBuffer(queue, buffer_seg,,,);
// Migrate the sub-buffer
clEnqueueMigrateMemObjects(queue, 1, &buffer_seg, CL_MIGRATE_MEM_OBJECT_HOST,,,);
// Now use the read data from already mapped read_mem_host_ptr
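
The two regions used above can be computed and sanity-checked before they are passed to clCreateSubBuffer. The following is a minimal sketch under stated assumptions: Region is a hypothetical struct that mirrors the origin and size fields of cl_buffer_region so that the code builds without the OpenCL headers, and size_region, data_region, and origin_is_aligned are illustrative helpers, not part of any API. The alignment check reflects the OpenCL requirement that a sub-buffer origin be aligned to the device's CL_DEVICE_MEM_BASE_ADDR_ALIGN value, which the device reports in bits; align_bytes is assumed to be that value already converted to bytes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mirror of cl_buffer_region (origin and size, both in bytes).
struct Region {
    std::size_t origin;
    std::size_t size;
};

// Region covering only the size header at the start of the buffer.
Region size_region() {
    return Region{0, sizeof(int)};
}

// Region covering exactly the bytes the kernel reported writing.
Region data_region(std::size_t data_read_offset,
                   std::size_t kernel_write_size,
                   std::size_t buffer_size) {
    assert(data_read_offset >= sizeof(int));  // must not overlap the header
    assert(data_read_offset + kernel_write_size <= buffer_size);
    return Region{data_read_offset, kernel_write_size};
}

// True when the region origin is a multiple of the alignment in bytes.
bool origin_is_aligned(const Region& r, std::size_t align_bytes) {
    return align_bytes != 0 && r.origin % align_bytes == 0;
}
```

For example, with an assumed 128-byte alignment, a DATA_READ_OFFSET of 128 passes the check, while an offset of sizeof(int) does not, so the choice of DATA_READ_OFFSET matters more for sub-buffers than for the clEnqueueReadBuffer variant.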