CL_MEM_USE_HOST_PTR is not recommended for embedded platforms. Embedded platforms require contiguous memory allocation and should use the CL_MEM_ALLOC_HOST_PTR method, as described in Letting XRT Allocate Buffers.
There are two main parts of a cl_mem object: the host side pointer and the device side pointer. Before the kernel starts its operation, the device side pointer is implicitly allocated in device side memory (for example, at a specific location inside the device global memory) and the buffer becomes resident on the device.
Using clEnqueueMigrateMemObjects, this allocation and data transfer occur up front, well ahead of the kernel execution. This especially helps to enable software pipelining if the host executes the same kernel multiple times, because the data transfer for the next transaction can happen while the kernel is still operating on the previous data set, thus hiding the data transfer latency of successive kernel executions.
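The benefit of overlapping transfers with kernel execution can be estimated with a small back-of-the-envelope model. The sketch below is only an idealized illustration: it assumes every run has the same transfer time and kernel time, ignores result read-back and scheduling overhead, and uses hypothetical function names (serial_time_us, pipelined_time_us) that are not part of OpenCL or XRT.

```c
#include <stdint.h>

/* Total time when each input transfer and kernel run happen back to back,
 * with no overlap between them. */
uint64_t serial_time_us(uint64_t runs, uint64_t xfer_us, uint64_t exec_us)
{
    return runs * (xfer_us + exec_us);
}

/* Total time when the transfer for run i+1 overlaps kernel run i.
 * Only the first transfer and the last kernel run are fully exposed;
 * in between, the slower of the two stages sets the pace. */
uint64_t pipelined_time_us(uint64_t runs, uint64_t xfer_us, uint64_t exec_us)
{
    if (runs == 0) {
        return 0;
    }
    uint64_t stage = (xfer_us > exec_us) ? xfer_us : exec_us;
    return xfer_us + (runs - 1) * stage + exec_us;
}
```

Under this model, 4 runs with a 2 us transfer and a 5 us kernel take 28 us when serialized and 22 us when overlapped: every transfer after the first is hidden behind the previous kernel run.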
The OpenCL framework provides a number of APIs for transferring data between the host and the device. Typically, data movement APIs, such as clEnqueueWriteBuffer and clEnqueueReadBuffer, implicitly migrate memory objects to the device after they are enqueued. They do not guarantee when the data is transferred, and this makes it difficult for the host application to synchronize the movement of memory objects with the computation performed on the data.
AMD recommends using clEnqueueMigrateMemObjects instead of clEnqueueWriteBuffer or clEnqueueReadBuffer to improve performance. Using this API, memory migration can be explicitly performed ahead of the dependent commands. This allows the host application to preemptively change the association of a memory object, through regular command queue scheduling, to prepare for another upcoming command. This also permits an application to overlap the placement of memory objects with other unrelated operations before these memory objects are needed, potentially hiding or reducing data transfer latencies. After the event associated with clEnqueueMigrateMemObjects has been marked complete, the host program knows the memory objects are successfully migrated.
Another advantage of clEnqueueMigrateMemObjects is that it can migrate multiple memory objects in a single API call. This reduces the overhead of scheduling and calling functions to transfer data for more than one memory object.
The following code shows the use of clEnqueueMigrateMemObjects:
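int host_mem_ptr[MAX_LENGTH]; // host memory for input vector
// Fill the memory input
for (int i = 0; i < MAX_LENGTH; i++) {
    host_mem_ptr[i] = <... >
}
// Wrap the host array in a buffer object. context, kernel, commands, and err
// are assumed to be set up earlier; number_of_words must not exceed MAX_LENGTH.
cl_mem dev_mem_ptr = clCreateBuffer(context,
                                    CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
                                    sizeof(int) * number_of_words, host_mem_ptr, NULL);
clSetKernelArg(kernel, 0, sizeof(cl_mem), &dev_mem_ptr);
// The third argument is a pointer to an array of cl_mem handles, so pass the
// address of dev_mem_ptr. A flags value of 0 migrates the buffer to the device.
err = clEnqueueMigrateMemObjects(commands, 1, &dev_mem_ptr, 0, 0,
                                 NULL, NULL);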