CL_MEM_USE_HOST_PTR is not recommended for embedded platforms.
Embedded platforms require contiguous memory allocation and should use the CL_MEM_ALLOC_HOST_PTR method, as described in Letting XRT Allocate Buffers.
There are two main parts of a cl_mem object: a host-side pointer and a device-side pointer.
Before the kernel starts its operation, the device-side pointer is implicitly allocated in
device memory (for example, at a specific location inside the device global memory) and the
buffer becomes resident on the device.
With clEnqueueMigrateMemObjects, this allocation and data transfer occur up front, well
ahead of the kernel execution. This especially helps to enable software pipelining when the
host executes the same kernel multiple times: the data transfer for the next transaction can
happen while the kernel is still operating on the previous data set, which hides the data
transfer latency of successive kernel executions.
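The latency-hiding effect described above can be estimated with simple arithmetic. The following is a back-of-the-envelope model, not XRT or OpenCL code. It assumes that every kernel run needs one input transfer of fixed duration followed by one kernel execution of fixed duration, that transfers are serialized among themselves, that one transfer can overlap one kernel execution, and that output transfers are ignored. The function names and the abstract time units are hypothetical and chosen only for illustration.

```c
/* Total time when each transfer is followed by its kernel run,
 * with no overlap between transfers and kernel executions. */
unsigned long serial_time(unsigned long runs,
                          unsigned long xfer,
                          unsigned long kern)
{
    return runs * (xfer + kern);
}

/* Total time when the transfer for run i+1 overlaps kernel run i.
 * Only the first transfer and the last kernel run are fully exposed;
 * every step in between costs the longer of the two stages. */
unsigned long pipelined_time(unsigned long runs,
                             unsigned long xfer,
                             unsigned long kern)
{
    if (runs == 0) {
        return 0;
    }
    unsigned long stage = (xfer > kern) ? xfer : kern;
    return xfer + (runs - 1) * stage + kern;
}
```

Under these assumptions, 4 runs with a 3-unit transfer and a 5-unit kernel take 32 units back to back and 23 units when pipelined. With a single run there is nothing to overlap, so both models give the same result.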
The OpenCL framework provides a number of
APIs for transferring data between the host and the device. Typically, data movement APIs,
such as clEnqueueWriteBuffer and clEnqueueReadBuffer, implicitly migrate memory objects to the device after they
are enqueued. They do not guarantee when the data is transferred, and this makes it difficult
for the host application to synchronize the movement of memory objects with the computation
performed on the data.
AMD recommends using clEnqueueMigrateMemObjects instead of
clEnqueueWriteBuffer or clEnqueueReadBuffer to improve performance. Using this API, memory migration
can be explicitly performed ahead of the dependent commands. This allows the host application
to preemptively change the association of a memory object, through regular command queue
scheduling, to prepare for another upcoming command. This also permits an application to
overlap the placement of memory objects with other unrelated operations before these memory
objects are needed, potentially hiding or reducing data transfer latencies. After the event
associated with clEnqueueMigrateMemObjects has been marked
complete, the host program knows the memory objects have been successfully migrated.
Another advantage of clEnqueueMigrateMemObjects is that it can migrate multiple memory
objects in a single API call. This reduces the overhead of scheduling and calling functions to
transfer data for more than one memory object.
The following code shows the use of clEnqueueMigrateMemObjects:
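int host_mem_ptr[MAX_LENGTH]; // host memory for input vector
// Fill the memory input
for(int i=0; i<MAX_LENGTH; i++) {
  host_mem_ptr[i] = <... >
}
// Create a buffer object backed by the existing host array.
// The buffer size must not exceed the size of host_mem_ptr.
cl_mem dev_mem_ptr = clCreateBuffer(context,
    CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
    sizeof(int) * number_of_words, host_mem_ptr, NULL);
clSetKernelArg(kernel, 0, sizeof(cl_mem), &dev_mem_ptr);
// Migrate the buffer to the device associated with the command queue
// (flags = 0). The third argument is a pointer to an array of cl_mem
// objects, so the address of dev_mem_ptr is passed.
err = clEnqueueMigrateMemObjects(commands, 1, &dev_mem_ptr, 0, 0,
    NULL, NULL);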