The first read or write request to global memory is expensive, but subsequent contiguous operations are not. Transferring data in bursts hides the memory access latency and improves bandwidth usage and efficiency of the memory controller.
Atomic accesses to global memory should be avoided unless absolutely required. The load and store functions should be coded so that the tool can always infer burst transactions. This can be done using a memcpy operation, as shown in the vadd.cpp file in the Vitis Accel Examples, or by creating a tight for loop that accesses all the required values sequentially, as explained in Developing Applications.
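
The two coding styles described above can be sketched as follows. This is a minimal illustration in plain C++, not the code from vadd.cpp: the names `load_memcpy`, `store_loop`, `vadd_chunked`, and `BUFFER_SIZE` are hypothetical, and the interface and pipeline pragmas a real kernel would carry are omitted. Whether a burst is actually inferred depends on the tool version and on how the kernel interfaces are configured, so the synthesis report should be checked.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical depth of the on-chip buffers (an assumption for this sketch).
constexpr std::size_t BUFFER_SIZE = 256;

// Style 1: load with memcpy. One contiguous copy from global memory into a
// local buffer, with no other logic mixed into the transfer.
void load_memcpy(const int* global_in, int* local_buf, std::size_t n) {
    std::memcpy(local_buf, global_in, n * sizeof(int));
}

// Style 2: store with a tight for loop. The loop body does nothing except
// write consecutive addresses, so the access pattern is strictly sequential.
void store_loop(const int* local_buf, int* global_out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        global_out[i] = local_buf[i];
    }
}

// Vector add that touches global memory only through the load and store
// helpers; all arithmetic operates on the local buffers.
void vadd_chunked(const int* a, const int* b, int* out, std::size_t size) {
    int buf_a[BUFFER_SIZE];
    int buf_b[BUFFER_SIZE];
    int buf_out[BUFFER_SIZE];

    for (std::size_t base = 0; base < size; base += BUFFER_SIZE) {
        // The last chunk may be shorter than BUFFER_SIZE.
        const std::size_t remaining = size - base;
        const std::size_t chunk = remaining < BUFFER_SIZE ? remaining : BUFFER_SIZE;

        load_memcpy(a + base, buf_a, chunk);
        load_memcpy(b + base, buf_b, chunk);

        for (std::size_t i = 0; i < chunk; ++i) {
            buf_out[i] = buf_a[i] + buf_b[i];
        }

        store_loop(buf_out, out + base, chunk);
    }
}
```

Keeping the compute loop separate from the load and store helpers means each global memory access is part of a purely sequential transfer, which is the pattern the paragraph above recommends.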