Controlling Data Transfers between AI Engine and Global Memory

AI Engine Tools and Flows User Guide (UG1076)

Document ID: UG1076
Release Date: 2024-11-28
Version: 2024.2 English

The AI Engine graph supports memory-mapped connections to and from global memory. Global memory can be either High Bandwidth Memory (HBM) on the device or DDR memory external to the device. GMIO objects provide linear access to data over these memory-mapped connections between the AI Engine and global memory. To learn how to use GMIO in the AI Engine graph code, refer to Configuring input_gmio/output_gmio in the AI Engine Kernel and Graph Programming Guide (UG1079).

AI Engine-ML devices support an alternative to GMIO objects: external buffer objects, which also provide memory-mapped connections between global memory and the AI Engine in the graph. Unlike GMIOs, which access data linearly, external buffers can use more intricate data access patterns, specified through tiling parameters in the graph. For more information on external buffer usage, refer to AI Engine-ML External Memory Access in the AI Engine-ML Kernel and Graph Programming Guide (UG1603).
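
The difference between linear and tiled access can be pictured with a small, host-only sketch. It does not use the graph tiling parameter API; the function names `linear_order` and `tiled_order` are hypothetical and only compute the order in which element offsets of a row-major 2D buffer would be visited. The sketch assumes the buffer dimensions are exact multiples of the tile dimensions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offsets visited when a rows x cols row-major buffer is read front to back.
// (Hypothetical helper for illustration; not part of any AMD API.)
std::vector<std::size_t> linear_order(std::size_t rows, std::size_t cols) {
  std::vector<std::size_t> order;
  order.reserve(rows * cols);
  for (std::size_t i = 0; i < rows * cols; ++i) {
    order.push_back(i);
  }
  return order;
}

// Offsets visited when the same buffer is read one tile at a time.
// Tiles are traversed left to right, then top to bottom; elements inside
// a tile are traversed row by row.
// (Hypothetical helper for illustration; not part of any AMD API.)
std::vector<std::size_t> tiled_order(std::size_t rows, std::size_t cols,
                                     std::size_t tile_rows, std::size_t tile_cols) {
  // Assumption of this sketch: tiles are non-empty and divide the buffer evenly.
  assert(tile_rows > 0 && tile_cols > 0);
  assert(rows % tile_rows == 0 && cols % tile_cols == 0);

  std::vector<std::size_t> order;
  order.reserve(rows * cols);
  for (std::size_t tr = 0; tr < rows; tr += tile_rows) {
    for (std::size_t tc = 0; tc < cols; tc += tile_cols) {
      for (std::size_t r = 0; r < tile_rows; ++r) {
        for (std::size_t c = 0; c < tile_cols; ++c) {
          order.push_back((tr + r) * cols + (tc + c));
        }
      }
    }
  }
  return order;
}
```

For a 4x4 buffer read in 2x2 tiles, `tiled_order(4, 4, 2, 2)` starts with offsets 0, 1, 4, 5 (the top-left tile), whereas `linear_order(4, 4)` starts with 0, 1, 2, 3.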

The host code to control data transfers remains the same whether using GMIOs or external buffers to access data in global memory. Both methods support synchronous and asynchronous data transfer. For asynchronous data transfer, the async API of the xrt::aie::bo object manages the transfer and requires the GMIO or external buffer object name to be specified as the first parameter.
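
The host-side difference between the two transfer styles can be sketched without hardware. In the sketch below, `MockBuffer` and `MockHandle` are hypothetical stand-ins that only record the order of calls; they mimic the shape of an `async(port, direction, size, offset)` call that returns a handle with a `wait()` method, and are not XRT classes. The helper names `transfer_blocking` and `transfer_overlapped` are also assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the handle returned by an asynchronous transfer.
struct MockHandle {
  std::vector<std::string>* log;
  std::string port;
  void wait() {
    assert(log != nullptr);
    log->push_back("wait:" + port);
  }
};

// Hypothetical stand-in for a buffer object; it performs no real transfer.
struct MockBuffer {
  std::vector<std::string>* log;
  MockHandle async(const std::string& port, int /*direction*/,
                   std::size_t /*bytes*/, std::size_t /*offset*/) {
    log->push_back("async:" + port);
    return MockHandle{log, port};
  }
};

// Blocking style: issue the transfer and wait for it immediately.
template <typename Buffer, typename Direction>
void transfer_blocking(Buffer& bo, const std::string& port, Direction dir,
                       std::size_t bytes) {
  auto handle = bo.async(port, dir, bytes, /*offset*/ 0);
  handle.wait();
}

// Overlapped style: issue the transfer, do other host work, then wait.
template <typename Buffer, typename Direction, typename Work>
void transfer_overlapped(Buffer& bo, const std::string& port, Direction dir,
                         std::size_t bytes, Work&& work) {
  auto handle = bo.async(port, dir, bytes, /*offset*/ 0);
  work();
  handle.wait();
}

// Runs both styles against the mock and returns the recorded call order.
std::vector<std::string> demo_call_order() {
  std::vector<std::string> log;
  MockBuffer bo{&log};
  transfer_blocking(bo, "gr.gmPortIn", 0, 1024);
  transfer_overlapped(bo, "gr.gmPortOut", 1, 1024,
                      [&log] { log.push_back("host work"); });
  return log;
}
```

The helpers are templates so that, in principle, any buffer type exposing the same call shape could be passed in; whether a particular XRT type fits should be checked against its header.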

The following example code shows an asynchronous buffer transfer. It works with either GMIO or external buffers through the XRT API:
char* xclbinFilename = argv[1];
// Open xclbin
auto device = xrt::device(0); //device index=0
auto uuid = device.load_xclbin(xclbinFilename);

//Only non-cacheable buffer is supported
auto din_buffer = xrt::aie::bo (device, BLOCK_SIZE_in_Bytes, xrt::bo::flags::normal, /*memory group*/0); 
int* dinArray= din_buffer.map<int*>();

//Only non-cacheable buffer is supported
auto dout_buffer = xrt::aie::bo (device, BLOCK_SIZE_in_Bytes, xrt::bo::flags::normal, /*memory group*/0); 
int* doutArray= dout_buffer.map<int*>();

int ret=0;
int error=0;

//Initialization
for(int i=0;i<ITERATION*1024/4;i++){
  dinArray[i]=i;
}

// Parameter "gr.gmPortIn" can be the name of a GMIO/external buffer object
din_buffer.async("gr.gmPortIn",XCL_BO_SYNC_BO_GMIO_TO_AIE,BLOCK_SIZE_in_Bytes,/*offset*/0);

auto ghdl=xrt::graph(device,uuid,"gr");
ghdl.run(ITERATION);

// Parameter "gr.gmPortOut" can be the name of a GMIO/external buffer object
auto dout_buffer_run=dout_buffer.async("gr.gmPortOut",XCL_BO_SYNC_BO_AIE_TO_GMIO,BLOCK_SIZE_in_Bytes,/*offset*/0);

ghdl.wait();//Wait for graph to complete
dout_buffer_run.wait();//Wait for the gr.gmPortOut transfer to complete

// Post-processing
...
Note: Only non-cacheable buffers are supported for AI Engine GMIO and external buffers.
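
The post-processing step is not specified in the example. One possible check, sketched here under stated assumptions, compares the mapped output buffer against reference data supplied by the caller. The helper names `element_count` and `count_mismatches` are hypothetical, and the reference data depends on what the graph computes, which the example does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of int elements that fit in a buffer of the given size in bytes.
// (Hypothetical helper for illustration.)
std::size_t element_count(std::size_t block_size_in_bytes) {
  return block_size_in_bytes / sizeof(int);
}

// Counts the positions at which the output differs from the reference data.
// The caller provides the reference; this sketch makes no assumption about
// the function implemented by the graph.
std::size_t count_mismatches(const int* out, const int* golden, std::size_t count) {
  assert(count == 0 || (out != nullptr && golden != nullptr));
  std::size_t errors = 0;
  for (std::size_t i = 0; i < count; ++i) {
    if (out[i] != golden[i]) {
      ++errors;
    }
  }
  return errors;
}
```

In the host code above, a pointer such as `doutArray` could be passed as `out` once both wait calls have returned, and a nonzero result could be recorded in `error`.
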
The XRT API introduces the xrt::aie::buffer object to represent an AI Engine buffer. The following code uses this object to perform asynchronous buffer transactions similar to the code above:
//Only non-cacheable buffer is supported
auto din_buffer = xrt::aie::bo (device, BLOCK_SIZE_in_Bytes, xrt::bo::flags::normal, /*memory group*/0); 

//Only non-cacheable buffer is supported
auto dout_buffer = xrt::aie::bo (device, BLOCK_SIZE_in_Bytes, xrt::bo::flags::normal, /*memory group*/0); 

//"gr.gmioIn" is the name of the buffer which represents GMIO/External Buffer
xrt::aie::buffer bufIn(device, uuid, "gr.gmioIn");
bufIn.async(din_buffer, XCL_BO_SYNC_BO_GMIO_TO_AIE, BLOCK_SIZE_in_Bytes, 0);

xrt::aie::buffer bufOut(device, uuid, "gr.gmioOut");
bufOut.async(dout_buffer, XCL_BO_SYNC_BO_AIE_TO_GMIO, BLOCK_SIZE_in_Bytes, 0);

bufOut.wait(); //Wait for the gr.gmioOut transfer to complete

For more details on the buffer object API support, refer to the xrt_aie.h file in the XRT Repository.