In this step, you will see how to transfer output data asynchronously with the non-blocking GMIO API, and how to use GMIO::wait to synchronize the data. You will also see how to run the AI Engine program with GMIO in hardware.
Change the working directory to single_aie_gmio/step3 and examine aie/graph.cpp. The main difference in the code is as follows:
gr.gmioIn.gm2aie_nb(dinArray, BLOCK_SIZE_in_Bytes);   // Non-blocking: transfer the input data for all blocks at once
gr.run(ITERATION);
gr.gmioOut.aie2gm_nb(doutArray, BLOCK_SIZE_in_Bytes); // Non-blocking: transfer the output data for all blocks at once
// The PS can do other tasks here while the data is being transferred
gr.gmioOut.wait();                                    // Block until the output transfer has completed
Note:
gr.gmioOut.aie2gm_nb() returns immediately after it is called, without waiting for the data transfer to complete. The PS can therefore do other tasks while the data is being transferred. It then needs to call gr.gmioOut.wait() to synchronize the data. After GMIO::wait returns, the output data is in memory and can be processed by the host application.
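The ordering rule in the note (issue the non-blocking call, do other work, wait, and only then read the buffer) can be modeled in plain C++ without the ADF headers. The sketch below is a host-only illustration under stated assumptions: MockGmioOut, run_nonblocking_demo, and the use of std::async with memcpy to stand in for the DMA are hypothetical and are not part of the ADF GMIO API. Like gr.gmioOut.wait(), its wait() is called on the port object and blocks until every transfer issued on that port has finished.

```cpp
// Hypothetical host-only model of a non-blocking GMIO output port.
// MockGmioOut is NOT the ADF GMIO class; it only imitates the calling
// pattern aie2gm_nb() -> (other PS work) -> wait().
#include <cassert>
#include <cstddef>
#include <cstring>
#include <future>
#include <numeric>
#include <utility>
#include <vector>

class MockGmioOut {
public:
    // 'produced' stands in for the data the AI Engine would write out.
    explicit MockGmioOut(std::vector<int> produced) : produced_(std::move(produced)) {}

    // Starts an asynchronous copy into dst and returns immediately.
    void aie2gm_nb(int* dst, std::size_t bytes) {
        assert(bytes <= produced_.size() * sizeof(int));
        pending_.push_back(std::async(std::launch::async, [this, dst, bytes] {
            std::memcpy(dst, produced_.data(), bytes);
        }));
    }

    // Blocks until every transfer issued on this port has completed.
    void wait() {
        for (auto& f : pending_) f.wait();
        pending_.clear();
    }

private:
    std::vector<int> produced_;
    std::vector<std::future<void>> pending_;  // destroyed first, so it joins before produced_ goes away
};

// Returns the sum of the output buffer, read only after wait().
long long run_nonblocking_demo(long long& ps_work_result) {
    const std::size_t n = 1024;
    std::vector<int> aie_result(n);
    std::iota(aie_result.begin(), aie_result.end(), 0);  // 0, 1, ..., n-1

    MockGmioOut gmioOut(aie_result);
    std::vector<int> dout(n, -1);

    gmioOut.aie2gm_nb(dout.data(), n * sizeof(int));  // returns immediately

    // The PS does unrelated work here; it must not touch dout yet.
    ps_work_result = 0;
    for (int i = 1; i <= 100; ++i) ps_work_result += i;

    gmioOut.wait();  // after this, dout is safe to read
    return std::accumulate(dout.begin(), dout.end(), 0LL);
}
```

Reading dout before wait() would be a data race in this model, which is the same reason the real flow calls gr.gmioOut.wait() before the host processes doutArray.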
To make GMIO work in the hardware flow, examine sw/host.cpp, which uses the XRT API instead:
auto din_buffer = xrt::aie::bo(device, BLOCK_SIZE_in_Bytes, xrt::bo::flags::normal, /*memory group*/0);  // Only non-cacheable buffer is supported
int* dinArray = din_buffer.map<int*>();
auto dout_buffer = xrt::aie::bo(device, BLOCK_SIZE_in_Bytes, xrt::bo::flags::normal, /*memory group*/0); // Only non-cacheable buffer is supported
int* doutArray = dout_buffer.map<int*>();
std::cout << "GMIO::malloc completed" << std::endl;
......
auto ghdl = xrt::graph(device, uuid, "gr");
din_buffer.async("gr.gmioIn", XCL_BO_SYNC_BO_GMIO_TO_AIE, BLOCK_SIZE_in_Bytes, /*offset*/0);   // Start the input transfer; returns immediately
ghdl.run(ITERATION);
auto dout_buffer_run = dout_buffer.async("gr.gmioOut", XCL_BO_SYNC_BO_AIE_TO_GMIO, BLOCK_SIZE_in_Bytes, /*offset*/0); // Start the output transfer; returns a handle
// The PS can do other tasks here while the data is being transferred
dout_buffer_run.wait(); // Wait for gmioOut to complete
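One difference from the ADF version is that the XRT call returns a handle for each transfer (dout_buffer_run), and the host waits on that handle instead of on the GMIO port. The sketch below models per-transfer handles with std::future. MockBuffer, MockTransferHandle, async_from, and run_handle_demo are hypothetical names, and the copies are plain memcpy calls standing in for GMIO DMA transfers; this is an illustration of the waiting pattern, not the XRT API.

```cpp
// Hypothetical model of per-transfer async handles, imitating the shape of
// a buffer async() call that returns a handle with wait(). Not the XRT API.
#include <cassert>
#include <cstddef>
#include <cstring>
#include <future>
#include <utility>
#include <vector>

class MockTransferHandle {
public:
    explicit MockTransferHandle(std::future<void> f) : f_(std::move(f)) {}
    void wait() { f_.wait(); }  // blocks until this one transfer has finished
private:
    std::future<void> f_;
};

class MockBuffer {
public:
    explicit MockBuffer(std::size_t words) : data_(words, 0) {}
    int* map() { return data_.data(); }

    // Copies 'bytes' from src into this buffer asynchronously; the returned
    // handle tracks only this transfer.
    MockTransferHandle async_from(const int* src, std::size_t bytes) {
        assert(bytes <= data_.size() * sizeof(int));
        int* dst = data_.data();
        return MockTransferHandle(std::async(std::launch::async,
            [dst, src, bytes] { std::memcpy(dst, src, bytes); }));
    }

private:
    std::vector<int> data_;
};

// Issues two independent transfers, waits on each handle, then checks data.
bool run_handle_demo() {
    const std::size_t n = 256;
    const std::size_t bytes = n * sizeof(int);

    std::vector<int> host_input(n);
    std::vector<int> aie_output(n);
    for (std::size_t i = 0; i < n; ++i) {
        host_input[i] = static_cast<int>(i);
        aie_output[i] = static_cast<int>(2 * i);  // stand-in for kernel results
    }

    MockBuffer din_side(n);     // destination of the "input" transfer
    MockBuffer dout_buffer(n);  // destination of the "output" transfer

    MockTransferHandle din_run = din_side.async_from(host_input.data(), bytes);
    MockTransferHandle dout_run = dout_buffer.async_from(aie_output.data(), bytes);

    // The PS can do other tasks here while both transfers are in flight.

    din_run.wait();   // each handle tracks exactly one transfer
    dout_run.wait();

    const int* din = din_side.map();
    const int* dout = dout_buffer.map();
    for (std::size_t i = 0; i < n; ++i) {
        if (din[i] != static_cast<int>(i)) return false;
        if (dout[i] != static_cast<int>(2 * i)) return false;
    }
    return true;
}
```

Keeping one handle per transfer lets the host wait selectively, for example on the output transfer only, which is what dout_buffer_run.wait() does in host.cpp.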