For x86 simulation / AIE simulation, the top-level application only needs simple ADF API calls to initialize, run, and end the graph. However, for an actual AI Engine graph application, the host code must do much more than those simple tasks. The top-level PS application running on the Cortex®-A72 controls the graph and the PL kernels: it manages data inputs to the graph, handles data outputs from the graph, and controls any PL kernels working with the graph. Sample code is illustrated below:
// 1. Open device, load xclbin, and get uuid
auto dhdl = xrtDeviceOpen(0); // device index = 0
xrtDeviceLoadXclbinFile(dhdl,xclbinFilename);
xuid_t uuid;
xrtDeviceGetXclbinUUID(dhdl, uuid);
adf::registerXRT(dhdl, uuid);
// 2. Allocate output buffer objects and map to host memory
xrtBufferHandle out_bohdl = xrtBOAlloc(dhdl, output_size_in_bytes, 0, /*BANK=*/0);
std::complex<short> *host_out = (std::complex<short>*)xrtBOMap(out_bohdl);
// 3. Get kernel and run handles, set kernel arguments, and launch the kernel
xrtKernelHandle s2mm_khdl = xrtPLKernelOpen(dhdl, uuid, "s2mm"); // open kernel handle using the xclbin uuid obtained above
xrtRunHandle s2mm_rhdl = xrtRunOpen(s2mm_khdl);
xrtRunSetArg(s2mm_rhdl, 0, out_bohdl); // set kernel arg
xrtRunSetArg(s2mm_rhdl, 2, OUTPUT_SIZE); // set kernel arg
xrtRunStart(s2mm_rhdl); //launch s2mm kernel
// ADF API: initialize and run the graph, update run-time graph parameters (RTP), and so on
gr.init();
gr.update(gr.size, 1024); // update RTP
gr.run(16); // run the AIE graph for 16 iterations
gr.wait();
// 4. Wait for kernel completion
auto state = xrtRunWait(s2mm_rhdl);
// 5. Sync output device buffer objects to host memory
xrtBOSync(out_bohdl, XCL_BO_SYNC_BO_FROM_DEVICE, output_size_in_bytes, /*OFFSET=*/0);
// 6. Post-processing on host memory ("host_out")
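Finally, although not shown in the sample above, the host code typically releases the run, kernel, and buffer handles, ends the graph, and closes the device. A minimal cleanup sketch, reusing the handle names from the sample:
// 7. Release handles and close the device (typical cleanup, not part of the sample above)
gr.end(); // ADF API: end the graph
xrtRunClose(s2mm_rhdl);
xrtKernelClose(s2mm_khdl);
xrtBOFree(out_bohdl); // free the output buffer object
xrtDeviceClose(dhdl);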
Vitis Vision AIE library functions provide optimized vector implementations of various computer vision algorithms. These functions are expected to process high-resolution images. However, because the local memory of the AI Engine core module is limited, an entire image cannot fit into it. Accessing DDR to read and write image data would also be highly inefficient, both for performance and for power. To overcome this limitation, the host code is expected to split the high-resolution image into smaller tiles that fit in the AI Engine local memory, transferring them in a ping-pong fashion. Splitting a high-resolution image into smaller tiles is a complex operation because it must account for overlap regions and borders. The tile size is also expected to be aligned with the vectorization factor of the kernel.
To facilitate this, the Vitis Vision library provides data movers that perform smart tiling/stitching of high-resolution images and meet all of the above requirements. Two versions are available, providing data movement capabilities over either PLIO or GMIO interfaces. A high-level class abstraction with a simple API is provided to facilitate data transfers, and it allows a seamless transition between the PLIO and GMIO methods of data transfer, as sketched below.
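The following sketch shows how the tiler/stitcher class abstraction is typically driven from the host code. It follows the xF::xfcvDataMovers usage found in the Vitis Vision AIE examples; the template parameters, overlap values, handle names, image objects, and graph name used here are illustrative, and the exact constructor arguments and signatures may differ between releases and between the PLIO and GMIO variants.
// Tiler splits the input image into tiles; stitcher reassembles processed tiles.
// TILE_HEIGHT, TILE_WIDTH, VECTORIZATION_FACTOR, src_hndl, dst_hndl, srcImage,
// dstImage, and my_graph are illustrative names.
xF::xfcvDataMovers<xF::TILER, int16_t, TILE_HEIGHT, TILE_WIDTH, VECTORIZATION_FACTOR> tiler(0, 0); // (overlap rows, overlap columns)
xF::xfcvDataMovers<xF::STITCHER, int16_t, TILE_HEIGHT, TILE_WIDTH, VECTORIZATION_FACTOR> stitcher;

tiler.compute_metadata(srcImage.size()); // compute the tile layout for this image size
auto tiles_sz = tiler.host2aie_nb(&src_hndl, srcImage.size()); // non-blocking host-to-AIE transfer
stitcher.aie2host_nb(&dst_hndl, dstImage.size(), tiles_sz); // non-blocking AIE-to-host transfer
my_graph.run(tiles_sz[0] * tiles_sz[1]); // one graph iteration per tile
my_graph.wait();
tiler.wait(); // all tiles sent
stitcher.wait(); // stitched output ready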
Important
For HW emulation / HW run, it is imperative to include graph.cpp inside host.cpp, because the platform port specification and the ADF graph object instance are declared in graph.cpp.
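A minimal sketch of what this looks like in practice (file and object names are illustrative):
// host.cpp (sketch)
#include "graph.cpp" // brings in the platform port specification and the ADF graph object instance (e.g., "gr")

int main(int argc, char** argv) {
    // ... XRT device / xclbin setup and data mover handling as shown above ...
    gr.init();
    gr.run(16);
    gr.wait();
    return 0;
}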