The host code, host.cpp, runs on the host processor and contains the code to initialize and run the datamovers and the ADF graph. XRT APIs are used to create the required buffers in device memory.
First a golden reference image is generated using OpenCV.
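int run_opencv_ref(cv::Mat& srcImageR, cv::Mat& dstRefImage, float coeff[9]) {
    // Wrap the nine coefficients in a 3x3 float kernel header (no copy is made)
    cv::Mat kernel = cv::Mat(3, 3, CV_32F, coeff);
    // ddepth = -1 keeps the source depth; anchor (-1, -1) selects the kernel center
    cv::filter2D(srcImageR, dstRefImage, -1, kernel, cv::Point(-1, -1), 0, cv::BORDER_REPLICATE);
    return 0;
}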
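To make explicit what the golden reference computes, the following is a minimal plain C++ sketch of a 3x3 filter with replicated borders. It is not OpenCV's implementation: the helper name `filter3x3_replicate` is hypothetical, the image is assumed to be a single-channel, row-major buffer of `int16_t` samples, and the rounding and saturation to `int16_t` are assumptions meant to approximate what `cv::filter2D` does for 16-bit signed data.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper (not part of OpenCV or the Vitis Vision library).
// Applies a 3x3 kernel as a correlation anchored at the kernel center.
// Pixels outside the image are replaced by the nearest edge pixel,
// which mirrors the behavior of cv::BORDER_REPLICATE.
std::vector<int16_t> filter3x3_replicate(const std::vector<int16_t>& src,
                                         int rows, int cols,
                                         const float coeff[9]) {
    assert(rows > 0 && cols > 0);
    assert(src.size() == static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols));

    std::vector<int16_t> dst(src.size());
    const float lo = static_cast<float>(std::numeric_limits<int16_t>::min());
    const float hi = static_cast<float>(std::numeric_limits<int16_t>::max());

    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            float acc = 0.0f;
            for (int kr = -1; kr <= 1; ++kr) {
                for (int kc = -1; kc <= 1; ++kc) {
                    // Clamp the neighbor coordinates to the image bounds
                    const int rr = std::min(std::max(r + kr, 0), rows - 1);
                    const int cc = std::min(std::max(c + kc, 0), cols - 1);
                    acc += coeff[(kr + 1) * 3 + (kc + 1)] *
                           static_cast<float>(src[rr * cols + cc]);
                }
            }
            // Round to nearest, then saturate to the int16_t range (assumed behavior)
            const float rounded = std::nearbyint(acc);
            dst[r * cols + c] = static_cast<int16_t>(std::min(std::max(rounded, lo), hi));
        }
    }
    return dst;
}
```

With an identity kernel (only the center coefficient set to 1) the output equals the input, which is a quick sanity check before comparing against the AIE result.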
Then, the xclbin is loaded on the device and the device handles are created.
xF::deviceInit(xclBinName);
Buffers for input and output data are created using the XRT APIs, and the data from the input cv::Mat is copied to the XRT buffer.
// Allocate input buffer and copy the source image into it
void* srcData = nullptr;
xrt::bo src_hndl = xrt::bo(xF::gpDhdl, (srcImageR.total() * srcImageR.elemSize()), 0, 0);
srcData = src_hndl.map();
memcpy(srcData, srcImageR.data, (srcImageR.total() * srcImageR.elemSize()));
// Allocate output buffer
void* dstData = nullptr;
xrt::bo dst_hndl = xrt::bo(xF::gpDhdl, (op_height * op_width * srcImageR.elemSize()), 0, 0);
dstData = dst_hndl.map();
// Wrap the mapped output buffer in a cv::Mat header (no copy is made)
cv::Mat dst(op_height, op_width, srcImageR.type(), dstData);
The xfcvDataMovers objects tiler and stitcher are created. For more details, refer to xfcvDataMovers.
xF::xfcvDataMovers<xF::TILER, int16_t, MAX_TILE_HEIGHT, MAX_TILE_WIDTH, VECTORIZATION_FACTOR> tiler(1, 1);
xF::xfcvDataMovers<xF::STITCHER, int16_t, MAX_TILE_HEIGHT, MAX_TILE_WIDTH, VECTORIZATION_FACTOR> stitcher;
The ADF graph is initialized and the filter coefficients are updated.
auto gHndl = xrt::graph(xF::gpDhdl, xF::xclbin_uuid, "filter_graph");
gHndl.reset();
gHndl.update("filter_graph.k1.in[1]", float2fixed_coeff<10, 16>(kData));
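The coefficients are passed to the graph in fixed-point form. The sketch below illustrates the general idea of such a conversion. It is not the library's `float2fixed_coeff`: the helper `to_fixed_q` is hypothetical, and it assumes that the first template argument (10 in the call above) is the number of fractional bits, which the text here does not state. Any padding or reordering of the coefficients that the kernel may expect is not modeled.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not the Vitis Vision float2fixed_coeff).
// Converts nine float coefficients to signed 16-bit fixed point with
// FRAC_BITS fractional bits: q = round(coeff * 2^FRAC_BITS).
template <int FRAC_BITS>
std::array<int16_t, 9> to_fixed_q(const float (&coeff)[9]) {
    static_assert(FRAC_BITS >= 0 && FRAC_BITS < 15,
                  "fractional bits must fit in a signed 16-bit value");
    std::array<int16_t, 9> out{};
    const float scale = static_cast<float>(1 << FRAC_BITS);
    for (int i = 0; i < 9; ++i) {
        const long q = std::lround(coeff[i] * scale);
        // The scaled value must be representable as int16_t
        assert(q >= INT16_MIN && q <= INT16_MAX);
        out[i] = static_cast<int16_t>(q);
    }
    return out;
}
```

Under this assumption, a coefficient of 0.25 with 10 fractional bits maps to 256, and a normalized kernel (coefficients summing to 1) maps to integers summing to approximately 1024.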
Metadata containing the tile information is generated.
tiler.compute_metadata(srcImageR.size());
Data transfer to the AIE through the datamovers is initiated together with the graph run. Execution then blocks until the graph run and both data transfers are complete.
auto tiles_sz = tiler.host2aie_nb(&src_hndl, srcImageR.size());
stitcher.aie2host_nb(&dst_hndl, dst.size(), tiles_sz);
gHndl.run(tiles_sz[0] * tiles_sz[1]);
gHndl.wait();
tiler.wait();
stitcher.wait();
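The graph is run once per tile, which is why the iteration count above is the product of the two entries of `tiles_sz`. As a rough illustration of where such a count comes from, the sketch below computes a tile grid by ceiling division. It is a simplification under stated assumptions: the helper `tile_grid` is hypothetical, and it ignores the tile overlap that a 3x3 filter needs at tile borders, so the real tiler may report different numbers.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper (not part of xfcvDataMovers).
// Returns {tiles along height, tiles along width} when an image is split
// into tiles of at most tile_h x tile_w pixels, with no overlap between tiles.
std::array<int, 2> tile_grid(int img_h, int img_w, int tile_h, int tile_w) {
    assert(img_h > 0 && img_w > 0);
    assert(tile_h > 0 && tile_w > 0);
    const int tiles_y = (img_h + tile_h - 1) / tile_h;  // ceiling division
    const int tiles_x = (img_w + tile_w - 1) / tile_w;
    return {tiles_y, tiles_x};
}
```

For example, a 1080 x 1920 image split into 16 x 256 tiles under these assumptions gives a 68 x 8 grid, so the graph would run 544 iterations.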