The xfcvDataMovers class provides a high-level API abstraction to initiate data transfers from DDR to the AIE cores and vice versa for hw-emulation / hw runs. Because each AIE core has limited local memory, which is not sufficient to hold an entire high-resolution image (input / output), each image must be partitioned into smaller tiles that are then sent to the AIE core for computation. After computation, the output tiles are stitched back together to reconstruct the high-resolution output image. This process involves non-trivial computation, because tiling must ensure proper border handling and overlap processing for convolution-based kernels.
The xfcvDataMovers class object takes a few simple, user-provided parameters and exposes a simple data-transaction API that hides this complexity. Moreover, it provides a template parameter with which the application can switch seamlessly between PL-based and GMIO-based data movement.
The template parameters of the class are described in the following table.

Parameter | Description |
--------- | ----------- |
KIND | Type of object: TILER / STITCHER |
DATA_TYPE | Data type of the AIE core kernel input or output |
TILE_HEIGHT_MAX | Maximum tile height |
TILE_WIDTH_MAX | Maximum tile width |
AIE_VECTORIZATION_FACTOR | AIE core vectorization factor |
CORES | Number of AIE cores to be used |
PL_AXI_BITWIDTH | Data width for AXI transfers between DDR and PL (PL-based data movers only) |
USE_GMIO | Set to true to use GMIO-based data transfer |
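For reference, the sketch below annotates the GMIO instantiation shown later in this section with the parameter positions from the table above; the concrete values (CORES = 1, PL_AXI_BITWIDTH = 0, USE_GMIO = true) are taken from that example and are illustrative only.

//                 KIND       DATA_TYPE  TILE_HEIGHT_MAX  TILE_WIDTH_MAX  AIE_VECTORIZATION_FACTOR  CORES  PL_AXI_BITWIDTH  USE_GMIO
xF::xfcvDataMovers<xF::TILER, int16_t,   MAX_TILE_HEIGHT, MAX_TILE_WIDTH, VECTORIZATION_FACTOR,     1,     0,               true> tiler(1, 1);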
The class constructor takes the following arguments.

Parameter | Description |
--------- | ----------- |
overlapH | Horizontal overlap of the AIE core / pipeline |
overlapV | Vertical overlap of the AIE core / pipeline |
Note
The horizontal and vertical overlaps should be computed for the complete pipeline. For example, if the pipeline has a single 3x3 2D filter, the overlap sizes (both horizontal and vertical) will be 1. However, for two such filter operations back to back, the overlap size will be 2. Currently, users are expected to provide this input correctly.
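As a reference, the sketch below (a hypothetical helper, not part of the library) shows one way to derive the pipeline overlap from the kernel sizes of cascaded filters: each NxN filter contributes floor(N/2) pixels of overlap on every side, and the contributions add up.

#include <vector>

// Hypothetical helper: accumulate the per-filter overlap (floor(N/2) for an
// NxN kernel) across a chain of convolution filters.
int computePipelineOverlap(const std::vector<int>& kernelSizes) {
    int overlap = 0;
    for (int k : kernelSizes) {
        overlap += k / 2;
    }
    return overlap;
}

// Example: {3} -> 1 (single 3x3 filter), {3, 3} -> 2 (two back-to-back 3x3 filters).
int overlapH = computePipelineOverlap({3, 3});
int overlapV = overlapH; // symmetric kernels assumed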
The data transfer using the xfcvDataMovers class can be done in one of two ways:
PLIO data movers
This is the default mode of operation for the xfcvDataMovers class. When this method is used, data is transferred using the hardware Tiler / Stitcher IPs provided by AMD. The Makefiles provided with the design examples shipped with the library give the locations of the .xo files for these IPs and show how to incorporate them into the Vitis build system. You need to create one xfcvDataMovers object per input / output image, as shown in the following code.
Important
The implementations of Tiler and Stitcher for PLIO are provided as .xo files in the ‘L1/lib/hw’ folder. By using these files, you are agreeing to the terms and conditions specified in the LICENSE.txt file available in the same directory.
int overlapH = 1;
int overlapV = 1;
xF::xfcvDataMovers<xF::TILER, int16_t, MAX_TILE_HEIGHT, MAX_TILE_WIDTH, VECTORIZATION_FACTOR> tiler(overlapH, overlapV);
xF::xfcvDataMovers<xF::STITCHER, int16_t, MAX_TILE_HEIGHT, MAX_TILE_WIDTH, VECTORIZATION_FACTOR> stitcher;
The choice of MAX_TILE_HEIGHT / MAX_TILE_WIDTH provides constraints on the image tile size which in turn governs local memory usage. The image tile size in bytes can be computed as follows.
Image tile size = (TILE_HEADER_SIZE_IN_BYTES + MAX_TILE_HEIGHT*MAX_TILE_WIDTH*sizeof(DATA_TYPE))
Here, TILE_HEADER_SIZE_IN_BYTES is 128 bytes for the current version of Tiler / Stitcher. DATA_TYPE in the above example is int16_t (2 bytes).
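For illustration, the sketch below evaluates this formula at compile time. The 128-byte header size follows the statement above; the tile dimensions chosen here are assumptions for the example only.

#include <cstddef>
#include <cstdint>
#include <cstdio>

// Illustrative values; the actual MAX_TILE_HEIGHT / MAX_TILE_WIDTH are design choices.
constexpr std::size_t TILE_HEADER_SIZE_IN_BYTES = 128;
constexpr std::size_t MAX_TILE_HEIGHT = 8;
constexpr std::size_t MAX_TILE_WIDTH = 256;
using DATA_TYPE = int16_t;

// Image tile size = header + height * width * element size
constexpr std::size_t tileSizeBytes =
    TILE_HEADER_SIZE_IN_BYTES + MAX_TILE_HEIGHT * MAX_TILE_WIDTH * sizeof(DATA_TYPE);

int main() {
    std::printf("Tile size: %zu bytes\n", tileSizeBytes); // 128 + 8*256*2 = 4224 bytes
    return 0;
}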
Note
The current version of the HW data movers has an 8_16 configuration (i.e., an 8-bit image element data type on the host side and a 16-bit image element data type on the AIE kernel side). More such configurations (for example, 8_8, 16_16, etc.) will be provided in the future.
Tiler / Stitcher IPs use PL resources available on VCK boards. For the 8_16 configuration, the following table lists the resource utilization of these IPs. The numbers correspond to a single instance of each IP.
IP | LUTs | FFs | BRAMs | DSPs | Fmax |
-- | ---- | --- | ----- | ---- | ---- |
Tiler | 2761 | 3832 | 5 | 13 | 400 MHz |
Stitcher | 2934 | 3988 | 5 | 7 | 400 MHz |
Total | 5695 | 7820 | 10 | 20 | |

GMIO data movers
A transition to GMIO-based data movers can be achieved by using a specialized template implementation of the above class. All the above constraints on image tile size calculation apply here as well. Sample code is shown below.
xF::xfcvDataMovers<xF::TILER, int16_t, MAX_TILE_HEIGHT, MAX_TILE_WIDTH, VECTORIZATION_FACTOR, 1, 0, true> tiler(1, 1);
xF::xfcvDataMovers<xF::STITCHER, int16_t, MAX_TILE_HEIGHT, MAX_TILE_WIDTH, VECTORIZATION_FACTOR, 1, 0, true> stitcher;
Note
The last template parameter is set to true, implying GMIO specialization.
Once the objects are constructed, simple API calls can be made to initiate the data transfers. Sample code is shown below.
//For PLIO
auto tiles_sz = tiler.host2aie_nb(&src_hndl, srcImageR.size());
stitcher.aie2host_nb(&dst_hndl, dst.size(), tiles_sz);
//For GMIO
auto tiles_sz = tiler.host2aie_nb(srcData.data(), srcImageR.size(), {"gmioIn[0]"});
stitcher.aie2host_nb(dstData.data(), dst.size(), tiles_sz, {"gmioOut[0]"});
Note
GMIO data transfers take an additional argument which is the corresponding GMIO port to be used.
Note
For GMIO-based transfers, blocking methods (host2aie(...) / aie2host(...)) are also available. For PLIO-based data transfers, only non-blocking API calls are provided.
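For reference, the blocking GMIO variants can be invoked analogously to the non-blocking calls shown above; the argument lists below simply mirror those calls and should be confirmed against the library headers before use.

// Sketch of the blocking GMIO calls, mirroring the argument lists of the
// non-blocking examples above; verify the exact signatures (including the
// return value) against the L1 headers.
auto tiles_sz = tiler.host2aie(srcData.data(), srcImageR.size(), {"gmioIn[0]"});
stitcher.aie2host(dstData.data(), dst.size(), tiles_sz, {"gmioOut[0]"});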
Using tiles_sz, you can run the graph the appropriate number of times.
filter_graph_hndl.run(tiles_sz[0] * tiles_sz[1]);
After the runs are started, you need to wait for all transactions to complete.
filter_graph_hndl.wait();
tiler.wait();
stitcher.wait();
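Putting these steps together, a typical PLIO host flow looks like the sketch below; the identifiers (src_hndl, dst_hndl, filter_graph_hndl, and the tile dimensions) follow the preceding examples and are assumed to be set up by the application.

// Consolidated PLIO flow (sketch; identifiers follow the examples above).
int overlapH = 1, overlapV = 1;
xF::xfcvDataMovers<xF::TILER, int16_t, MAX_TILE_HEIGHT, MAX_TILE_WIDTH, VECTORIZATION_FACTOR> tiler(overlapH, overlapV);
xF::xfcvDataMovers<xF::STITCHER, int16_t, MAX_TILE_HEIGHT, MAX_TILE_WIDTH, VECTORIZATION_FACTOR> stitcher;

auto tiles_sz = tiler.host2aie_nb(&src_hndl, srcImageR.size()); // DDR -> AIE (tiling)
stitcher.aie2host_nb(&dst_hndl, dst.size(), tiles_sz);          // AIE -> DDR (stitching)

filter_graph_hndl.run(tiles_sz[0] * tiles_sz[1]);               // one graph iteration per tile

filter_graph_hndl.wait();                                       // wait for the graph to finish
tiler.wait();                                                   // wait for input transfers
stitcher.wait();                                                // wait for output transfers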