The xfcvDataMovers class provides a high-level API abstraction to initiate data transfer from DDR to the AIE cores and vice versa for hw-emulation / hw runs. Because each AIE core has a limited amount of local memory, insufficient to hold an entire high-resolution image (input / output), each image needs to be partitioned into smaller tiles which are then sent to the AIE core for computation. After computation, the output tiles are stitched back together to generate the high-resolution output image. This process involves complex computation, as the tiling needs to ensure proper border handling and overlap processing in the case of convolution-based kernels.
An xfcvDataMovers class object takes a few simple parameters from the user and provides a simple data transaction API in which the user does not have to deal with this complexity. Moreover, it provides a template parameter with which an application can switch seamlessly from PL based data movement to GMIO based (and vice versa). The template parameters are described below.
| Parameter | Description |
|---|---|
| KIND | Type of object: TILER / STITCHER |
| DATA_TYPE | Data type of AIE core kernel input or output |
| TILE_HEIGHT_MAX | Maximum tile height |
| TILE_WIDTH_MAX | Maximum tile width |
| AIE_VECTORIZATION_FACTOR | AIE core vectorization factor |
| CORES | Number of AIE cores to be used |
| PL_AXI_BITWIDTH | For PL based data movers: the data width for AXI transfers between DDR and PL |
| USE_GMIO | Set to true to use GMIO based data transfer |
The constructor takes the following arguments:

| Parameter | Description |
|---|---|
| overlapH | Horizontal overlap of the AIE core / pipeline |
| overlapV | Vertical overlap of the AIE core / pipeline |
Note
Horizontal and vertical overlaps should be computed for the complete pipeline. For example, if the pipeline has a single 3x3 2D filter, then the overlap sizes (both horizontal and vertical) will be 1. However, in the case of two such filter operations back to back, the overlap size will be 2. Currently, it is expected that users provide this input correctly.
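As a rough illustration, assuming each KxK convolution stage contributes (K-1)/2 pixels of overlap per side and that overlaps simply accumulate across stages, the pipeline overlap could be computed with a hypothetical helper like the one below (this helper is not part of the library API):

#include <vector>

// Hypothetical helper: accumulate per-side overlap over a pipeline of
// square convolution kernels, assuming each KxK stage adds (K-1)/2 pixels.
int pipelineOverlap(const std::vector<int>& kernelSizes) {
    int overlap = 0;
    for (int k : kernelSizes) overlap += (k - 1) / 2;
    return overlap;
}

int overlapH = pipelineOverlap({3, 3}); // two back-to-back 3x3 filters -> 2
int overlapV = pipelineOverlap({3, 3}); // matches the note above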
The data transfer using the xfcvDataMovers class can be done in one of two ways:
PLIO data movers
This is the default mode of operation for the xfcvDataMovers class. In this mode, data is transferred using the hardware Tiler / Stitcher IPs provided by Xilinx. The Makefiles of the example designs shipped with the library point to the .xo files for these IPs and show how to incorporate them into the Vitis build system. The user needs to create one object of the xfcvDataMovers class per input / output image, as shown in the code below.
Important
The implementations of Tiler and Stitcher for PLIO are provided as .xo files in the ‘L1/lib/hw’ folder. By using these files, you are agreeing to the terms and conditions specified in the LICENSE.txt file available in the same directory.
int overlapH = 1;
int overlapV = 1;
xF::xfcvDataMovers<xF::TILER, int16_t, MAX_TILE_HEIGHT, MAX_TILE_WIDTH, VECTORIZATION_FACTOR> tiler(overlapH, overlapV);
xF::xfcvDataMovers<xF::STITCHER, int16_t, MAX_TILE_HEIGHT, MAX_TILE_WIDTH, VECTORIZATION_FACTOR> stitcher;
The choice of MAX_TILE_HEIGHT / MAX_TILE_WIDTH places constraints on the image tile size, which in turn governs local memory usage. The image tile size in bytes can be computed as below:
Image tile size = (TILE_HEADER_SIZE_IN_BYTES + MAX_TILE_HEIGHT*MAX_TILE_WIDTH*sizeof(DATA_TYPE))
Here TILE_HEADER_SIZE_IN_BYTES is 128 bytes for the current version of Tiler / Stitcher. DATA_TYPE in the above example is int16_t (2 bytes).
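For instance, with hypothetical values MAX_TILE_HEIGHT = 16 and MAX_TILE_WIDTH = 256 (illustrative choices, not prescribed by the library), the tile size works out as in the sketch below:

constexpr int TILE_HEADER_SIZE_IN_BYTES = 128;
constexpr int MAX_TILE_HEIGHT = 16;  // assumed for illustration
constexpr int MAX_TILE_WIDTH = 256;  // assumed for illustration
constexpr size_t tileSizeBytes =
    TILE_HEADER_SIZE_IN_BYTES + MAX_TILE_HEIGHT * MAX_TILE_WIDTH * sizeof(int16_t); // 128 + 16*256*2 = 8320 bytes
static_assert(tileSizeBytes <= 32768, "tile should fit the AIE local memory budget (32 KB assumed here)");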
Note
The current version of the HW data movers has an 8_16 configuration (i.e., 8-bit image element data type on the host side and 16-bit image element data type on the AIE kernel side). More such configurations will be provided in the future (for example, 8_8 / 16_16, etc.).
The Tiler / Stitcher IPs use PL resources available on VCK boards. For the 8_16 configuration, the table below lists resource utilization numbers for these IPs. The numbers correspond to a single instance of each IP.
| IP | LUTs | FFs | BRAMs | DSPs | Fmax |
|---|---|---|---|---|---|
| Tiler | 2761 | 3832 | 5 | 13 | 400 MHz |
| Stitcher | 2934 | 3988 | 5 | 7 | 400 MHz |
| Total | 5695 | 7820 | 10 | 20 | |

GMIO data movers
Transitioning to GMIO based data movers can be achieved by using a specialized template implementation of the above class. All the above constraints with respect to the image tile size calculation are valid here as well. Sample code is shown below:
xF::xfcvDataMovers<xF::TILER, int16_t, MAX_TILE_HEIGHT, MAX_TILE_WIDTH, VECTORIZATION_FACTOR, 1, 0, true> tiler(1, 1);
xF::xfcvDataMovers<xF::STITCHER, int16_t, MAX_TILE_HEIGHT, MAX_TILE_WIDTH, VECTORIZATION_FACTOR, 1, 0, true> stitcher;
Note
The last template parameter (USE_GMIO) is set to true, selecting the GMIO specialization.
Once the objects are constructed, simple API calls can be made to initiate the data transfers. Sample code is shown below:
//For PLIO
auto tiles_sz = tiler.host2aie_nb(src_hndl, srcImageR.size());
stitcher.aie2host_nb(dst_hndl, dst.size(), tiles_sz);
//For GMIO
auto tiles_sz = tiler.host2aie_nb(srcData.data(), srcImageR.size(), {"gmioIn[0]"});
stitcher.aie2host_nb(dstData.data(), dst.size(), tiles_sz, {"gmioOut[0]"});
Note
GMIO data transfers take an additional argument specifying the corresponding GMIO port to be used.
Note
For GMIO based transfers, blocking methods are available as well (host2aie(…) / aie2host(…)). For PLIO based data transfers, only non-blocking API calls are provided.
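Assuming the blocking variants mirror the argument lists of their non-blocking counterparts (an assumption, not verified against the headers), a blocking GMIO sequence might look like:

// Blocking GMIO variant: each call returns only after its transfer completes.
auto tiles_sz = tiler.host2aie(srcData.data(), srcImageR.size(), {"gmioIn[0]"});
filter_graph.run(tiles_sz[0] * tiles_sz[1]);
stitcher.aie2host(dstData.data(), dst.size(), tiles_sz, {"gmioOut[0]"});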
Using ‘tiles_sz’, the user can run the graph the appropriate number of times.
filter_graph.run(tiles_sz[0] * tiles_sz[1]);
After the runs are started, the user needs to wait for all transactions to complete.
filter_graph.wait();
tiler.wait();
stitcher.wait();
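Putting it all together, a minimal end-to-end PLIO flow could look like the sketch below (the buffer handles src_hndl / dst_hndl, the image containers srcImageR / dst, and the graph name filter_graph are carried over from the snippets above):

int overlapH = 1;
int overlapV = 1;
xF::xfcvDataMovers<xF::TILER, int16_t, MAX_TILE_HEIGHT, MAX_TILE_WIDTH, VECTORIZATION_FACTOR> tiler(overlapH, overlapV);
xF::xfcvDataMovers<xF::STITCHER, int16_t, MAX_TILE_HEIGHT, MAX_TILE_WIDTH, VECTORIZATION_FACTOR> stitcher;

// Queue non-blocking transfers in both directions.
auto tiles_sz = tiler.host2aie_nb(src_hndl, srcImageR.size());
stitcher.aie2host_nb(dst_hndl, dst.size(), tiles_sz);

// Run the graph once per tile, then wait for all transactions to drain.
filter_graph.run(tiles_sz[0] * tiles_sz[1]);
filter_graph.wait();
tiler.wait();
stitcher.wait();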
Note
The current implementation of xfcvDataMovers supports only one core. Multi-core support is planned for future releases.