xfcvDataMovers - 2024.1 English

Vitis Libraries


The xfcvDataMovers class provides a high-level API abstraction to initiate data transfers from DDR to the AIE core and vice versa, for hardware-emulation and hardware runs. Each AIE core has limited local memory, which cannot hold an entire high-resolution input or output image. Each image therefore has to be partitioned into smaller tiles, which are sent to the AIE core for computation. After computation, the output tiles are stitched back together to form the high-resolution output image. This process is complex because tiling must handle image borders correctly and, for convolution-based kernels, process the overlap between adjacent tiles.

The xfcvDataMovers class takes a few simple, user-provided parameters and exposes a simple data transaction API that hides this complexity. It also provides a template parameter (USE_GMIO) that lets the application switch seamlessly between PL-based and GMIO-based data movement.

Table 268: xfcvDataMovers template parameters

Parameter                  Description
KIND                       Type of object: TILER or STITCHER
DATA_TYPE                  Data type of the AIE core kernel input or output
TILE_HEIGHT_MAX            Maximum tile height
TILE_WIDTH_MAX             Maximum tile width
AIE_VECTORIZATION_FACTOR   AIE core vectorization factor
CORES                      Number of AIE cores to be used
PL_AXI_BITWIDTH            For PL-based data movers: data width of the AXI transfers between DDR and PL
USE_GMIO                   Set to true to use GMIO-based data transfer

Table 269: xfcvDataMovers constructor parameters

Parameter   Description
overlapH    Horizontal overlap of the AIE core / pipeline
overlapV    Vertical overlap of the AIE core / pipeline


The horizontal and vertical overlaps must be computed for the complete pipeline. For example, if the pipeline has a single 3x3 2D filter, both the horizontal and vertical overlaps are 1. If the pipeline has two such filters back to back, the overlaps are 2. Currently, users are expected to provide these values correctly.

The data transfer using the xfcvDataMovers class can be done in one of two ways:

  1. PLIO data movers

    This is the default mode of operation for the xfcvDataMovers class. In this mode, data is transferred using hardware Tiler / Stitcher IPs provided by AMD. The Makefile provided with the design examples shipped with the library gives the locations of the .xo files for these IPs and shows how to incorporate them into the Vitis build system. Create one xfcvDataMovers object per input / output image, as shown in the following code.


    The implementations of Tiler and Stitcher for PLIO are provided as .xo files in the ‘L1/lib/hw’ folder. By using these files, you are agreeing to the terms and conditions specified in the LICENSE.txt file available in the same directory.

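    int overlapH = 1; // horizontal overlap of the pipeline (one 3x3 filter)
    int overlapV = 1; // vertical overlap of the pipeline
    // Tiler for the input image
    xF::xfcvDataMovers<xF::TILER, int16_t, MAX_TILE_HEIGHT, MAX_TILE_WIDTH, VECTORIZATION_FACTOR> tiler(overlapH, overlapV);
    // Stitcher for the output image
    xF::xfcvDataMovers<xF::STITCHER, int16_t, MAX_TILE_HEIGHT, MAX_TILE_WIDTH, VECTORIZATION_FACTOR> stitcher;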

    The choice of MAX_TILE_HEIGHT / MAX_TILE_WIDTH constrains the image tile size, which in turn governs local memory usage. The image tile size in bytes can be computed as follows:

        Image tile size = TILE_HEADER_SIZE_IN_BYTES + MAX_TILE_HEIGHT * MAX_TILE_WIDTH * sizeof(DATA_TYPE)

    Here TILE_HEADER_SIZE_IN_BYTES is 128 bytes for the current version of Tiler / Stitcher. DATA_TYPE in the above example is int16_t (2 bytes).


    The current version of the HW data movers supports the 8_16 configuration (that is, an 8-bit image element data type on the host side and a 16-bit image element data type on the AIE kernel side). More configurations (for example, 8_8 or 16_16) will be provided in the future.

    The Tiler / Stitcher IPs use PL resources available on VCK boards. For the 8_16 configuration, the following table lists the resource utilization of these IPs. The numbers correspond to a single instance of each IP.

    Table 270: Tiler / Stitcher resource utilization (8_16 config)

                LUTs   FFs    BRAMs   DSPs   Fmax
    Tiler       2761   3832   5       13     400 MHz
    Stitcher    2934   3988   5       7      400 MHz
    Total       5695   7820   10      20     -

  2. GMIO data movers

    You can switch to GMIO-based data movers by using a specialization of the above class template. All of the above constraints on the image tile size calculation apply here as well. Sample code is shown below.

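    // Template arguments after VECTORIZATION_FACTOR: CORES = 1, PL_AXI_BITWIDTH = 0 (PL-only), USE_GMIO = true
    // tiler(1, 1): overlapH = 1, overlapV = 1
    xF::xfcvDataMovers<xF::TILER, int16_t, MAX_TILE_HEIGHT, MAX_TILE_WIDTH, VECTORIZATION_FACTOR, 1, 0, true> tiler(1, 1);
    xF::xfcvDataMovers<xF::STITCHER, int16_t, MAX_TILE_HEIGHT, MAX_TILE_WIDTH, VECTORIZATION_FACTOR, 1, 0, true> stitcher;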


    The last template parameter (USE_GMIO) is set to true, which selects the GMIO specialization.

Once the objects are constructed, simple API calls can be made to initiate the data transfers. Sample code is shown below.

// For PLIO (use either this variant or the GMIO variant, not both)
auto tiles_sz = tiler.host2aie_nb(&src_hndl, srcImageR.size());
stitcher.aie2host_nb(&dst_hndl, dst.size(), tiles_sz);

// For GMIO: the last argument names the GMIO port to use
auto tiles_sz = tiler.host2aie_nb(srcData.data(), srcImageR.size(), {"gmioIn[0]"});
stitcher.aie2host_nb(dstData.data(), dst.size(), tiles_sz, {"gmioOut[0]"});


GMIO data transfers take an additional argument which is the corresponding GMIO port to be used.


For GMIO-based transfers, blocking methods are also available (host2aie(…) / aie2host(…)). For PLIO-based transfers, only the non-blocking API calls are provided.

Using tiles_sz, you can run the graph the appropriate number of times (once per tile).

filter_graph_hndl.run(tiles_sz[0] * tiles_sz[1]);

After the runs are started, you need to wait for all transactions to complete.