Top Level Structure of Kernel - 2023.1 English

Vitis Tutorials: Hardware Acceleration (XD099)

Document ID
Release Date
2023.1 English

The top-level of the convolution filter is modeled using a dataflow process. The dataflow consists of four different functions as follows. For full implementation details, refer to the source file src/filter2d_hw.cpp in the convolutional tutorial directory.

void Filter2DKernel(
        const char           coeffs[256],
        float                factor,
        short                bias,
        unsigned short       width,
        unsigned short       height,
        unsigned short       stride,
        const unsigned char  src[MAX_IMAGE_WIDTH*MAX_IMAGE_HEIGHT],
        unsigned char        dst[MAX_IMAGE_WIDTH*MAX_IMAGE_HEIGHT])
    #pragma HLS DATAFLOW

  // Stream of pixels from kernel input to filter, and from filter to output
    hls::stream<char,2>    coefs_stream;
    hls::stream<U8,2>      pixel_stream;
    hls::stream<window,3>  window_stream; // Set FIFO depth to 0 to minimize resources
    hls::stream<U8,64>     output_stream;

  // Read image data from global memory over AXI4 MM, and stream pixels out
    ReadFromMem(width, height, stride, coeffs, coefs_stream, src, pixel_stream);

    // Read incoming pixels and form valid HxV windows
    Window2D(width, height, pixel_stream, window_stream);

  // Process incoming stream of pixels, and stream pixels out
  Filter2D(width, height, factor, bias, coefs_stream, window_stream, output_stream);

  // Write an incoming stream of pixels and write them to global memory over AXI4 MM
  WriteToMem(width, height, stride, output_stream, dst);

The dataflow chain consists of four different functions as follows:

  • ReadFromMem: Reads pixel data or video input from main memory

  • Window2D: Local cache with wide (15x15 pixels) access on the output side

  • Filter2D: Core kennel filtering algorithm

  • WriteToMem: Writes output data to the main memory

Two functions at the input and output read and write data from the device’s global memory. The ReadFromMem function reads data and streams it for filtering. The WriteToMem function at the end of the chain writes processed pixel data to the device memory. The input data (pixels) read from the main memory is passed to the Window2D function, which creates a local cache and, on every cycle, provides a 15x15 pixel sample to the filter function/block. The Filter2D function can consume the 15x15 pixel sample in a single cycle to perform 225(15x15) MACs per cycle.

Open the src/filter2d_hw.cpp source file from the convolutioanl tutorial directory, and examine the implementation details of these individual functions. In the next section, you will elaborate on the implementation details of Window2D and Filter2D functions. The following figure shows how data flows between different functions (dataflow modules).