The top level of the convolution filter is modeled as a dataflow process built from four functions. The following code shows the top-level kernel function. For full implementation details, refer to the source file src/filter2d_hw.cpp in the convolution tutorial directory.
    void Filter2DKernel(
            const char           coeffs[256],
            float                factor,
            short                bias,
            unsigned short       width,
            unsigned short       height,
            unsigned short       stride,
            const unsigned char  src[MAX_IMAGE_WIDTH*MAX_IMAGE_HEIGHT],
            unsigned char        dst[MAX_IMAGE_WIDTH*MAX_IMAGE_HEIGHT])
    {
    #pragma HLS DATAFLOW

        // Streams of coefficients and pixels from kernel input to filter, and from filter to output
        hls::stream<char,2>    coefs_stream;
        hls::stream<U8,2>      pixel_stream;
        hls::stream<window,3>  window_stream; // Shallow FIFO depth to minimize resources
        hls::stream<U8,64>     output_stream;

        // Read image data from global memory over AXI4 MM, and stream pixels out
        ReadFromMem(width, height, stride, coeffs, coefs_stream, src, pixel_stream);

        // Read incoming pixels and form valid HxV windows
        Window2D(width, height, pixel_stream, window_stream);

        // Process the incoming stream of windows, and stream pixels out
        Filter2D(width, height, factor, bias, coefs_stream, window_stream, output_stream);

        // Read the incoming stream of pixels and write them to global memory over AXI4 MM
        WriteToMem(width, height, stride, output_stream, dst);
    }
The dataflow chain consists of four different functions as follows:
ReadFromMem: Reads pixel data or video input from main memory
Window2D: Local cache with wide (15x15 pixels) access on the output side
Filter2D: Core kernel filtering algorithm
WriteToMem: Writes output data to the main memory
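The four-stage chain above can be tried out in plain C++ before moving to HLS. The following is a minimal software sketch, not the tutorial's implementation. It makes several assumptions: `std::queue` stands in for `hls::stream` (so there is no FIFO depth and the stages run one after another rather than concurrently), the window is 3x3 instead of 15x15, coefficients are passed directly instead of through a stream, borders are zero-padded, and the result is normalized as `int(factor * sum) + bias` clamped to 0..255. All names ending in `Model` are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

typedef unsigned char U8;                        // assumed pixel type
constexpr int K = 3;                             // sketch only; the tutorial uses 15
typedef std::array<U8, K * K> Window;            // one KxK pixel sample
typedef std::array<signed char, K * K> Coeffs;   // one KxK coefficient set

// Software stand-in for hls::stream: an unbounded FIFO (assumption).
template <typename T>
using Stream = std::queue<T>;

// Stage 1: read pixels from "global memory" (a vector here) and stream them out.
void ReadFromMemModel(const std::vector<U8>& src, Stream<U8>& pixels) {
    for (U8 p : src) pixels.push(p);
}

// Stage 2: emit one KxK window per pixel, zero-padded at the image borders.
// For clarity this buffers the whole image, which hardware would not do.
void Window2DModel(int width, int height, Stream<U8>& pixels, Stream<Window>& windows) {
    std::vector<U8> img;
    while (!pixels.empty()) { img.push_back(pixels.front()); pixels.pop(); }
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            Window w{};  // zero-initialized, which provides the padding
            for (int i = 0; i < K; ++i) {
                for (int j = 0; j < K; ++j) {
                    const int yy = y + i - K / 2;
                    const int xx = x + j - K / 2;
                    if (yy >= 0 && yy < height && xx >= 0 && xx < width)
                        w[i * K + j] = img[yy * width + xx];
                }
            }
            windows.push(w);
        }
    }
}

// Stage 3: K*K multiply-accumulates per window, then scale, bias, and clamp.
void Filter2DModel(const Coeffs& coeffs, float factor, short bias,
                   Stream<Window>& windows, Stream<U8>& output) {
    while (!windows.empty()) {
        const Window w = windows.front();
        windows.pop();
        int sum = 0;
        for (int k = 0; k < K * K; ++k) sum += int(w[k]) * int(coeffs[k]);
        const int v = int(factor * sum) + bias;
        output.push(static_cast<U8>(std::min(std::max(v, 0), 255)));
    }
}

// Stage 4: drain the output stream into "global memory".
void WriteToMemModel(Stream<U8>& output, std::vector<U8>& dst) {
    while (!output.empty()) { dst.push_back(output.front()); output.pop(); }
}

// Top level: same call order as the dataflow region in Filter2DKernel.
std::vector<U8> Filter2DKernelModel(const std::vector<U8>& src, int width, int height,
                                    const Coeffs& coeffs, float factor, short bias) {
    assert(src.size() == static_cast<std::size_t>(width) * height);
    Stream<U8> pixel_stream;
    Stream<Window> window_stream;
    Stream<U8> output_stream;
    std::vector<U8> dst;
    ReadFromMemModel(src, pixel_stream);
    Window2DModel(width, height, pixel_stream, window_stream);
    Filter2DModel(coeffs, factor, bias, window_stream, output_stream);
    WriteToMemModel(output_stream, dst);
    return dst;
}
```

A quick sanity check for a model like this is an identity kernel (center coefficient 1, all others 0, factor 1.0, bias 0), which should return the input image unchanged.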
The functions at the two ends of the chain read from and write to the device's global memory. The ReadFromMem function reads pixel data and streams it out for filtering, and the WriteToMem function at the end of the chain writes the processed pixel data back to device memory. The input pixels read from main memory are passed to the Window2D function, which maintains a local cache and provides a 15x15 pixel sample to the filter function on every cycle. The Filter2D function can consume the 15x15 pixel sample in a single cycle, performing 225 (15x15) multiply-accumulate (MAC) operations per cycle.
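To illustrate how a local cache can deliver a full window for every incoming pixel, here is a minimal line-buffer sketch in plain C++. It is an illustration under stated assumptions rather than the tutorial's Window2D: the class `WindowCache` and the helper `CollectWindows` are hypothetical, the window is 3x3 instead of 15x15, K full image rows are cached for simplicity, and only fully populated windows are emitted (no border handling).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned char U8;               // assumed pixel type
constexpr int K = 3;                    // sketch only; the tutorial uses 15
typedef std::array<U8, K * K> Window;   // row-major KxK pixel sample

// Hypothetical line-buffer cache: stores the last K image rows plus a
// KxK window register that slides one column per incoming pixel.
class WindowCache {
public:
    explicit WindowCache(int width)
        : width_(width), x_(0), y_(0),
          lines_(static_cast<std::size_t>(K) * width, 0) {
        assert(width >= K);
        win_.fill(0);
    }

    // Consume one pixel in raster order. Returns true when `out` holds a
    // fully populated window whose bottom-right corner is this pixel.
    bool Push(U8 pixel, Window& out) {
        // Shift column x_ of the line buffer up one row and append the pixel.
        for (int r = 0; r < K - 1; ++r)
            lines_[r * width_ + x_] = lines_[(r + 1) * width_ + x_];
        lines_[(K - 1) * width_ + x_] = pixel;

        // Slide the window left one column and load the new right-hand
        // column from the line buffer.
        for (int r = 0; r < K; ++r) {
            for (int c = 0; c < K - 1; ++c)
                win_[r * K + c] = win_[r * K + c + 1];
            win_[r * K + (K - 1)] = lines_[r * width_ + x_];
        }

        const bool valid = (x_ >= K - 1) && (y_ >= K - 1);
        if (valid) out = win_;
        if (++x_ == width_) { x_ = 0; ++y_; }  // advance raster position
        return valid;
    }

private:
    int width_;
    int x_;
    int y_;
    std::vector<U8> lines_;  // K cached image rows, oldest row first
    Window win_;             // current KxK window
};

// Feed an image through the cache and collect every valid window.
std::vector<Window> CollectWindows(const std::vector<U8>& img, int width) {
    WindowCache cache(width);
    std::vector<Window> result;
    Window w{};
    for (U8 p : img)
        if (cache.Push(p, w)) result.push_back(w);
    return result;
}
```

Each call to `Push` touches only one column of the cached rows and one column of the window register, which is what lets a hardware version accept one pixel and present one complete window per cycle. In this sketch a W x H image yields (W-K+1) x (H-K+1) windows because borders are skipped.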
Open the src/filter2d_hw.cpp source file from the convolution tutorial directory and examine the implementation details of these individual functions. The next section elaborates on the implementation details of the Window2D and Filter2D functions. The following figure shows how data flows between the different functions (dataflow modules).