This section describes the baseline software implementation and its performance measurements, which are used to gauge how much acceleration is needed to meet the performance constraints.
The convolution filter is implemented in software using a typical multi-level nested loop structure. The outer two loops iterate over the image and select the pixel to be processed. The inner two loops perform the sum-of-products (SOP) operation, which is the actual convolution between the coefficient matrix and the sub-matrix of the image centered on the pixel being processed.
TIP: Boundary conditions, where the sub-matrix cannot be centered on a given pixel, require special processing. This algorithm assumes that all pixels beyond the boundary of the image have zero values.
void Filter2D(
        const char           coeffs[FILTER_V_SIZE][FILTER_H_SIZE],
        float                factor,
        short                bias,
        unsigned short       width,
        unsigned short       height,
        unsigned short       stride,
        const unsigned char *src,
        unsigned char       *dst)
{
    for(int y=0; y<height; ++y)
    {
        for(int x=0; x<width; ++x)
        {
            // Apply 2D filter to the pixel window
            int sum = 0;
            for(int row=0; row<FILTER_V_SIZE; row++)
            {
                for(int col=0; col<FILTER_H_SIZE; col++)
                {
                    unsigned char pixel;
                    int xoffset = (x+col-(FILTER_H_SIZE/2));
                    int yoffset = (y+row-(FILTER_V_SIZE/2));
                    // Deal with boundary conditions : clamp pixels to 0 when outside of image
                    if ( (xoffset<0) || (xoffset>=width) || (yoffset<0) || (yoffset>=height) ) {
                        pixel = 0;
                    } else {
                        pixel = src[yoffset*stride+xoffset];
                    }
                    sum += pixel*coeffs[row][col];
                }
            }
            // Normalize and saturate result
            unsigned char outpix = MIN(MAX((int(factor * sum)+bias), 0), 255);
            // Write output
            dst[y*stride+x] = outpix;
        }
    }
}
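To make the zero-padding and saturation rules concrete, here is a minimal sketch that computes a single output pixel using the same logic as the loop body above. It is an illustration, not the tutorial's code: the name `FilterPixel`, the constants `kFilterV` and `kFilterH`, the fixed 3x3 filter size, and the assumption that the stride equals the width are all hypothetical choices made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical 3x3 filter size, chosen only for this illustration.
constexpr int kFilterV = 3;
constexpr int kFilterH = 3;

// Computes one output pixel at (x, y). Pixels that fall outside the image
// contribute zero to the sum of products, as described in the text.
// Assumes a tightly packed image (stride == width).
unsigned char FilterPixel(const std::vector<unsigned char>& src,
                          int width, int height, int x, int y,
                          const char coeffs[kFilterV][kFilterH],
                          float factor, short bias)
{
    assert(static_cast<int>(src.size()) == width * height);

    int sum = 0;
    for (int row = 0; row < kFilterV; ++row) {
        for (int col = 0; col < kFilterH; ++col) {
            const int xoffset = x + col - kFilterH / 2;
            const int yoffset = y + row - kFilterV / 2;
            const bool inside = xoffset >= 0 && xoffset < width &&
                                yoffset >= 0 && yoffset < height;
            // Zero padding: out-of-image pixels are treated as 0.
            const int pixel = inside ? src[yoffset * width + xoffset] : 0;
            sum += pixel * coeffs[row][col];
        }
    }

    // Normalize, add the bias, and saturate to the 8-bit range [0, 255].
    const int scaled = static_cast<int>(factor * sum) + bias;
    return static_cast<unsigned char>(std::min(std::max(scaled, 0), 255));
}
```

For a 3x3 image in which every pixel is 10, an all-ones coefficient matrix, a factor of 1.0, and a bias of 0, this gives 40 at a corner (four pixels inside the image), 60 on an edge (six inside), and 90 at the center (all nine inside). With every pixel set to 100, the center sum of 900 saturates to 255.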
The following snippet shows how the top-level function calls the convolution filter function for an image with three components, or channels. Here, an OpenMP pragma is used to parallelize software execution across multiple threads. You can open src/host_randomized.cpp and src/filter2d_sw.cpp in the tutorial directory to examine all implementation details.
#pragma omp parallel for num_threads(3)
for(int n=0; n<numRunsSW; n++)
{
    // Compute reference results
    Filter2D(filterCoeffs[filterType], factor, bias, width, height, stride, y_src, y_ref);
    Filter2D(filterCoeffs[filterType], factor, bias, width, height, stride, u_src, u_ref);
    Filter2D(filterCoeffs[filterType], factor, bias, width, height, stride, v_src, v_ref);
}
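The measurement code itself is not shown here, so the following is only a sketch of one way a baseline throughput figure could be obtained for a loop like the one above. The helper names `TimeRuns` and `ThroughputMBps` are hypothetical, and taking 1 MB as 10^6 bytes is an assumption of this example rather than something stated by the tutorial.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Converts a byte count and an elapsed time into throughput in MB/s.
// Assumption: 1 MB = 10^6 bytes.
double ThroughputMBps(std::size_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return (static_cast<double>(bytes) / 1.0e6) / seconds;
}

// Calls `work` exactly `numRuns` times and returns the total elapsed
// wall-clock time in seconds, measured with a monotonic clock.
template <typename Work>
double TimeRuns(int numRuns, Work work)
{
    const auto start = std::chrono::steady_clock::now();
    for (int n = 0; n < numRuns; ++n) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Under these assumptions, `work` would wrap the three `Filter2D` calls for one image, and the byte count passed to `ThroughputMBps` would be `3 * width * height * numRuns` for 8-bit channels. A wall-clock timer is used rather than CPU time so that the effect of running on multiple threads is reflected in the result.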