Stencil Optimizations - 2024.1 English

Vitis High-Level Synthesis User Guide (UG1399)

Document ID: UG1399
Release Date: 2024-05-30
Version: 2024.1 English

The stencil pattern is a computational pattern used extensively in scientific computing, image processing, and numerical simulation.

  • A stencil computation updates each element of an array by applying a fixed pattern, or "stencil," that defines how the values of neighboring elements are combined.
  • In image processing, for example, a stencil computation can calculate a weighted average of a pixel and its immediate neighbors to produce a blur. Stencil computations demand high memory bandwidth on hardware such as FPGAs: because the pixels covered by the stencil are not contiguous in memory, each output requires many DDR memory reads, which significantly lengthens computation time.
  • To mitigate this, stencil algorithms are typically implemented using window buffer and line buffer techniques. These techniques optimize the data access pattern and reduce off-chip memory reads by caching the relevant data in on-chip memory.

These techniques can improve memory bandwidth utilization and increase the effectiveness of parallelization and pipelining.
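To make the idea concrete, the following is a minimal sketch of a hand-written line buffer and window buffer, compared against a direct filter that reads every neighbor from the source array. It is plain C++ for illustration only, not the code that Vitis HLS generates. The function names (filter_direct, filter_line_buffer, outputs_match), the 3x3 filter size, and the coefficient values are assumptions made for this sketch; the 30x1000 image size and the clamp-to-zero boundary handling follow the example in this section.

```cpp
#include <array>
#include <vector>

constexpr int HEIGHT = 30;        // image rows, as in the example in this section
constexpr int WIDTH = 1000;       // image columns
constexpr int FILTER_V_SIZE = 3;  // assumed filter height
constexpr int FILTER_H_SIZE = 3;  // assumed filter width
static_assert(FILTER_V_SIZE % 2 == 1 && FILTER_H_SIZE % 2 == 1,
              "this sketch assumes odd filter sizes");

using Image = std::vector<std::vector<unsigned char>>;
using Result = std::vector<std::vector<int>>;
using Coeffs = std::array<std::array<int, FILTER_H_SIZE>, FILTER_V_SIZE>;

// Direct form: every output re-reads all of its neighbors from src.
Result filter_direct(const Image& src, const Coeffs& coeffs) {
    Result dst(HEIGHT, std::vector<int>(WIDTH, 0));
    for (int y = 0; y < HEIGHT; ++y)
        for (int x = 0; x < WIDTH; ++x) {
            int sum = 0;
            for (int row = 0; row < FILTER_V_SIZE; ++row)
                for (int col = 0; col < FILTER_H_SIZE; ++col) {
                    int xoffset = x + col - FILTER_H_SIZE / 2;
                    int yoffset = y + row - FILTER_V_SIZE / 2;
                    bool outside = xoffset < 0 || xoffset >= WIDTH ||
                                   yoffset < 0 || yoffset >= HEIGHT;
                    unsigned char pixel = outside ? 0 : src[yoffset][xoffset];
                    sum += pixel * coeffs[row][col];
                }
            dst[y][x] = sum;
        }
    return dst;
}

// Buffered form: src is read once per pixel; neighbors come from local buffers.
Result filter_line_buffer(const Image& src, const Coeffs& coeffs) {
    Result dst(HEIGHT, std::vector<int>(WIDTH, 0));
    // line_buf[r][x] holds the pixel (FILTER_V_SIZE-1-r) rows above the current row.
    unsigned char line_buf[FILTER_V_SIZE - 1][WIDTH] = {};
    // Extra iterations past the image edge flush the last rows and columns.
    for (int y = 0; y < HEIGHT + FILTER_V_SIZE / 2; ++y) {
        unsigned char window[FILTER_V_SIZE][FILTER_H_SIZE] = {};  // zero at row start
        for (int x = 0; x < WIDTH + FILTER_H_SIZE / 2; ++x) {
            // Build the pixel column entering the window; zero outside the image.
            unsigned char column[FILTER_V_SIZE] = {};
            if (x < WIDTH) {
                for (int r = 0; r < FILTER_V_SIZE - 1; ++r)
                    column[r] = line_buf[r][x];
                column[FILTER_V_SIZE - 1] = (y < HEIGHT) ? src[y][x] : 0;  // only src read
                for (int r = 0; r < FILTER_V_SIZE - 1; ++r)
                    line_buf[r][x] = column[r + 1];  // shift this column up one row
            }
            // Shift the window left by one and append the new column on the right.
            for (int r = 0; r < FILTER_V_SIZE; ++r) {
                for (int c = 0; c < FILTER_H_SIZE - 1; ++c)
                    window[r][c] = window[r][c + 1];
                window[r][FILTER_H_SIZE - 1] = column[r];
            }
            // The window is centered on the pixel half a filter behind (y, x).
            int out_y = y - FILTER_V_SIZE / 2;
            int out_x = x - FILTER_H_SIZE / 2;
            if (out_y >= 0 && out_x >= 0) {
                int sum = 0;
                for (int r = 0; r < FILTER_V_SIZE; ++r)
                    for (int c = 0; c < FILTER_H_SIZE; ++c)
                        sum += window[r][c] * coeffs[r][c];
                dst[out_y][out_x] = sum;
            }
        }
    }
    return dst;
}

// Usage: run both forms on a deterministic test image and compare the outputs.
bool outputs_match() {
    Image src(HEIGHT, std::vector<unsigned char>(WIDTH));
    for (int y = 0; y < HEIGHT; ++y)
        for (int x = 0; x < WIDTH; ++x)
            src[y][x] = static_cast<unsigned char>((y * 31 + x * 7) % 256);
    const Coeffs coeffs = {{ {{1, 2, 1}}, {{2, 4, 2}}, {{1, 2, 1}} }};
    return filter_direct(src, coeffs) == filter_line_buffer(src, coeffs);
}
```

In the buffered form, the source array is read at most once per loop iteration, while the direct form reads it up to FILTER_V_SIZE x FILTER_H_SIZE times per output. The cost is the extra bookkeeping for the line buffer, the sliding window, and the flush iterations at the image edges.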

Adapting an algorithm to use line and window buffers by hand can be time-consuming and requires considerable code refactoring. Vitis HLS provides a stencil optimization, enabled with the array_stencil pragma, that can automatically implement the line buffer and window buffer and achieve the same performance.

    for(int y=0; y<30; ++y)
    {
        for(int x=0; x<1000; ++x)
        {
#pragma HLS pipeline II=1
#pragma HLS array_stencil variable=src
 
            // Apply 2D filter to the pixel window
            int sum = 0;
            for(int row=0; row<FILTER_V_SIZE; row++)
            {
 
                for(int col=0; col<FILTER_H_SIZE; col++)
                {
                    unsigned char pixel;
                    int xoffset = (x+col-(FILTER_H_SIZE/2));
                    int yoffset = (y+row-(FILTER_V_SIZE/2));
                    // Deal with boundary conditions : clamp pixels to 0 when outside of image
                    if ( (xoffset<0) || (xoffset>=1000) || (yoffset<0) || (yoffset>=30) ) {
                        pixel = 0;
                    } else {
                        pixel = src[yoffset][xoffset];
                    }
                    sum += pixel*coeffs[row][col];
                }
            }