Overview - 2023.1 English

Vitis Tutorials: Hardware Acceleration (XD099)

Document ID
XD099
Release Date
2023-08-02
Version
2023.1 English

We looked at a straightforward bilateral resize algorithm in the last example. While we saw that it was not an amazing candidate for acceleration on its own, you might still want to convert a buffer to several different resolutions at once (say, for different machine learning algorithms). Or you might just want to offload it to free up the CPU for other processing during a frame window.

But let's explore the real beauty of FPGAs: streaming. Remember that going back and forth to memory is expensive. Instead of writing each intermediate result to memory and reading it back, we can stream each pixel of the image directly from one image processing pipeline stage to the next.
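As a loose software analogy for streaming between stages, the sketch below chains Python generators so that each pixel flows through every stage without an intermediate image being stored. The stage names (`scale`, `threshold`) and their parameters are hypothetical and chosen only for illustration; they are not part of the tutorial's design, and generators only mimic the dataflow, not the hardware parallelism.

```python
from typing import Iterable, Iterator, List


def read_pixels(image: List[List[int]]) -> Iterator[int]:
    """Emit pixels one at a time in raster order (the only read of the image)."""
    for row in image:
        for pixel in row:
            yield pixel


def scale(pixels: Iterable[int], gain: int) -> Iterator[int]:
    """Hypothetical stage: multiply each pixel by a gain, clamped to 8 bits."""
    for p in pixels:
        yield min(255, p * gain)


def threshold(pixels: Iterable[int], level: int) -> Iterator[int]:
    """Hypothetical stage: binarize each pixel as soon as it arrives."""
    for p in pixels:
        yield 255 if p >= level else 0


def run_stream(image: List[List[int]], gain: int = 2, level: int = 128) -> List[int]:
    # The stages are chained directly; no full intermediate image is built
    # between scale() and threshold().
    return list(threshold(scale(read_pixels(image), gain), level))
```

Only the final `list(...)` call collects results; everything upstream handles one pixel at a time, which is the property the streaming approach exploits.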

In this case, we want to amend our earlier sequence of events to add a Gaussian filter. This is a very common pipeline stage for removing noise from an image before an operation such as edge detection or corner detection. We might even intend to add some 2D filtering, or some other algorithm, afterwards.

So, modifying our workflow from before, we now have:

  1. Read the pixels of the image from memory.

  2. If necessary, convert them to the proper format. In our case we will be looking at the default format used by the OpenCV library, BGR. But in a real system, where you would be receiving data from various streams, cameras, etc., you would have to deal with formatting, either in software or in the accelerator (where it is basically a “free” operation, as we will see in the next example).

  3. For color images, extract each channel.

  4. Use a bilateral resizing algorithm on each independent channel.

  5. Perform a Gaussian blur on each channel.

  6. Recombine the channels and store back in memory.
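The steps above can be sketched in plain software as a reference model. This is a minimal illustration, not the tutorial's accelerator code: bilinear interpolation stands in for the resize step, the Gaussian blur uses a direct 2D window with replicated borders, and all function names and the default `sigma` are assumptions made for this example.

```python
import math
from typing import List, Tuple

Channel = List[List[float]]
Pixel = Tuple[int, int, int]  # (B, G, R), the OpenCV default channel order


def split_channels(image: List[List[Pixel]]) -> List[Channel]:
    """Step 3: turn an H x W image of BGR tuples into three H x W planes."""
    return [[[float(px[c]) for px in row] for row in image] for c in range(3)]


def resize_bilinear(src: Channel, w_out: int, h_out: int) -> Channel:
    """Step 4 stand-in (assumption): bilinear interpolation, pixel centers aligned."""
    h_in, w_in = len(src), len(src[0])
    out = []
    for y in range(h_out):
        fy = min(max((y + 0.5) * h_in / h_out - 0.5, 0.0), h_in - 1.0)
        y0 = int(fy)
        y1 = min(y0 + 1, h_in - 1)
        wy = fy - y0
        row = []
        for x in range(w_out):
            fx = min(max((x + 0.5) * w_in / w_out - 0.5, 0.0), w_in - 1.0)
            x0 = int(fx)
            x1 = min(x0 + 1, w_in - 1)
            wx = fx - x0
            top = src[y0][x0] * (1 - wx) + src[y0][x1] * wx
            bottom = src[y1][x0] * (1 - wx) + src[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def gaussian_kernel(k: int, sigma: float) -> List[List[float]]:
    """Build a normalized k x k Gaussian window (k is assumed odd)."""
    r = k // 2
    raw = [[math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
            for dx in range(-r, r + 1)] for dy in range(-r, r + 1)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]


def gaussian_blur(src: Channel, k: int, sigma: float) -> Channel:
    """Step 5: direct 2D convolution; border pixels are replicated (assumption)."""
    h, w = len(src), len(src[0])
    r = k // 2
    kern = gaussian_kernel(k, sigma)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                yy = min(max(y + dy, 0), h - 1)
                for dx in range(-r, r + 1):
                    xx = min(max(x + dx, 0), w - 1)
                    acc += src[yy][xx] * kern[dy + r][dx + r]
            row.append(acc)
        out.append(row)
    return out


def merge_channels(planes: List[Channel]) -> List[List[Pixel]]:
    """Step 6: recombine planes into BGR tuples, rounded and clamped to 8 bits."""
    h, w = len(planes[0]), len(planes[0][0])
    return [[tuple(min(255, max(0, int(round(p[y][x])))) for p in planes)
             for x in range(w)] for y in range(h)]


def resize_blur(image: List[List[Pixel]], w_out: int, h_out: int,
                k: int = 7, sigma: float = 1.5) -> List[List[Pixel]]:
    """Run steps 3 to 6 on an image that is already in BGR order."""
    planes = split_channels(image)
    planes = [gaussian_blur(resize_bilinear(p, w_out, h_out), k, sigma)
              for p in planes]
    return merge_channels(planes)
```

Note that this model writes a full intermediate plane between the resize and the blur, which is exactly the memory round trip that a streaming implementation avoids.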

So, we now have two “big” algorithms: bilateral resize and Gaussian blur. For a resized image of w_out × h_out, and a square Gaussian window of width k, our computation time for the entire pipeline would be roughly:

[Equation image not reproduced: Image Processing Time]

For fun, make k relatively large without going overboard; we will choose a 7 × 7 window.
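To get a feel for how the window width k drives the cost, the sketch below counts multiply-accumulate operations under stated assumptions. It is not the expression from the original figure: the function name, the figure of 4 taps per resized sample, and the separable variant are all illustrative assumptions. The separable count relies on the fact that a 2D Gaussian window can be applied as a row pass followed by a column pass.

```python
from typing import Dict


def pipeline_mac_estimate(w_out: int, h_out: int, k: int,
                          channels: int = 3) -> Dict[str, int]:
    """Rough multiply-accumulate counts; the constants are assumptions."""
    samples = w_out * h_out * channels
    return {
        # Assumption: each resized sample blends 4 neighboring input samples.
        "resize": samples * 4,
        # Direct 2D window: k * k taps per output sample.
        "blur_direct": samples * k * k,
        # Separable Gaussian: one k-tap row pass plus one k-tap column pass.
        "blur_separable": samples * 2 * k,
    }


if __name__ == "__main__":
    # Hypothetical output size, chosen only for illustration.
    for stage, macs in pipeline_mac_estimate(1920, 1080, k=7).items():
        print(f"{stage}: {macs} multiply-accumulates")
```

With k = 7, the direct window costs 49 taps per sample against 14 for the separable form, so under these assumptions the blur dominates the resize in either case.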