Using HLS Streams for Streaming Data - 2021.1 English

Vitis High-Level Synthesis User Guide (UG1399)

Document ID
Release Date
2021.1 English

One of the first enhancements which can be made to the earlier code is to use the HLS stream construct, typically referred to as an hls::stream. An hls::stream object can be used to store data samples in the same manner as an array. The data in an hls::stream can only be accessed sequentially. In the C/C++ code, the hls::stream behaves like a FIFO of infinite depth.

Code written using hls::stream will generally create designs in an FPGA which have high-performance and use few resources because an hls::stream enforces a coding style which is ideal for implementation in an FPGA.

Multiple reads of the same data from an hls::stream are impossible. Once the data has been read from an hls::stream it no longer exists in the stream. This helps remove this coding practice.

If the data from an hls::stream is required again, it must be cached. This is another good practice when writing code to be synthesized on an FPGA.

The hls::stream forces the C/C++ code to be developed in a manner which ideal for an FPGA implementation.

When an hls::stream is synthesized it is automatically implemented as a FIFO channel which is 1 element deep. This is the ideal hardware for connecting pipelined tasks.

There is no requirement to use hls::stream and the same implementation can be performed using arrays in the C/C++ code. The hls::stream construct does help enforce good coding practices.

With an hls::stream construct the outline of the new optimized code is as follows:

template<typename T, int K>
static void convolution_strm(
int width, 
int height,
hls::stream<T> &src, 
hls::stream<T> &dst,
const T *hcoeff, 
const T *vcoeff)

hls::stream<T> hconv("hconv");
hls::stream<T> vconv("vconv");
// These assertions let HLS know the upper bounds of loops
assert(height < MAX_IMG_ROWS);
assert(width < MAX_IMG_COLS);
assert(vconv_xlim < MAX_IMG_COLS - (K - 1));

// Horizontal convolution 
HConvH:for(int col = 0; col < height; col++) {
 HConvW:for(int row = 0; row < width; row++) {
   HConv:for(int i = 0; i < K; i++) {
// Vertical convolution 
VConvH:for(int col = 0; col < height; col++) {
 VConvW:for(int row = 0; row < vconv_xlim; row++) {
   VConv:for(int i = 0; i < K; i++) {

Border:for (int i = 0; i < height; i++) {
 for (int j = 0; j < width; j++) {

Some noticeable differences compared to the earlier code are:

  • The input and output data is now modeled as hls::stream.
  • Instead of a single local array of size HEIGHT*WDITH there are two internal hls::stream used to save the output of the horizontal and vertical convolutions.

In addition, some assert statements are used to specify the maximize of loop bounds. This is a good coding style which allows HLS to automatically report on the latencies of variable bounded loops and optimize the loop bounds.