One of the first enhancements that can be made to the earlier code is to use the HLS stream construct, typically referred to as an hls::stream. An hls::stream object can be used to store data samples in the same manner as an array. The data in an hls::stream can only be accessed sequentially. In the C/C++ code, the hls::stream behaves like a FIFO of infinite depth.
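
The sequential, FIFO-like behavior can be illustrated without the vendor header. The sketch below is only an approximation: `sim_stream` is a hypothetical stand-in built on `std::queue` that mirrors just the `read()`, `write()` and `empty()` calls of hls::stream, and `scale_by_two` and `run_scale` are illustrative names that do not come from the original code.

```cpp
#include <queue>
#include <vector>

// Hypothetical stand-in for hls::stream<T>, so the sketch compiles without
// the vendor header. Only read(), write() and empty() are mirrored.
template <typename T>
class sim_stream {
public:
    void write(const T &value) { fifo_.push(value); }  // append at the tail
    T read() {                                          // remove from the head
        T value = fifo_.front();  // caller must ensure the stream is not empty
        fifo_.pop();
        return value;
    }
    bool empty() const { return fifo_.empty(); }
private:
    std::queue<T> fifo_;  // unbounded, like the C/C++ simulation model
};

// Doubles every sample. Samples are consumed strictly in arrival order;
// there is no equivalent of src[i] random access.
void scale_by_two(sim_stream<int> &src, sim_stream<int> &dst, int count) {
    for (int i = 0; i < count; i++) {
        dst.write(2 * src.read());
    }
}

// Helper: load the samples, run the function, collect the results.
std::vector<int> run_scale(const std::vector<int> &samples) {
    sim_stream<int> src, dst;
    for (int s : samples) src.write(s);
    scale_by_two(src, dst, static_cast<int>(samples.size()));
    std::vector<int> out;
    while (!dst.empty()) out.push_back(dst.read());
    return out;
}
```

Calling `run_scale({1, 2, 3})` yields the doubled samples in the order they were written, which is the only order a stream allows.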

Code written using hls::stream generally produces FPGA designs that have high performance and use few resources, because an hls::stream enforces a coding style that is ideal for implementation in an FPGA.

Multiple reads of the same data from an hls::stream are impossible. Once the data has been read from an hls::stream, it no longer exists in the stream. This helps eliminate the practice of reading the same data more than once.

If the data from an hls::stream is required again, it must be cached. This is another good practice when writing code to be synthesized on an FPGA.
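
One way to picture the caching is a small local window that keeps the last K samples, so each sample is read from the stream exactly once and then reused from the window. The following is a minimal sketch under stated assumptions, not the code from the design: `sim_stream` is a hypothetical `std::queue`-based stand-in for hls::stream, and `hconv_row` and `run_hconv3` are illustrative names.

```cpp
#include <queue>
#include <vector>

// Hypothetical stand-in for hls::stream<T> (read/write/empty only).
template <typename T>
class sim_stream {
public:
    void write(const T &value) { fifo_.push(value); }
    T read() { T value = fifo_.front(); fifo_.pop(); return value; }
    bool empty() const { return fifo_.empty(); }
private:
    std::queue<T> fifo_;
};

// K-tap filter over one row of `width` samples. Each input sample is read
// from the stream once and cached in `window` for the next K - 1 outputs.
template <typename T, int K>
void hconv_row(int width, sim_stream<T> &src, sim_stream<T> &dst, const T *coeff) {
    T window[K] = {};  // local cache of the most recent K samples
    for (int col = 0; col < width; col++) {
        T in = src.read();  // the only read of this sample
        for (int i = 0; i < K - 1; i++) window[i] = window[i + 1];  // shift
        window[K - 1] = in;
        if (col >= K - 1) {  // window is full: one output per new sample
            T acc = 0;
            for (int i = 0; i < K; i++) acc += window[i] * coeff[i];
            dst.write(acc);
        }
    }
}

// Helper: run a 3-tap filter over one row and collect the outputs.
// `coeff` must hold exactly 3 values.
std::vector<int> run_hconv3(const std::vector<int> &row, const std::vector<int> &coeff) {
    sim_stream<int> src, dst;
    for (int s : row) src.write(s);
    hconv_row<int, 3>(static_cast<int>(row.size()), src, dst, coeff.data());
    std::vector<int> out;
    while (!dst.empty()) out.push_back(dst.read());
    return out;
}
```

With a row of `{1, 2, 3, 4, 5}` and coefficients `{1, 1, 1}`, the sketch produces three sums (`6`, `9`, `12`) while issuing only five stream reads in total.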

The hls::stream forces the C/C++ code to be developed in a manner which is ideal for an FPGA implementation.

When an hls::stream is synthesized, it is automatically implemented as a FIFO channel which is 1 element deep. This is the ideal hardware for connecting pipelined tasks.
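
The reason a single element can be enough is easier to see in a toy model. The sketch below is a software illustration only, not a description of the generated hardware: `channel1`, `pipeline_result` and `run_pipeline` are hypothetical names, and the model assumes the producer and the consumer each handle one sample per step.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical one-element channel: holds at most one sample at a time.
template <typename T>
struct channel1 {
    T slot{};
    bool valid = false;
    bool full() const { return valid; }
    bool empty() const { return !valid; }
    void write(const T &v) { slot = v; valid = true; }
    T read() { valid = false; return slot; }
};

struct pipeline_result {
    std::vector<int> output;  // samples after the consumer task
    int steps;                // number of lock-step iterations taken
};

// Two tasks advance in lock-step. In every step the consumer drains the
// channel if it holds data, then the producer refills it if there is room.
pipeline_result run_pipeline(const std::vector<int> &input) {
    channel1<int> ch;
    pipeline_result result{{}, 0};
    std::size_t next = 0;
    while (result.output.size() < input.size()) {
        if (!ch.empty()) {
            result.output.push_back(ch.read() + 1);  // consumer: add one
        }
        if (next < input.size() && !ch.full()) {
            ch.write(input[next++]);                 // producer: forward sample
        }
        result.steps++;
    }
    return result;
}
```

For three input samples the model finishes in four steps: one step to fill the channel, then one sample delivered per step, with never more than one sample in flight.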

There is no requirement to use hls::stream, and the same implementation can be written using arrays in the C/C++ code. The hls::stream construct does, however, help enforce good coding practices.

With an hls::stream construct, the outline of the new optimized code is as follows:
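
#include <cassert>
#include "hls_stream.h"

// MAX_IMG_ROWS and MAX_IMG_COLS are assumed to be defined by the earlier code.
template<typename T, int K>
static void convolution_strm(
    int width,
    int height,
    hls::stream<T> &src,
    hls::stream<T> &dst,
    const T *hcoeff,
    const T *vcoeff)
{
    hls::stream<T> hconv("hconv");
    hls::stream<T> vconv("vconv");

    // Column limit of the vertical pass. It is not declared in this outline;
    // width - (K - 1) is an assumption chosen to agree with the assertion below.
    const int vconv_xlim = width - (K - 1);

    // These assertions let HLS know the upper bounds of loops
    assert(height < MAX_IMG_ROWS);
    assert(width < MAX_IMG_COLS);
    assert(vconv_xlim < MAX_IMG_COLS - (K - 1));

    // Horizontal convolution
    HConvH:for(int col = 0; col < height; col++) {
        HConvW:for(int row = 0; row < width; row++) {
            HConv:for(int i = 0; i < K; i++) {
                // loop body omitted in this outline
            }
        }
    }

    // Vertical convolution
    VConvH:for(int col = 0; col < height; col++) {
        VConvW:for(int row = 0; row < vconv_xlim; row++) {
            VConv:for(int i = 0; i < K; i++) {
                // loop body omitted in this outline
            }
        }
    }

    // Border handling
    Border:for (int i = 0; i < height; i++) {
        for (int j = 0; j < width; j++) {
            // loop body omitted in this outline
        }
    }
}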

Some noticeable differences compared to the earlier code are:
- The input and output data are now modeled as hls::stream objects.
- Instead of a single local array of size HEIGHT*WIDTH, there are two internal hls::stream objects used to save the output of the horizontal and vertical convolutions.

In addition, some assert statements are used to specify the maximum values of the loop bounds. This is a good coding style which allows HLS to automatically report on the latencies of variable-bounded loops and to optimize the loop bounds.
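
The pattern of bounding a variable loop limit with an assertion can be shown on a smaller function. This is a minimal sketch in plain C++, where the assertions act only as run-time checks. The function `sum_pixels` and the constant values are hypothetical, and the inclusive comparison is a choice made for this sketch.

```cpp
#include <cassert>

// Hypothetical maximum image dimensions for this sketch.
const int MAX_IMG_ROWS = 1080;
const int MAX_IMG_COLS = 1920;

// Sums all pixels of a row-major image whose size is known only at run time.
long sum_pixels(const unsigned char *img, int width, int height) {
    // State the largest values the loop bounds can take. Placing the
    // assertions before the loops ties each variable bound to a constant.
    assert(height <= MAX_IMG_ROWS);
    assert(width <= MAX_IMG_COLS);

    long total = 0;
    for (int row = 0; row < height; row++) {
        for (int col = 0; col < width; col++) {
            total += img[row * width + col];
        }
    }
    return total;
}
```

Without such bounds, the trip counts of both loops depend entirely on run-time arguments, so a worst-case latency cannot be stated for them.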