To use hls::stream_of_blocks objects in your code, include the header file hls_streamofblocks.h.
The hls::stream_of_blocks type provides a user-synchronized stream that supports streaming blocks of data for process-level interfaces in a dataflow context, where each block is an array or multidimensional array. The intended use of stream-of-blocks is to replace array-based communication between a pair of processes within a dataflow region. Refer to the using_stream_of_blocks example on GitHub.
Currently, AMD Vitis™ HLS implements arrays written by a producer process and read by a consumer process in a dataflow region by mapping them to ping-pong buffers (PIPOs). The buffer exchange for a PIPO occurs at the return of the producer function and the call of the consumer function in C++.
While this ensures a concurrent communication semantic that is fully compliant with the sequential C++ execution semantics, it also implies that the consumer cannot start until the producer is done, as shown in the following code example.
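void producer(int b[M][N], ...) {
  for (int i = 0; i < M; i++)
    for (int j = 0; j < N; j++)
      b[i][f(j)] = ...;      // columns of row i are written in the order given by f
}
void consumer(int b[M][N], ...) {
  for (int i = 0; i < M; i++)
    for (int j = 0; j < N; j++)
      ... = b[i][g(j)] ...;  // columns of row i are read in the order given by g
}
void top(...) {
#pragma HLS dataflow
  int b[M][N];
#pragma HLS stream off variable=b
  producer(b, ...);          // writes all of b
  consumer(b, ...);          // reads b only after the producer has returned
}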
This can unnecessarily limit throughput and/or increase resources if the producer generates data for the consumer in smaller blocks, for example by writing one row of the buffer output inside a nested loop, and the consumer uses the data in smaller blocks by reading one row of the buffer input inside a nested loop, as the example above does. In this example, because the inner loop accesses the columns of each row non-sequentially (through f(j) and g(j)), you cannot simply stream the array b. However, the row access in the outer loop is sequential, so the example supports hls::stream_of_blocks communication where each block is a 1-dimensional array of size N.
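The row-by-row handoff described above can be illustrated with a small software model in plain C++. This is only a sketch of the idea and does not use the Vitis HLS types: std::queue stands in for the block channel, and the sizes M and N, the index maps f and g, and the function names are hypothetical choices made for illustration.

```cpp
#include <array>
#include <queue>

// Hypothetical sizes, chosen only for illustration.
constexpr int M = 3;  // number of rows (blocks)
constexpr int N = 4;  // elements per row

using Block = std::array<int, N>;

// Hypothetical stand-ins for the elided index functions; both permute 0..N-1.
constexpr int f(int j) { return N - 1 - j; }    // producer writes columns in reverse
constexpr int g(int j) { return (j + 1) % N; }  // consumer reads columns rotated by one

// Fill one row out of order, then hand the completed row over as one block.
void produce_row(int i, std::queue<Block>& s) {
    Block b{};
    for (int j = 0; j < N; j++)
        b[f(j)] = 10 * i + f(j);  // value encodes (row, column)
    s.push(b);                    // the whole row becomes visible at once
}

// Take the oldest completed row and read it out of order.
int consume_row(std::queue<Block>& s) {
    Block b = s.front();
    s.pop();
    int sum = 0;
    for (int j = 0; j < N; j++)
        sum += b[g(j)];
    return sum;
}

// Interleave both sides row by row. This is valid because row i is
// complete before the consumer touches it, even though the accesses
// inside a row are non-sequential.
std::array<int, M> run_rows() {
    std::queue<Block> s;
    std::array<int, M> sums{};
    for (int i = 0; i < M; i++) {
        produce_row(i, s);
        sums[i] = consume_row(s);
    }
    return sums;
}
```

In this model the consumer needs only one finished row at a time, not the full M x N array, which is the property that makes a block of size N a suitable unit of communication.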
The main purpose of the hls::stream_of_blocks feature is to provide PIPO-like functionality, but with user-managed explicit synchronization and accesses, and a better coding style. Stream-of-blocks lets you avoid the use of dataflow in a loop containing the producer and consumer, which would otherwise have been a way to optimize the example above. However, in that case, the dataflow loop containing the producer and consumer requires a PIPO buffer (2xN), as shown in the following example:
void producer(int b[N], ...) {
  for (int j = 0; j < N; j++)
    b[f(j)] = ...;
}
void consumer(int b[N], ...) {
  for (int j = 0; j < N; j++)
    ... = b[g(j)];
}
void top(...) {
  // The loop below is very constrained in terms of how it must be written
  for (int i = 0; i < M; i++) {
#pragma HLS dataflow
    int b[N];
#pragma HLS stream off variable=b
    producer(b, ...); // writes b
    consumer(b, ...); // reads b
  }
}
The dataflow-in-a-loop code above is also undesirable because this structure has several limitations in Vitis HLS. For example, the loop structure is tightly constrained: it must have a single induction variable that starts from 0, is compared with a constant or a function argument, and is incremented by 1.
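To make the loop constraint concrete, the following plain C++ sketch contrasts a loop header that matches the form listed above with one that does not, and shows one way to rewrite the latter so that its header matches. The function names and the body function are hypothetical, and the sketch only illustrates the loop shape; it makes no claim about other restrictions the tool may apply.

```cpp
// Hypothetical per-iteration work, standing in for the producer/consumer pair.
inline int body(int i) { return 2 * i; }

// Matches the listed form: one induction variable, starting from 0,
// compared with a function argument, incremented by 1.
int canonical_loop(int m) {
    int acc = 0;
    for (int i = 0; i < m; i++)
        acc += body(i);
    return acc;
}

// Does not match the listed form: starts from 1 and steps by 2.
int strided_loop(int m) {
    int acc = 0;
    for (int i = 1; i < m; i += 2)
        acc += body(i);
    return acc;
}

// Same iterations as strided_loop(m) when called with trips = m / 2
// (for m >= 0). The header now has the listed form, and the original
// index is recomputed inside the body as 2 * k + 1.
int strided_loop_normalized(int trips) {
    int acc = 0;
    for (int k = 0; k < trips; k++)
        acc += body(2 * k + 1);
    return acc;
}
```

Rewrites of this kind move the complexity from the loop header into the body and the caller, which is part of why the dataflow-in-a-loop style is awkward to write.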