The hls::stream_of_blocks type provides a user-synchronized stream that supports streaming blocks of data for process-level interfaces in a dataflow context, where each block is an array or multidimensional array. The intended use of stream-of-blocks is to replace array-based communication between a pair of processes within a dataflow region.
Currently, Vitis HLS implements arrays written by a producer process and read by a consumer process in a dataflow region by mapping them to ping-pong buffers (PIPOs). The buffer exchange for a PIPO is driven by the ap_done/ap_continue handshake of the producer process and by the ap_start/ap_ready handshake of the consumer process. In other words, the exchange occurs at the return of the producer function and at the call of the consumer function in C++.
While this ensures a concurrent communication semantic that is fully compliant with the sequential C++ execution semantics, it also implies that the consumer cannot start until the producer is done, as shown in the following code example.
void producer(int b[M][N], ...) {
  for (int i = 0; i < M; i++)
    for (int j = 0; j < N; j++)
      b[i][f(j)] = ...;
}
void consumer(int b[M][N], ...) {
  for (int i = 0; i < M; i++)
    for (int j = 0; j < N; j++)
      ... = b[i][g(j)] ...;
}
void top(...) {
#pragma HLS dataflow
  int b[M][N];
#pragma HLS stream off variable=b
  producer(b, ...);
  consumer(b, ...);
}
This can unnecessarily limit throughput if the producer generates data for the consumer in smaller blocks, for example by writing one row of the output buffer inside a nested loop, and the consumer uses the data in smaller blocks by reading one row of the input buffer inside a nested loop, as the example above does. In this example, because the column access in the inner loop is non-sequential, you cannot simply stream the array b. However, the row access in the outer loop is sequential, which allows hls::stream_of_blocks communication where each block is a one-dimensional array of size N.
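The row-at-a-time handoff described above can be sketched in ordinary C++. This is only a software model under stated assumptions, not the Vitis HLS API: BlockChannel is a hypothetical stand-in for a bounded channel of blocks, the sizes M and N are arbitrary, and f and g are made-up permutations standing in for the unspecified index functions in the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical sizes, chosen only for illustration.
constexpr int M = 4;  // number of rows (one block per row)
constexpr int N = 8;  // elements per row (block size)

using Block = std::array<int, N>;

// Assumed stand-ins for f(j) and g(j). Both are permutations of 0..N-1, so
// column access within a row is non-sequential, as in the example above.
constexpr int f(int j) { return N - 1 - j; }        // producer writes right to left
constexpr int g(int j) { return (j + N / 2) % N; }  // consumer reads rotated by N/2

// Plain C++ stand-in for a bounded channel of blocks (not the Vitis HLS type).
struct BlockChannel {
    std::size_t depth;
    std::deque<Block> blocks;
    bool full() const { return blocks.size() >= depth; }
    bool empty() const { return blocks.empty(); }
};

// Fill one row in f(j) order, then hand the whole row over as one block.
void produce_row(int i, BlockChannel &ch) {
    assert(!ch.full());
    Block b{};
    for (int j = 0; j < N; j++)
        b[f(j)] = 100 * i + j;
    ch.blocks.push_back(b);
}

// Take the oldest row and read it in g(j) order.
void consume_row(BlockChannel &ch, std::vector<int> &out) {
    assert(!ch.empty());
    const Block b = ch.blocks.front();
    ch.blocks.pop_front();
    for (int j = 0; j < N; j++)
        out.push_back(b[g(j)]);
}

// Row i is consumed as soon as it is complete, so only row-sized blocks are
// ever buffered, never the whole M x N array.
std::vector<int> run_rows() {
    BlockChannel ch{2, {}};
    std::vector<int> out;
    for (int i = 0; i < M; i++) {
        produce_row(i, ch);
        consume_row(ch, out);
    }
    return out;
}
```

Calling run_rows() interleaves production and consumption row by row. Random access stays confined to a single block, while the blocks themselves move in order, which is the property that makes row-sized blocks a suitable unit of transfer here.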
The main purpose of the hls::stream_of_blocks feature is to provide PIPO-like functionality, but with user-managed explicit synchronization and accesses, and a better coding style. Stream-of-blocks lets you avoid using dataflow in a loop containing the producer and consumer, which would otherwise have been a way to optimize the example above. However, in this case, the dataflow loop containing the producer and consumer requires a very large PIPO buffer (2 × M × N), as shown in the following example:
void producer(int b[N], ...) {
  for (int j = 0; j < N; j++)
    b[f(j)] = ...;
}
void consumer(int b[N], ...) {
  for (int j = 0; j < N; j++)
    ... = b[g(j)];
}
void top(...) {
  // The loop below is very constrained in terms of how it must be written
  for (int i = 0; i < M; i++) {
#pragma HLS dataflow
    int b[N];
#pragma HLS stream off variable=b
    producer(b, ...); // writes b
    consumer(b, ...); // reads b
  }
}
The dataflow-in-a-loop code above is also undesirable because this structure has several limitations in Vitis HLS. For example, the loop structure must be very constrained: a single induction variable that starts from 0, is compared with a constant or a function argument, and is incremented by 1.
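To make the loop-shape constraint concrete, the following sketch rewrites a loop that starts at 1 and steps by 2 into the constrained shape described above. It is ordinary C++ with hypothetical function names (visit_strided, visit_normalized, strided_trips), and it makes no claim about which loop forms a particular Vitis HLS release accepts. It only shows the kind of rewriting the constraint can force on the author.

```cpp
#include <cassert>
#include <vector>

// Original shape: starts at 1 and steps by 2, so it does not match the
// constrained form (start at 0, increment by 1).
std::vector<int> visit_strided(int m) {
    std::vector<int> seen;
    for (int k = 1; k < m; k += 2)
        seen.push_back(k);
    return seen;
}

// Trip count of the strided loop above, computed by the caller.
int strided_trips(int m) { return m > 1 ? m / 2 : 0; }

// Normalized shape: a single induction variable i that starts at 0, is
// compared with a function argument, and is incremented by 1. The original
// index k is recomputed from i inside the body.
std::vector<int> visit_normalized(int trips) {
    assert(trips >= 0);
    std::vector<int> seen;
    for (int i = 0; i < trips; i++) {
        const int k = 1 + 2 * i;
        seen.push_back(k);
    }
    return seen;
}
```

Both functions visit the same indices when visit_normalized is given strided_trips(m), but the normalized version moves the stride and offset into the loop body and the trip-count calculation into the caller.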