The hls::stream_of_blocks type provides a user-synchronized stream that supports streaming blocks of data for process-level interfaces in a dataflow context, where each block is an array or multidimensional array. The intended use of stream-of-blocks is to replace array-based communication between a pair of processes within a dataflow region. Refer to the using_stream_of_blocks example on GitHub.
Currently, Vitis HLS implements arrays written by a producer process and read by a consumer process in a dataflow region by mapping them to ping-pong buffers (PIPOs). The buffer exchange for a PIPO occurs when the producer function returns and the consumer function is called in the C++ code.
Stream-of-Blocks Modeling Style
With a stream-of-blocks, by contrast, the communication between the producer and the consumer is modeled as a stream of array-like objects, which provides several advantages over array transfer through a PIPO.
The use of stream-of-blocks in your code requires the following include file:
#include "hls_streamofblocks.h"
The stream-of-blocks object template is:
hls::stream_of_blocks<block_type, depth> v
Where:
- <block_type> specifies the data type of the array or multidimensional array held by the stream-of-blocks.
- <depth> is an optional argument that provides depth control, just like hls::stream or PIPOs, and specifies the total number of blocks, including the one acquired by the producer and the one acquired by the consumer at any given time. The default value is 2.
- v specifies the variable name for the stream-of-blocks object.
Use the following steps to access a block in a stream of blocks:
- The producer or consumer process that wants to access the stream first needs to acquire access to it, using an hls::write_lock or hls::read_lock object.
- After the producer has acquired the lock, it can start writing (or reading) the acquired block. Once the block has been fully initialized, it is released by the producer when the write_lock object goes out of scope.
  Note: The producer process with a write_lock can also read the block as long as it only reads from already written locations, because the newly acquired buffer must be assumed to contain uninitialized data. The ability to both write and read the block is unique to the producer process, and is not supported for the consumer.
- The block is then queued in the stream-of-blocks in FIFO fashion, and when the consumer acquires a read_lock object, the block can be read by the consumer process.
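The steps above can be illustrated outside of Vitis HLS with a small software model. The sketch below is a hypothetical, single-threaded mock written in standard C++; the names MockStream, MockWriteLock, MockReadLock, and demo_handoff are invented for illustration and are not part of hls_streamofblocks.h. It only mimics the acquire-on-construction and release-on-destruction ordering, and it throws where real hardware would stall.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>
#include <stdexcept>

// Hypothetical mock of the locking protocol; NOT the Vitis HLS implementation.
template <typename T, std::size_t N, std::size_t Depth = 2>
struct MockStream {
    std::deque<std::array<T, N>> queued;  // released blocks, oldest first (FIFO)
    std::size_t held = 0;                 // blocks currently locked by either side
};

template <typename T, std::size_t N, std::size_t Depth = 2>
class MockWriteLock {
public:
    explicit MockWriteLock(MockStream<T, N, Depth>& s) : s_(s), block_() {
        // Hardware would stall until a block is free; the mock throws instead.
        if (s_.queued.size() + s_.held >= Depth)
            throw std::runtime_error("no free block");
        ++s_.held;
    }
    // Leaving scope releases the block and queues it for the consumer.
    ~MockWriteLock() {
        s_.queued.push_back(block_);
        --s_.held;
    }
    MockWriteLock(const MockWriteLock&) = delete;
    MockWriteLock& operator=(const MockWriteLock&) = delete;
    T& operator[](std::size_t i) { return block_[i]; }

private:
    MockStream<T, N, Depth>& s_;
    std::array<T, N> block_;
};

template <typename T, std::size_t N, std::size_t Depth = 2>
class MockReadLock {
public:
    explicit MockReadLock(MockStream<T, N, Depth>& s) : s_(s), block_() {
        // Hardware would stall until a block is queued; the mock throws instead.
        if (s_.queued.empty())
            throw std::runtime_error("no block ready");
        block_ = s_.queued.front();
        s_.queued.pop_front();
        ++s_.held;
    }
    // Leaving scope frees the block for reuse by the producer.
    ~MockReadLock() { --s_.held; }
    MockReadLock(const MockReadLock&) = delete;
    MockReadLock& operator=(const MockReadLock&) = delete;
    const T& operator[](std::size_t i) const { return block_[i]; }

private:
    MockStream<T, N, Depth>& s_;
    std::array<T, N> block_;
};

// One producer/consumer hand-off; returns the sum of the values the consumer read.
int demo_handoff() {
    MockStream<int, 4> s;
    {
        MockWriteLock<int, 4> b(s);  // producer acquires a block
        for (std::size_t j = 0; j < 4; ++j) b[j] = static_cast<int>(j) * 10;
    }                                // b destroyed: block becomes visible
    int sum = 0;
    {
        MockReadLock<int, 4> b(s);   // consumer acquires the queued block
        for (std::size_t j = 0; j < 4; ++j) sum += b[j];
    }                                // b destroyed: block is free again
    return sum;
}
```

In this model the block is invisible to the consumer while the write lock is alive and appears in the queue only when the lock is destroyed, which is the ordering the steps describe. The real type is synthesized to hardware and stalls rather than throwing.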
The main difference between hls::stream_of_blocks and the PIPO mechanism seen in the prior examples is that the block becomes available to the consumer as soon as the write_lock goes out of scope, rather than only at the return of the producer process. Therefore the amount of storage required is much smaller with stream-of-blocks than with PIPOs alone: namely 2N instead of 2xMxN.
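The storage comparison can be checked with a few lines of arithmetic. The sketch below assumes, as the formula implies, that the PIPO version double-buffers an entire M x N array while the stream-of-blocks version holds two N-element blocks at the default depth. The function names and the sample sizes (M = 8, N = 64) are illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// Elements stored when the whole M x N array is double-buffered in a PIPO.
constexpr std::size_t pipo_words(std::size_t m, std::size_t n) {
    return 2 * m * n;
}

// Elements stored by a default-depth (2) stream-of-blocks of N-element blocks.
constexpr std::size_t stream_of_blocks_words(std::size_t n) {
    return 2 * n;
}

// Illustrative sizes: M = 8 blocks of N = 64 elements each.
static_assert(pipo_words(8, 64) == 1024, "2 x M x N");
static_assert(stream_of_blocks_words(64) == 128, "2N");
```

With these sample sizes the stream-of-blocks needs one eighth of the storage, a saving factor equal to M.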
The producer acquires the block by constructing an hls::write_lock object called b, and passing it the reference to the stream-of-blocks object, called s. The write_lock object provides an overloaded array access operator, letting it be accessed as an array to access underlying storage in random order as shown in the example below.
The acquisition of the lock is performed by constructing the write_lock/read_lock object, and the release occurs automatically when that object is destructed as it goes out of scope. This approach uses the common Resource Acquisition Is Initialization (RAII) style of locking and unlocking.
#include "hls_streamofblocks.h"
typedef int buf[N];
void producer (hls::stream_of_blocks<buf> &s, ...) {
  for (int i = 0; i < M; i++) {
    // Allocation of hls::write_lock acquires the block for the producer
    hls::write_lock<buf> b(s);
    for (int j = 0; j < N; j++)
      b[f(j)] = ...;
    // Deallocation of hls::write_lock releases the block for the consumer
  }
}
void consumer(hls::stream_of_blocks<buf> &s, ...) {
  for (int i = 0; i < M; i++) {
    // Allocation of hls::read_lock acquires the block for the consumer
    hls::read_lock<buf> b(s);
    for (int j = 0; j < N; j++)
      ... = b[g(j)] ...;
    // Deallocation of hls::read_lock releases the block to be reused by the producer
  }
}
void top(...) {
#pragma HLS dataflow
  hls::stream_of_blocks<buf> s;
  // Both processes share the same stream-of-blocks object s
  producer(s, ...);
  consumer(s, ...);
}
The key features of this approach include:
- The outer loop in the producer above is expected to achieve an overall Initiation Interval (II) of 1.
- A locked block can be used as though it were private to the producer or the consumer process until it is released.
- For the producer, the initial state of the array object is undefined; for the consumer, it contains the values written by the producer.
- The principal advantage of stream-of-blocks is to provide overlapped execution of multiple iterations of the consumer and the producer to increase throughput.
Resource Usage
The resource cost when increasing the depth beyond the default value of 2 is similar to the resource cost of PIPOs. Namely, each increment of 1 requires enough memory for one additional block; in the example above, that is N 32-bit words.
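As a rough illustration of this scaling, the sketch below computes the raw data storage for a given depth. The helper names and the sample block size (N = 1024 elements of 32 bits) are assumptions for the example; the memory actually inferred depends on how the tool maps the blocks to RAM primitives.

```cpp
#include <cassert>
#include <cstddef>

// Bits of data held by one block of `elems` elements, each `elem_bits` wide.
constexpr std::size_t block_bits(std::size_t elems, std::size_t elem_bits) {
    return elems * elem_bits;
}

// Raw data bits for a stream-of-blocks of the given depth (default depth is 2).
constexpr std::size_t total_bits(std::size_t depth, std::size_t elems,
                                 std::size_t elem_bits) {
    return depth * block_bits(elems, elem_bits);
}

// Illustrative block: N = 1024 ints of 32 bits each.
static_assert(total_bits(2, 1024, 32) == 65536, "default depth");
static_assert(total_bits(3, 1024, 32) - total_bits(2, 1024, 32) == 32768,
              "each depth increment adds one block");
```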
The stream-of-blocks object can be bound to a specific RAM type by placing the BIND_STORAGE pragma where the stream-of-blocks is declared, for example in the top-level function. The stream-of-blocks uses 2-port BRAM (type=RAM_2P) by default.