Data-driven task-level parallelism uses a task-channel modeling style in which you statically instantiate and connect tasks and channels explicitly. Tasks in this modeling style have only stream-type inputs and outputs. The tasks are not controlled by any function call/return semantics; instead, they are always running, waiting for data on their input streams.
Data-driven TLP models consist of tasks that execute whenever there is data to be processed. C simulation in Vitis HLS used to be limited to showing only sequential semantics and behavior. With the data-driven model, simulation can show the concurrent nature of parallel tasks and their interactions through the FIFO channels.
Implementing data-driven TLP in the Vitis HLS tool uses simple classes for modeling tasks (hls::task) and channels (hls::stream/hls::stream_of_blocks). Note that while Vitis HLS supports hls::tasks for a top-level function, you cannot use hls::stream_of_blocks for interfaces in top-level functions. Consider the simple task-channel example shown below:
#include "test.h"
void splitter(hls::stream<int> &in, hls::stream<int> &odds_buf, hls::stream<int> &evens_buf) {
int data = in.read();
if (data % 2 == 0)
evens_buf.write(data);
else
odds_buf.write(data);
}
void odds(hls::stream<int> &in, hls::stream<int> &out) {
out.write(in.read() + 1);
}
void evens(hls::stream<int> &in, hls::stream<int> &out) {
out.write(in.read() + 2);
}
void odds_and_evens(hls::stream<int> &in, hls::stream<int> &out1, hls::stream<int> &out2) {
hls_thread_local hls::stream<int> s1; // channel connecting t1 and t2
hls_thread_local hls::stream<int> s2; // channel connecting t1 and t3
// t1 infinitely runs function splitter, with input in and outputs s1 and s2
hls_thread_local hls::task t1(splitter, in, s1, s2);
// t2 infinitely runs function odds, with input s1 and output out1
hls_thread_local hls::task t2(odds, s1, out1);
// t3 infinitely runs function evens, with input s2 and output out2
hls_thread_local hls::task t3(evens, s2, out2);
}
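The following test bench is a minimal sketch (an assumption for illustration, not part of the original example) of how this design could be driven during C simulation. It assumes that test.h declares odds_and_evens and pulls in the hls::task and hls::stream headers; the value counts are illustrative. The key point is that the test bench writes enough input data to produce every output value it later reads.
#include "test.h"

int main() {
    hls::stream<int> in, out1, out2;

    // 8 inputs: the 4 odd values end up on out1 (+1), the 4 even values on out2 (+2).
    for (int i = 0; i < 8; ++i)
        in.write(i);

    // A single call starts the always-running tasks; further calls would reuse
    // the same task and channel instances because they are hls_thread_local.
    odds_and_evens(in, out1, out2);

    int errors = 0;
    for (int i = 0; i < 4; ++i) {
        if (out1.read() != (2 * i + 1) + 1) ++errors; // odds path: n + 1
        if (out2.read() != (2 * i) + 2)     ++errors; // evens path: n + 2
    }
    return errors;
}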
The special hls::task C++ class is:
- A new object declaration in your source code that requires a special qualifier. The hls_thread_local qualifier is required to keep the object (and the underlying thread) alive across multiple calls of the instantiating function (odds_and_evens in the example). The hls_thread_local qualifier is only required to ensure that the C simulation of the data-driven TLP model exhibits the same behavior as the RTL simulation. In the RTL, these functions are already in always-running mode once started. To ensure the same behavior during C simulation, the hls_thread_local qualifier ensures that each task is started only once and keeps the same state even when called multiple times. Without the hls_thread_local qualifier, each new invocation of the function would result in a new state.
- Task objects implicitly manage a thread that runs a function infinitely, passing to it a set of arguments that must be either hls::stream or hls::stream_of_blocks. Tip: No other types of arguments are supported. In particular, even constant values cannot be passed as function arguments. If constants need to be passed to the task's body, define the function as a templated function and pass the constant as a template argument to this templated function, as shown in the sketch after this list.
- The supplied function (splitter/odds/evens in the example above) is called the task body, and it has an implicit infinite loop wrapped around it to ensure that the task keeps running and waiting on input.
- The supplied function can contain pipelined loops, but they need to be flushable pipelines (FLP) in order to prevent deadlock. The tool automatically selects the right pipeline style to use for a given pipelined loop or function.
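As a hedged sketch of the templated-function workaround mentioned in the tip above (the names adder, add_top, and INC are illustrative, not part of the original example, and the hls_task.h header is assumed to be available as in recent Vitis HLS releases), a constant can be baked into the task body as a template argument instead of a function argument:
#include "hls_task.h"

template <int INC>
void adder(hls::stream<int> &in, hls::stream<int> &out) {
    // Task body: the implicit infinite loop around it is added by hls::task.
    out.write(in.read() + INC);
}

void add_top(hls::stream<int> &in, hls::stream<int> &out) {
    // The constant 5 is fixed at compile time through the template argument.
    hls_thread_local hls::task t(adder<5>, in, out);
}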
An hls::task should not be treated as a function call; instead, an hls::task should be thought of as a persistent instance statically bound to channels. Because of this, it is your responsibility to ensure that multiple invocations of any function that contains hls::tasks are uniquified, otherwise these calls will use the same hls::tasks and channels.
Channels are modeled by the special templatized hls::stream (or hls::stream_of_blocks) C++ class. Such channels have the following attributes:
- In the data-driven TLP model, an hls::stream<type, depth> object behaves like a FIFO with a specified depth. Such streams have a default depth of 2, which can be overridden by the user.
- The streams are read from and written to sequentially. This implies that once a data item is read from an hls::stream<>, that same data item cannot be read again. Tip: Accesses to different streams are not ordered (for example, the order of a write to one stream and a read from a different stream can be changed by the scheduler).
- Streams can be defined either locally or globally. Streams defined in the global scope follow the same rules as any other global variables. Both forms are shown in the sketch after this list.
- The hls_thread_local qualifier is also required for streams (s1 and s2 in the example above) in order to keep the same streams alive across multiple calls of the instantiating function (odds_and_evens in the code example above).
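The sketch below (function and stream names are illustrative, not taken from the original example) shows a local channel with an explicit depth of 8 instead of the default 2, as well as a file-scope channel that follows the usual rules for global variables:
#include "hls_task.h"

// File-scope channel: follows the same rules as any other global variable and
// still needs hls_thread_local to survive repeated calls during C simulation.
hls_thread_local hls::stream<int, 16> g_mid;

void doubler(hls::stream<int> &in, hls::stream<int> &out)   { out.write(2 * in.read()); }
void plus_one(hls::stream<int> &in, hls::stream<int> &out)  { out.write(in.read() + 1); }
void minus_one(hls::stream<int> &in, hls::stream<int> &out) { out.write(in.read() - 1); }

void three_stage(hls::stream<int> &in, hls::stream<int> &out) {
    // Local channel with an explicit depth of 8 rather than the default 2.
    hls_thread_local hls::stream<int, 8> mid;

    hls_thread_local hls::task t1(doubler,   in,    mid);
    hls_thread_local hls::task t2(plus_one,  mid,   g_mid);
    hls_thread_local hls::task t3(minus_one, g_mid, out);
}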
The following diagram shows the graphical representation in Vitis HLS of the code example above. In this diagram, the green arrows are FIFO channels, while the blue arrows indicate the inputs and outputs of the instantiating function (odds_and_evens). Tasks are shown as blue rectangular boxes.
Figure: hls::task Example
Because a read of an empty stream is a blocking read, deadlocks can occur due to:
- The design itself, where the production and consumption rates of the processes are unbalanced:
  - During C simulation, deadlocks can occur only due to a cycle of processes, or a chain of processes starting from a top-level input, that are attempting to read from empty channels.
  - During C/RTL co-simulation and when running in hardware (HW), deadlocks can occur due to cycles of processes trying to write to full channels and/or read from empty channels.
- The test bench, which provides less data than is needed to produce all the outputs that the test bench expects when checking the computation results.
Because of this, a deadlock detector is automatically instantiated when the design contains an hls::task. The deadlock detector detects deadlocks and stops the C simulation. Further debugging is performed with a C debugger such as gdb by looking at where the simulated hls::tasks are all blocked trying to read from an empty channel. Note that this is easy to do using the Vitis HLS GUI, as shown in the handling_deadlock example for debugging deadlocks.
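As an illustration (this test bench is an assumption, not part of the original example), feeding the odds_and_evens design above only even values starves the odds task, so a blocking read of out1 can never complete; the deadlock detector stops the C simulation, and the blocked reads can then be inspected in gdb or the Vitis HLS GUI:
#include "test.h"

int main() {
    hls::stream<int> in, out1, out2;

    for (int i = 0; i < 4; ++i)
        in.write(2 * i);      // even values only: splitter never writes to s1

    odds_and_evens(in, out1, out2);

    int v = out2.read();      // succeeds: the evens path produced data
    int w = out1.read();      // blocks forever: the odds path never wrote anything
    return v + w;
}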
In summary, the hls::task model is recommended if your design requires a completely data-driven, pure streaming type of behavior, with no function call/return control. This type of model is also useful for modeling feedback and dynamic multi-rate designs. Feedback occurs in a design when there is a cyclical dependency between tasks. Dynamic multi-rate models, where the producer writes data or the consumer reads data at a rate that is data dependent, can only be handled by data-driven TLP. The simple_data_driven design on GitHub is an example of this; a minimal illustrative sketch follows below.
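The sketch below (the names filter_pos, scale, and filter_top are illustrative, not taken from the GitHub design) shows a dynamic multi-rate pair of tasks: the first task forwards only non-negative samples, so the number of values it writes per input is data dependent, and the downstream task simply runs whenever data arrives on its input channel:
#include "hls_task.h"

// Data-dependent production rate: each input yields either 0 or 1 outputs.
void filter_pos(hls::stream<int> &in, hls::stream<int> &out) {
    int v = in.read();
    if (v >= 0)
        out.write(v);
}

// Downstream consumer: processes whatever the filter lets through.
void scale(hls::stream<int> &in, hls::stream<int> &out) {
    out.write(10 * in.read());
}

void filter_top(hls::stream<int> &in, hls::stream<int> &out) {
    hls_thread_local hls::stream<int> mid;
    hls_thread_local hls::task t1(filter_pos, in, mid);
    hls_thread_local hls::task t2(scale, mid, out);
}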