The original DATAFLOW model lets you write sequential functions and then relies on the AMD Vitis™ HLS tool to identify dataflow processes (tasks) and make them parallel, analyze and manage dependencies, and perform scalar propagation and optimizations such as array-to-stream. Alternatively, hls::task objects require you to explicitly instantiate tasks and channels, managing parallelization yourself in your algorithm design. The purpose of hls::task is to define a programming model that supports parallel tasks using only streaming data channels. Tasks are not controlled by function call/return, but run whenever their input streams are not empty. The hls::task library provides concurrent semantics so that C-simulation is consistent with the RTL. This eliminates some of the problems with the sequential dataflow model.
The following is an example of tasks and channels. Only streaming interfaces (hls::stream or hls::stream_of_blocks) are used, and the top-level function defines the tasks and stream channels using the hls_thread_local keyword.
#include "hls_task.h"

void func1(hls::stream<int> &in, hls::stream<int> &out1, hls::stream<int> &out2) {
    int data = in.read();
    // Route values of 10 or more to out1, everything else to out2
    if (data >= 10)
        out1.write(data);
    else
        out2.write(data);
}
void func2(hls::stream<int> &in, hls::stream<int> &out) {
    out.write(in.read() + 1);
}
void func3(hls::stream<int> &in, hls::stream<int> &out) {
    out.write(in.read() + 2);
}
void top_func(hls::stream<int> &in, hls::stream<int> &out1, hls::stream<int> &out2) {
    hls_thread_local hls::stream<int> s1; // channel connecting t1 and t2
    hls_thread_local hls::stream<int> s2; // channel connecting t1 and t3
    hls_thread_local hls::task t1(func1, in, s1, s2); // t1 infinitely runs func1, with input in and outputs s1 and s2
    hls_thread_local hls::task t2(func2, s1, out1); // t2 infinitely runs func2, with input s1 and output out1
    hls_thread_local hls::task t3(func3, s2, out2); // t3 infinitely runs func3, with input s2 and output out2
}
The hls::task objects are variables that should be declared as hls_thread_local in order to keep the variable and the underlying thread alive across multiple calls of the instantiating function (top_func in the example above). The task objects implicitly manage a thread that runs a function continuously, such as func1, func2, or func3 in the example above. The function is the task body, and has an implicit infinite loop around it.
Each hls::task must be passed a set of arguments that includes the function name followed by the input and output channels (hls::stream or hls::stream_of_blocks). The channels must also be declared hls_thread_local to keep them alive across calls of the top-level function. Non-stream data, such as scalar and array variables, must all be local to the task functions and cannot be passed as arguments.
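The idea of a task as a thread that re-runs its body around blocking channels can be sketched in standard C++. This is only a conceptual model, not the hls::task implementation: Channel, Task, add_one, add_two, and run_demo are hypothetical names, and the close() call exists only so that the model can terminate, which a real task never does.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical stand-in for a streaming channel: a FIFO with a blocking read.
template <typename T>
class Channel {
public:
    void write(const T &v) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(v); }
        cv_.notify_one();
    }
    // Blocks until data is available; returns false once closed and drained.
    bool read(T &v) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        v = q_.front();
        q_.pop();
        return true;
    }
    // Model-only escape hatch so that the worker threads can stop.
    void close() {
        { std::lock_guard<std::mutex> lk(m_); closed_ = true; }
        cv_.notify_all();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
    bool closed_ = false;
};

// Hypothetical task model: a thread that re-runs the body until it returns false.
class Task {
public:
    template <typename F, typename... Chans>
    Task(F body, Chans &...chans)
        : th_([body, &chans...] { while (body(chans...)) {} }) {}
    ~Task() { th_.join(); }
private:
    std::thread th_;
};

// Task bodies modeled on func2 and func3 from the example above.
bool add_one(Channel<int> &in, Channel<int> &out) {
    int v;
    if (!in.read(v)) return false;
    out.write(v + 1);
    return true;
}
bool add_two(Channel<int> &in, Channel<int> &out) {
    int v;
    if (!in.read(v)) return false;
    out.write(v + 2);
    return true;
}

// Connects two tasks through a channel and pushes one value through them.
int run_demo() {
    Channel<int> in, mid, out;  // declared first so they outlive the tasks
    int result = 0;
    {
        Task t1(add_one, in, mid);
        Task t2(add_two, mid, out);
        in.write(5);
        out.read(result);  // blocks until both tasks have processed the value
        in.close();        // let both worker threads leave their loops
        mid.close();
    }
    return result;
}
```

Calling run_demo() from a main function pushes the value 5 through both tasks. The channels are declared before the tasks so that they are destroyed after the threads have joined, which mirrors the reason the channels in an hls::task design must stay alive as long as the tasks do.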
Including hls_task.h makes hls::stream and hls::stream_of_blocks read calls blocking in C-simulation. This means that code that previously relied on reading an empty stream will now deadlock during simulation.
Supported I/O types
hls::task objects can only read and write streaming channels (hls::stream and hls::stream_of_blocks).
Note that both the hls::task objects and the channels that connect to them must be declared as hls_thread_local.
Use of flushing pipelines
In general, hls::task designs must use flushing pipelines (flp) or free-running pipelines (frp), which also flush, because non-flushing pipelines introduce dependencies between process executions and thus may result in unexpected deadlocks.
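As a hedged illustration, a loop inside a task body can request a flushing pipeline through the style option of the Vitis HLS pipeline pragma. The sketch below assumes that option is spelled style=flp in the tool version in use, which should be checked against its pragma reference. IntStream and scale4 are hypothetical names; IntStream stands in for hls::stream<int> so that the sketch also compiles with a standard C++ compiler, which ignores the pragma.

```cpp
#include <cassert>
#include <queue>

// Hypothetical stand-in for hls::stream<int>; a Vitis HLS project would use
// hls::stream from hls_stream.h instead.
struct IntStream {
    std::queue<int> q;
    bool empty() const { return q.empty(); }
    void write(int v) { q.push(v); }
    int read() {
        int v = q.front();
        q.pop();
        return v;
    }
};

// Hypothetical task body: doubles a burst of four values per invocation.
void scale4(IntStream &in, IntStream &out) {
    for (int i = 0; i < 4; ++i) {
// Assumed spelling of the flushing-pipeline style; ignored by standard compilers.
#pragma HLS pipeline II=1 style=flp
        out.write(in.read() * 2);
    }
}
```

With a flushing pipeline, data already in the pipeline drains even when no new input arrives, so one task does not hold back results that another task is waiting for.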
Nested Tasks
In the following example, there are two instances of task1 used in task2, both also instantiated as hls::task instances. This demonstrates that, in addition to sequential functions, the body of an hls::task can be a function containing only hls::task objects.
void task1(hls::stream<int> &in, hls::stream<int> &out) {
    hls_thread_local hls::stream<int> s1;
    hls_thread_local hls::task t1(func2, in, s1);
    hls_thread_local hls::task t2(func3, s1, out);
}
void task2(hls::stream<int> &in1, hls::stream<int> &in2, hls::stream<int> &out1, hls::stream<int> &out2) {
    hls_thread_local hls::task tA(task1, in1, out1);
    hls_thread_local hls::task tB(task1, in2, out2);
}
The use of hls_thread_local is still required to ensure safe multiple instantiation of the intermediate network (tA and tB, both instances of task1 in this example) and safe instances of the leaf-level processes: t1 inside tA and tB, both executing different copies of func2, and t2 inside tA and tB.
Simulation and Co-simulation
C-simulation behavior for the tasks and channels model is the same as in C/RTL Co-simulation. Reading from an empty stream was previously allowed, with only a warning that this condition can cause hangs during simulation. In Vitis HLS 2022.2, reading from an empty stream can cause deadlock even in C-simulation, and is therefore now an error condition with the following messages:
- In designs containing hls::task objects:
  ERROR [HLS SIM]: deadlock detected when simulating hls::tasks. Execute C-simulation in debug mode in the GUI and examine the source code location of all the blocked hls::stream::read() calls
- In designs that do not use hls::task:
  ERROR [HLS SIM]: an hls::stream is read while empty, which may result in RTL simulation hanging. If this is not expected, execute C simulation in debug mode in the GUI and examine the source code location of the blocked hls::stream::read() call to debug. If this is expected, add -DHLS_STREAM_READ_EMPTY_RETURNS_GARBAGE to -cflags to turn this error into a warning and allow empty hls::stream reads to return the default value for the data type.
Tip: Add -DHLS_STREAM_READ_EMPTY_RETURNS_GARBAGE to -cflags to turn this error into a warning.
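For a sequential test bench around a design that does not use hls::task, one way to avoid the empty-read error is to check empty() before every read. The sketch below models that pattern in standard C++. CheckedStream is a hypothetical stand-in for hls::stream<int> whose read throws on an empty stream to mimic the error condition, and increment and run_testbench are illustrative names. In a design with concurrently running tasks, an empty() check in the test bench can race with the producers, so this pattern is not assumed to carry over to that case.

```cpp
#include <cassert>
#include <queue>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for hls::stream<int>: reading while empty is an error.
class CheckedStream {
public:
    bool empty() const { return q_.empty(); }
    void write(int v) { q_.push(v); }
    int read() {
        if (q_.empty()) throw std::runtime_error("stream read while empty");
        int v = q_.front();
        q_.pop();
        return v;
    }
private:
    std::queue<int> q_;
};

// Design under test, modeled on func2: consumes one value, produces one value.
void increment(CheckedStream &in, CheckedStream &out) {
    out.write(in.read() + 1);
}

// Test bench that never reads an empty stream.
std::vector<int> run_testbench(const std::vector<int> &stimuli) {
    CheckedStream in, out;
    for (int v : stimuli) in.write(v);

    // Call the design only while it has input to consume.
    while (!in.empty()) increment(in, out);

    // Drain the output, guarding each read with empty().
    std::vector<int> results;
    while (!out.empty()) results.push_back(out.read());
    return results;
}
```

Guarding each read keeps the test bench valid without the -DHLS_STREAM_READ_EMPTY_RETURNS_GARBAGE flag, because no read is ever issued on an empty stream.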