The original DATAFLOW model lets you write sequential functions, and then requires
the AMD Vitis™ HLS tool to identify dataflow processes (tasks) and make them parallel,
analyze and manage dependencies, and perform scalar propagation and optimizations such
as array-to-stream. By contrast, hls::task objects require you to explicitly
instantiate tasks and channels, managing parallelization yourself in your algorithm
design. The purpose of hls::task is to define a programming model that supports
parallel tasks using only streaming data channels. Tasks are not controlled by
function call/return, but run whenever data is present in the input streams.
The hls::task library provides concurrent semantics so that C-simulation is
consistent with the RTL. This eliminates some of the problems with the sequential
dataflow model.
The following is an example of tasks and channels. Only streaming interfaces
(hls::stream or hls::stream_of_blocks) are used, and the top-level function defines
the tasks and stream channels using the hls_thread_local keyword.
void func1(hls::stream<int> &in, hls::stream<int> &out1, hls::stream<int> &out2) {
    int data = in.read();
    if (data >= 10)
        out1.write(data);
    else
        out2.write(data);
}
void func2(hls::stream<int> &in, hls::stream<int> &out) {
    out.write(in.read() + 1);
}
void func3(hls::stream<int> &in, hls::stream<int> &out) {
    out.write(in.read() + 2);
}
void top_func(hls::stream<int> &in, hls::stream<int> &out1, hls::stream<int> &out2) {
    hls_thread_local hls::stream<int> s1; // channel connecting t1 and t2
    hls_thread_local hls::stream<int> s2; // channel connecting t1 and t3
    hls_thread_local hls::task t1(func1, in, s1, s2); // t1 infinitely runs func1, with input in and outputs s1 and s2
    hls_thread_local hls::task t2(func2, s1, out1); // t2 infinitely runs func2, with input s1 and output out1
    hls_thread_local hls::task t3(func3, s2, out2); // t3 infinitely runs func3, with input s2 and output out2
}
The hls::task objects are variables that should be declared as hls_thread_local to
keep the variable and the underlying thread alive across multiple calls of the
instantiating function (top_func in the example above). Each task object implicitly
manages a thread that runs a function continuously, such as func1, func2, or func3
in the example above. The function is the task body, and has an implicit infinite
loop around it.
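The data-driven behavior described above can be modeled in standard C++ without the Vitis headers. The following is a minimal sketch, not the hls::task implementation: `Channel`, `Task`, and `run_pipeline` are hypothetical names, and the `close()` call is an addition that lets the program terminate cleanly, whereas the tasks described above run indefinitely.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical blocking FIFO standing in for a stream channel (not hls::stream).
class Channel {
public:
    void write(int v) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(v);
        }
        cv_.notify_one();
    }
    // Marks the end of the data so that waiting readers can stop.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }
    // Blocks until a value arrives; returns false once closed and drained.
    bool read(int &v) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        v = q_.front();
        q_.pop();
        return true;
    }
private:
    std::queue<int> q_;
    std::mutex m_;
    std::condition_variable cv_;
    bool closed_ = false;
};

// Hypothetical task wrapper: owns a thread that re-runs the body for each token.
class Task {
public:
    Task(std::function<int(int)> body, Channel &in, Channel &out)
        : thread_([body, &in, &out] {
              int v;
              while (in.read(v)) out.write(body(v));  // loop around the task body
              out.close();                            // propagate end of data
          }) {}
    ~Task() { thread_.join(); }
private:
    std::thread thread_;
};

// Two chained tasks: the first adds 1, the second adds 2.
std::vector<int> run_pipeline(const std::vector<int> &inputs) {
    Channel in, mid, out;  // declared before the tasks, so destroyed after them
    Task t1([](int x) { return x + 1; }, in, mid);
    Task t2([](int x) { return x + 2; }, mid, out);
    for (int v : inputs) in.write(v);
    in.close();
    std::vector<int> results;
    int v;
    while (out.read(v)) results.push_back(v);
    return results;
}
```

Neither task is ever called by the caller: each one runs as soon as a token appears on its input channel, which is the property the hls::task model is built around.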
Each hls::task must be passed a set of arguments that includes the function name and
the input and output channels (hls::stream or hls::stream_of_blocks). hls::task
objects typically only read and write streaming channels (hls::stream and
hls::stream_of_blocks).
Both the hls::task objects and the channels that connect to them must be declared as
hls_thread_local. The channels must be declared hls_thread_local to keep them alive
across calls of the top-level function. Non-stream data, such as scalar and array
variables, must all be local to the task functions and cannot be passed as
arguments, except as noted in Stable M_AXI and S_AXILITE Accesses below.
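The lifetime requirement can be illustrated with storage duration in standard C++. This sketch is an analogy only: it makes no claim about how hls_thread_local is implemented, and the function names are hypothetical. It contrasts a channel-like object that is rebuilt on every call with one that persists across calls.

```cpp
#include <cstddef>
#include <queue>

// Automatic storage: the queue is destroyed on return, so earlier data is lost.
std::size_t push_automatic(int v) {
    std::queue<int> q;
    q.push(v);
    return q.size();
}

// Static storage: the same queue survives across calls and keeps its contents.
std::size_t push_persistent(int v) {
    static std::queue<int> q;
    q.push(v);
    return q.size();
}
```

Calling `push_automatic` twice returns 1 both times, while the second call to `push_persistent` returns 2, because the queue declared `static` is still alive from the first call.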
Including hls_task.h makes hls::stream and hls::stream_of_blocks read calls blocking
in C-simulation. This means that code that previously relied on reading an empty
stream will now result in deadlock during simulation.
Unsynchronized Pointer to Array and Scalar I/O Access
You can also pass scalar values (both local and top-level arguments) and pointers to
array arguments in the top-level function, provided that they are marked with the
STABLE pragma or directive as described in Un-synchronized I/O in Data-Driven TLP.
You must also ensure that either their value never changes during kernel execution
(this is virtually impossible to ensure with hls::task instantiated alone at the
top, without regular dataflow processes), or the kernel behavior does not depend on
when these arguments change value. For example, the process can tolerate a value
change at an arbitrary point in time, or some other stream-based synchronization
mechanism is used to regulate their access.
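One possible form of stream-based synchronization is a "ready" token: the writer fills the unsynchronized buffer first and then sends a token on a channel, and the reader touches the buffer only after receiving that token. The following is a minimal sketch in standard C++ under that assumption. `TokenFifo` and `sum_after_ready` are hypothetical names, and the local array merely stands in for memory reached through a stable pointer.

```cpp
#include <array>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical blocking FIFO standing in for a stream channel (not hls::stream).
class TokenFifo {
public:
    void write(int v) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(v);
        }
        cv_.notify_one();
    }
    // Blocks until a token arrives.
    int read() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        int v = q_.front();
        q_.pop();
        return v;
    }
private:
    std::queue<int> q_;
    std::mutex m_;
    std::condition_variable cv_;
};

int sum_after_ready() {
    std::array<int, 4> buf{};  // stands in for an unsynchronized array argument
    TokenFifo ready;
    int sum = 0;

    std::thread writer([&] {
        for (int i = 0; i < 4; ++i) buf[i] = 10 * (i + 1);  // 10, 20, 30, 40
        ready.write(4);  // token: four elements are now valid
    });
    std::thread reader([&] {
        int count = ready.read();  // wait for the token before touching buf
        for (int i = 0; i < count; ++i) sum += buf[i];
    });
    writer.join();
    reader.join();
    return sum;
}
```

Because the reader blocks on the token, the result does not depend on which thread is scheduled first: the buffer is always fully written before it is summed.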
Scalar values are passed by reference:
void test(hls::stream<int> &in, hls::stream<int> &out, int &n)
Stable top pointers with the m_axi protocol and an s_axilite offset must be enabled
for C/RTL Co-simulation using the cosim.enable_tasks_with_m_axi command as described
in Co-Simulation Configuration.
The following is an example of an hls::task design with a stable by-reference scalar
argument, whose behavior is largely insensitive to the exact timing of a change in
the value of that argument:
void task1(hls::stream<int> &in, hls::stream<int> &out) {
    ...
}
void task2(hls::stream<int> &in, hls::stream<int> &out) {
    ...
}
void task3(hls::stream<int> &in, hls::stream<int> &out, int &n) {
    int c = in.read();
    out.write(c + n);
}
void test(hls::stream<int> &in, hls::stream<int> &out, int &n) {
#pragma HLS stable variable=n
    HLS_TASK_STREAM<int> s1;
    HLS_TASK_STREAM<int> s2;
    HLS_TASK t1(task1, in, s1);
    HLS_TASK t2(task2, s1, s2);
    HLS_TASK t3(task3, s2, out, n);
}
The following example shows an hls::task design with a stable m_axi pointer argument
in the top-level function. Any accesses to the underlying DRAM buffer will be
unsynchronized with the process of the function. The if (mem) statement can be used
to ensure that the DRAM buffer is accessed only after the host code has initialized
the offset register with the address of the buffer in DRAM.
Because the offset of the m_axi interface automatically uses the ap_none protocol,
the C++ and RTL will re-read its value only when the write_process is executed again.
...
void write_process(hls::stream<int>& in, hls::stream<int>& out, int* mem)
{
#pragma HLS PIPELINE style=flp
    ...
    if (mem) {
        mem[...] = ...;
        ...
        ... = mem[...];
    }
    ...
}
...
void stable_pointer(int* mem, hls::stream<int>& in, hls::stream<int>& out)
{
#pragma HLS INTERFACE mode=m_axi port=mem ...
#pragma HLS stable variable=mem
    hls_thread_local hls::stream<int> int_fifo("int_fifo");
    hls_thread_local hls::stream<int> int_fifo2("int_fifo2");
    hls_thread_local hls::task t1(process_23, in, int_fifo);
    hls_thread_local hls::task t2(process_11, int_fifo, int_fifo2);
    hls_thread_local hls::task t3(write_process, int_fifo2, out, mem);
}
Use of Flushing Pipelines
In general, hls::task designs must use flushing pipelines (flp) or free-running
pipelines (frp), as described in Flushing Pipelines and Pipeline Types. Non-flushing
pipelines introduce dependencies between process executions and can result in
unexpected deadlocks.
Pipeline flushing behavior can be set for hls::tasks using the
syn.compile.pipeline_flush_in_task option as described in Compile Options.
Nested Tasks
In the following example, there are two instances of task1 used in task2, both
instantiated as hls::task instances. This demonstrates that, in addition to
sequential functions, the body of an hls::task can be a function containing only
hls::task objects.
void task1(hls::stream<int> &in, hls::stream<int> &out) {
    hls_thread_local hls::stream<int> s1;
    hls_thread_local hls::task t1(func2, in, s1);
    hls_thread_local hls::task t2(func3, s1, out);
}
void task2(hls::stream<int> &in1, hls::stream<int> &in2, hls::stream<int> &out1, hls::stream<int> &out2) {
    hls_thread_local hls::task tA(task1, in1, out1);
    hls_thread_local hls::task tB(task1, in2, out2);
}
The use of hls_thread_local is still required to ensure safe multiple instantiation
of the intermediate network (tA and tB, both instances of task1 in this example),
and safe instances of the leaf-level processes: t1 inside tA and tB, both executing
different copies of func2, and t2 inside tA and tB.
Simulation and Co-simulation
C-simulation behavior for the tasks and channels model is the same as in C/RTL Co-simulation. Reading from an empty stream was previously allowed, with only a warning that this condition can cause hangs during simulation. In Vitis HLS 2022.2, reading from an empty stream can cause deadlock even in C-simulation and is therefore now an error condition with the following messages:
- In designs containing hls::task objects:
  ERROR [HLS SIM]: deadlock detected when simulating hls::tasks. Execute C-simulation in debug mode in the GUI and examine the source code location of all the blocked hls::stream::read() calls
- In designs that do not use hls::task:
  ERROR [HLS SIM]: an hls::stream is read while empty, which may result in RTL simulation hanging. If this is not expected, execute C simulation in debug mode in the GUI and examine the source code location of the blocked hls::stream::read() call to debug. If this is expected, add -DHLS_STREAM_READ_EMPTY_RETURNS_GARBAGE to -cflags to turn this error into a warning and allow empty hls::stream reads to return the default value for the data type.
Tip: Add -DHLS_STREAM_READ_EMPTY_RETURNS_GARBAGE to -cflags to turn this error into a warning.
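In code that does not use hls::task, another way to avoid the empty-read error is to check for data before reading. hls::stream provides an empty() check and a non-blocking read_nb() for this purpose. The sketch below models the same guard pattern with std::queue so that it compiles without the Vitis headers. `try_read` and `drain` are hypothetical helper names, not part of any Vitis library.

```cpp
#include <queue>
#include <vector>

// Hypothetical helper: reads only when data is present.
// Returns false and leaves `value` untouched when the stream is empty.
bool try_read(std::queue<int> &stream, int &value) {
    if (stream.empty()) return false;
    value = stream.front();
    stream.pop();
    return true;
}

// Drains whatever is currently available without ever reading an empty stream.
std::vector<int> drain(std::queue<int> &stream) {
    std::vector<int> values;
    int v;
    while (try_read(stream, v)) values.push_back(v);
    return values;
}
```

A testbench written this way stops when the data runs out instead of relying on the value returned by an empty read.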