Tasks and Channels - 2023.2 English

Vitis High-Level Synthesis User Guide (UG1399)

Document ID: UG1399
Release Date: 2023-12-18
Version: 2023.2 English

The original DATAFLOW model lets you write sequential functions and relies on the AMD Vitis™ HLS tool to identify dataflow processes (tasks), make them parallel, analyze and manage dependencies, and perform scalar propagation and optimizations such as array-to-stream. By contrast, hls::task objects require you to explicitly instantiate tasks and channels, so you manage parallelization yourself in your algorithm design. The purpose of hls::task is to define a programming model that supports parallel tasks using only streaming data channels. Tasks are not controlled by function call/return; they run whenever data is present in their input streams.

Tip: The hls::task library provides concurrent semantics so that the C-simulation will be consistent with the RTL. This eliminates some of the problems with the sequential dataflow model.

The following is an example of tasks and channels. You can see that only streaming interfaces (hls::stream or hls::stream_of_blocks) are used. You can also see that the top-level function defines the tasks and stream channels using the hls_thread_local keyword.

void func1(hls::stream<int> &in, hls::stream<int> &out1, hls::stream<int> &out2) {
  int data = in.read();
  if (data >= 10)
    out1.write(data);
  else
    out2.write(data);
}
void func2(hls::stream<int> &in, hls::stream<int> &out) {
  out.write(in.read() + 1);
}
void func3(hls::stream<int> &in, hls::stream<int> &out) {
  out.write(in.read() + 2);
}
void top_func(hls::stream<int> &in, hls::stream<int> &out1, hls::stream<int> &out2) {
  hls_thread_local hls::stream<int> s1; // channel connecting t1 and t2
  hls_thread_local hls::stream<int> s2; // channel connecting t1 and t3
 
  hls_thread_local hls::task t1(func1, in, s1, s2); // t1 infinitely runs func1, with input in and outputs s1 and s2
  hls_thread_local hls::task t2(func2, s1, out1);   // t2 infinitely runs func2, with input s1 and output out1
  hls_thread_local hls::task t3(func3, s2, out2);   // t3 infinitely runs func3, with input s2 and output out2
}

The hls::task objects are variables that should be declared as hls_thread_local to keep the variable and the underlying thread alive across multiple calls of the instantiating function (top_func in the example above). Each task object implicitly manages a thread that runs a function continuously, such as func1, func2, or func3 in the example above. The function is the task body, and has an implicit infinite loop around it.
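The thread-plus-infinite-loop behavior described above can be approximated in plain C++. The following is a minimal sketch using only the standard library, not the Vitis HLS implementation: Channel, spawn_task, Network, start_network, and run_example are hypothetical names introduced for illustration, and the function-local static only loosely plays the role that hls_thread_local plays in the real design.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for hls::stream<T>: an unbounded FIFO whose read()
// blocks until data is present.
template <typename T>
class Channel {
public:
  void write(const T &v) {
    {
      std::lock_guard<std::mutex> lock(m_);
      q_.push(v);
    }
    cv_.notify_one();
  }
  T read() {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [this] { return !q_.empty(); });
    assert(!q_.empty());
    T v = q_.front();
    q_.pop();
    return v;
  }

private:
  std::mutex m_;
  std::condition_variable cv_;
  std::queue<T> q_;
};

// Hypothetical stand-in for hls::task: runs `body` forever on a detached
// thread, which models the implicit infinite loop around the task body.
template <typename F, typename... Chans>
void spawn_task(F body, Chans &...chans) {
  std::thread([body, &chans...] {
    for (;;) body(chans...);
  }).detach();
}

// Task bodies, written like func1, func2, and func3 in the example above.
void func1(Channel<int> &in, Channel<int> &out1, Channel<int> &out2) {
  int data = in.read();
  if (data >= 10)
    out1.write(data);
  else
    out2.write(data);
}
void func2(Channel<int> &in, Channel<int> &out) { out.write(in.read() + 1); }
void func3(Channel<int> &in, Channel<int> &out) { out.write(in.read() + 2); }

struct Network {
  Channel<int> in, out1, out2, s1, s2;
};

// Builds the network once and keeps it alive across calls. It is allocated
// on the heap and never freed so the detached threads never observe a
// destroyed channel.
Network &start_network() {
  static Network *net = [] {
    Network *n = new Network;
    spawn_task(func1, n->in, n->s1, n->s2);
    spawn_task(func2, n->s1, n->out1);
    spawn_task(func3, n->s2, n->out2);
    return n;
  }();
  return *net;
}

// Sends one value down each branch and returns {out1 value, out2 value}.
std::pair<int, int> run_example() {
  Network &net = start_network();
  net.in.write(12);  // >= 10: routed through s1 and func2
  int a = net.out1.read();
  net.in.write(3);   // < 10: routed through s2 and func3
  int b = net.out2.read();
  return {a, b};
}
```

In this model the caller never invokes func1, func2, or func3 directly; it only writes to and reads from channels, and each task body executes when its input has data. Calling run_example more than once reuses the same threads and channels.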

Each hls::task must be passed a set of arguments that includes the function name followed by its input and output channels, which are hls::stream or hls::stream_of_blocks objects. hls::task objects typically read and write only these streaming channels.

Both the hls::task objects and the channels that connect them must be declared as hls_thread_local. The channels must be declared hls_thread_local to keep them alive across calls of the top-level function. Non-stream data, such as scalar and array variables, must all be local to the task functions and cannot be passed as arguments, except as noted in Unsynchronized Pointer to Array and Scalar I/O Access below.

Important: Inclusion of hls_task.h makes hls::stream and hls::stream_of_blocks read calls blocking in C-simulation. This means that code that previously relied on reading an empty stream will now result in deadlock during simulation.
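Code that used to poll a stream by reading it while empty can instead test for data first. The sketch below assumes a hypothetical single-threaded stand-in, MiniStream, that mirrors the hls::stream method names empty(), write(), read(), and read_nb() so that it compiles without the Vitis HLS headers; drain_sum is also a hypothetical helper. It does not model blocking and is only meant to show the guarded-read pattern.

```cpp
#include <cassert>
#include <queue>

// Hypothetical stand-in mirroring a few hls::stream method names.
template <typename T>
class MiniStream {
public:
  bool empty() const { return q_.empty(); }
  void write(const T &v) { q_.push(v); }
  // The real read() blocks when hls_task.h is included; in this stand-in,
  // reading an empty stream is treated as a programming error.
  T read() {
    assert(!q_.empty() && "read() on an empty stream");
    T v = q_.front();
    q_.pop();
    return v;
  }
  // Non-blocking read: returns false when no data is available.
  bool read_nb(T &v) {
    if (q_.empty()) return false;
    v = q_.front();
    q_.pop();
    return true;
  }

private:
  std::queue<T> q_;
};

// Consumes only the data that is currently available and returns its sum,
// so it never performs a plain read() on an empty stream.
template <typename Stream>
int drain_sum(Stream &s) {
  int sum = 0;
  int v = 0;
  while (s.read_nb(v)) sum += v;
  return sum;
}
```

When tasks run concurrently, an empty stream may simply mean that the producer has not written yet, so a test bench that knows how many results to expect may prefer to issue exactly that many blocking reads instead of polling.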

Unsynchronized Pointer to Array and Scalar I/O Access

You can also pass scalar values (both local variables and top-level arguments) and pointers to array arguments of the top-level function, provided that they are marked with the STABLE pragma or directive as described in Un-synchronized I/O in Data-Driven TLP. You must also ensure that either their value never changes during kernel execution (which is virtually impossible to guarantee when hls::task objects are instantiated alone at the top level, without regular dataflow processes), or the kernel behavior does not depend on when these arguments change value. For example, the process can tolerate a value change at an arbitrary point in time, or some other stream-based synchronization mechanism regulates access to them.

Scalar values are passed by reference:

void test(hls::stream<int> &in, hls::stream<int> &out, int &n)

Stable top pointers with the m_axi protocol and an s_axilite offset must be enabled for C/RTL Co-simulation using the cosim.enable_tasks_with_m_axi command as described in Co-Simulation Configuration.

The following is an example of an hls::task design with a stable by-reference scalar argument, whose behavior is by and large insensitive to the exact timing of a change of value of that argument:

void task1(hls::stream<int> &in, hls::stream<int> &out) {
...
}

void task2(hls::stream<int> &in, hls::stream<int> &out) {
...
}

void task3(hls::stream<int> &in, hls::stream<int> &out, int &n) {
  int c = in.read();
  out.write(c + n);
}

void test(hls::stream<int> &in, hls::stream<int> &out, int &n) {
#pragma HLS stable variable=n
  HLS_TASK_STREAM<int> s1;
  HLS_TASK_STREAM<int> s2;
  HLS_TASK t1(task1, in, s1);
  HLS_TASK t2(task2, s1, s2);
  HLS_TASK t3(task3, s2, out, n);
}
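To make the timing sensitivity concrete, the following plain C++ sketch contrasts the unsynchronized by-reference scalar with a stream-synchronized alternative. It is an illustration under stated assumptions, not Vitis HLS code: std::queue<int> stands in for hls::stream<int>, and Fifo, pop_value, task3_stable, and task3_synced are hypothetical names.

```cpp
#include <cassert>
#include <queue>

// std::queue<int> is a hypothetical stand-in for hls::stream<int>.
using Fifo = std::queue<int>;

// Removes and returns the oldest value; the caller guarantees data exists.
int pop_value(Fifo &f) {
  assert(!f.empty());
  int v = f.front();
  f.pop();
  return v;
}

// Unsynchronized: each execution samples n at the moment it happens to run,
// so the result depends on when n changes relative to that execution.
void task3_stable(Fifo &in, Fifo &out, const int &n) {
  int c = pop_value(in);
  out.push(c + n);
}

// Stream-synchronized: every data item is paired with an explicit value of
// n delivered through a second stream, so the pairing is deterministic.
void task3_synced(Fifo &in, Fifo &n_in, Fifo &out) {
  int c = pop_value(in);
  int n = pop_value(n_in);
  out.push(c + n);
}
```

With task3_stable, changing n between two executions changes which value the second data item sees. With task3_synced, the producer decides which n accompanies each item, at the cost of an extra channel.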

The following example shows an hls::task design with a stable m_axi pointer argument in the top-level function. Any accesses to the underlying DRAM buffer are unsynchronized with the execution of the function. The if (mem) statement can be used to ensure that the DRAM buffer is accessed only after the host code has initialized the offset register with the address of the buffer in DRAM.

Tip: Because the offset register for the m_axi interface automatically uses the ap_none protocol, the C++ and RTL will re-read its value only when the write_process is executed again.

...
void write_process(hls::stream<int>& in, hls::stream<int>& out, int* mem)
{
#pragma HLS PIPELINE style=flp
...
  if (mem) {
    mem[...] = ...;
...
    ... = mem[...];
  }
...
}
...
void stable_pointer(int* mem, hls::stream<int>& in, hls::stream<int>& out)
{
#pragma HLS INTERFACE mode=m_axi port=mem ...
#pragma HLS stable variable=mem

    hls_thread_local hls::stream<int> int_fifo("int_fifo");
    hls_thread_local hls::stream<int> int_fifo2("int_fifo2");

    hls_thread_local hls::task t1(process_23, in, int_fifo);
    hls_thread_local hls::task t2(process_11, int_fifo, int_fifo2);
    hls_thread_local hls::task t3(write_process, int_fifo2, out, mem);
}

Use of Flushing Pipelines

In general, hls::task designs must use flushing pipelines (flp) or free-running pipelines (frp), as described in Flushing Pipelines and Pipeline Types. Non-flushing pipelines introduce dependencies between process executions and can result in unexpected deadlocks.

Note: You can configure the default flushing behavior in hls::task designs using the syn.compile.pipeline_flush_in_task option as described in Compile Options.

Nested Tasks

In the following example, task2 contains two instances of task1, each instantiated as an hls::task. This demonstrates that, in addition to sequential functions, the body of an hls::task can be a function containing only hls::task objects.

void task1(hls::stream<int> &in, hls::stream<int> &out) {
  hls_thread_local hls::stream<int> s1;
 
  hls_thread_local hls::task t1(func2, in, s1);  
  hls_thread_local hls::task t2(func3, s1, out);
}
void task2(hls::stream<int> &in1, hls::stream<int> &in2, hls::stream<int> &out1, hls::stream<int> &out2) {
  hls_thread_local hls::task tA(task1, in1, out1);
  hls_thread_local hls::task tB(task1, in2, out2);
}

The use of hls_thread_local is still required to ensure safe multiple instantiation of the intermediate network (tA and tB, both instances of task1 in this example) and of the leaf-level processes (t1 inside tA and tB, each executing its own copy of func2, and t2 inside tA and tB, each executing its own copy of func3).

Simulation and Co-simulation

C-simulation behavior for the tasks and channels model is the same as in C/RTL Co-simulation. Reading from an empty stream was previously allowed, with only a warning that this condition can cause hangs during simulation. In Vitis HLS 2022.2, reading from an empty stream can cause deadlock even in C-simulation and is therefore now an error condition, reported with the following messages:

In designs containing hls::task objects:

ERROR [HLS SIM]: deadlock detected when simulating hls::tasks. 
Execute C-simulation in debug mode in the GUI and examine the source code 
location of all the blocked hls::stream::read() calls

In designs that do not use hls::task:

ERROR [HLS SIM]: an hls::stream is read while empty, which may result in 
RTL simulation hanging. If this is not expected, execute C simulation in debug mode
in the GUI and examine the source code location of the blocked hls::stream::read() 
call to debug. If this is expected, add -DHLS_STREAM_READ_EMPTY_RETURNS_GARBAGE to 
-cflags to turn this error into a warning and allow empty hls::stream reads to return
 the default value for the data type.

Tip: In designs that do not use hls::task, add -DHLS_STREAM_READ_EMPTY_RETURNS_GARBAGE to -cflags to turn this error into a warning.