The following table highlights factors of the HLS design that can help you determine when to apply control-driven task-level parallelism (TLP) or data-driven TLP.
| Control-Driven TLP | Data-Driven TLP |
|---|---|
| HLS design requires control signals to start/stop the process | HLS design uses a completely data-driven approach that does not require control signals to start/stop the process |
| Design requires non-local memory access | Design uses pure streaming data transfers |
| Design requires interaction with an external software application | Designs with data-dependent multi-rate behavior |
| Designs with multiple processes running the same number of executions | Task-level parallelism is observable in both C simulation and RTL simulation |
| Requires RTL simulation to model the effect of the parallelism | |
As the preceding table indicates, the two forms of task-level parallelism have different use cases and advantages. However, it is sometimes not possible to design an entire application using purely data-driven TLP, even though some portion of the design can still be constructed as a purely streaming design. In such cases, a mixed control-driven/data-driven model can be used to create the application. Consider the following mixed_control_and_data_driven example from GitHub.
```cpp
void dut(int in[N], int out[N], int n) {
#pragma HLS dataflow
    // Split/merge channels that distribute work across NP parallel tasks
    hls_thread_local hls::split::round_robin<int, NP> split1;
    hls_thread_local hls::merge::round_robin<int, NP> merge1;

    // Control-driven (dataflow) process: feeds the split channel's input
    read_in(in, n, split1.in);

    // Task-Channels: instantiate NP worker tasks, one per split/merge port
    hls_thread_local hls::task t[NP];
    for (int i = 0; i < NP; i++) {
#pragma HLS unroll
        t[i](worker, split1.out[i], merge1.in[i]);
    }

    // Control-driven (dataflow) process: drains the merge channel's output
    write_out(merge1.out, out, n);
}
```
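The example above assumes helper processes read_in, worker, and write_out. A minimal sketch of what these could look like (hypothetical bodies and signatures, assuming the hls::stream type from the Vitis HLS stream library) is:

```cpp
// Sketch only; assumes Vitis HLS headers such as <hls_stream.h>.
void read_in(int in[N], int n, hls::stream<int> &out) {
    for (int i = 0; i < n; i++)
        out.write(in[i]);        // feed the split channel's single input
}

void worker(hls::stream<int> &in, hls::stream<int> &out) {
    out.write(in.read() + 1);    // hypothetical per-element work
}

void write_out(hls::stream<int> &in, int out[N], int n) {
    for (int i = 0; i < n; i++)
        out[i] = in.read();      // drain the merge channel's single output
}
```

Note that worker processes one element per invocation: an hls::task re-executes its body whenever data is available on its input stream.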
In the above example, there are two distinct regions: a dataflow region containing the functions read_in and write_out, in which sequential semantics is preserved (for example, read_in can be executed before write_out), and a task-channel region that contains the dynamic instantiation of 4 tasks (since NP = 4 in this example) along with special types of channels called split and merge channels. A split channel has a single input but multiple outputs; in this case, the split channel has 4 outputs, as described in HLS Split/Merge Library. Similarly, a merge channel has multiple inputs but only one output.
In addition to the ports, these channels also support an internal job scheduler. In the above example, both the merge and the split channels use a round-robin scheduler that assigns the incoming data to each of the 4 tasks, one by one, starting with worker_U0. If a load-balancing scheduler had been chosen instead, the incoming data would have been assigned to the first available worker task (and this would lead to a non-deterministic simulation, since the order might differ each time you run the simulation). Because this is a pure task-channel region, the 4 tasks execute in parallel as soon as there is data in their incoming streams. Refer to the merge_split example on GitHub for more examples of these concepts.
It is important to note that, although the code above can give the impression that each task is "called" in the loop, and connected to a potentially different pair of channels every time the loop body is executed, in reality this usage implies a static instantiation:

- Each `t[i](...)` call must be executed exactly once per execution of `dut()`.
- The loop over `i` must be fully unrolled, to infer a corresponding set of 4 instances in RTL.
- The `dut()` function must be called exactly once by the test bench.
- Each split output or merge input must be bound to exactly one `hls::task` instance.
While it is true that for `hls::task` objects the order of specification does not matter, for the control-driven dataflow network Vitis HLS must be able to see that there is a chain of processes, such as from `read_in` to `write_out`. To define this chain of processes, Vitis HLS uses the calling order, which for `hls::task` objects is also the declaration order. This means that the model must define an explicit order from the `read_in` function to the `hls::task` region and then finally to the `write_out` function in the dataflow region, as shown in the example above.
- If a control-based process (i.e., regular dataflow) produces a stream for an `hls::task`, then it must be called before the declaration of the tasks in the code.
- If a control-based process consumes a stream from an `hls::task`, then it must be called after the declaration of the tasks in the code.
Violation of the above rules can cause unexpected outcomes, because each of the NP `hls::task` instances is statically bound to the channels that are used in the first invocation of `t[i](...)`.
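As a sketch of these ordering rules (hypothetical helper names, assuming the Vitis HLS task and stream libraries), a control-driven producer must appear before the task declaration and a control-driven consumer after it:

```cpp
// Sketch only; assumes Vitis HLS headers such as <hls_task.h> and <hls_stream.h>.
void top(int in[N], int out[N], int n) {
#pragma HLS dataflow
    hls_thread_local hls::stream<int> to_task, from_task;

    producer(in, n, to_task);    // produces for the task: called BEFORE its declaration

    hls_thread_local hls::task t1(worker, to_task, from_task);

    consumer(from_task, out, n); // consumes from the task: called AFTER its declaration
}
```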
The following diagram shows the graph of this mixed task-channel and dataflow example: