The following table highlights factors of the HLS design that can help you determine when to apply control-driven task-level parallelism (TLP) or data-driven TLP.
| Control-Driven TLP | Data-Driven TLP |
|---|---|
As the above table indicates, the two forms of task-level parallelism have different use cases and advantages. However, it is not always possible to design an entire application using purely data-driven TLP, even when some portion of the design can still be constructed as a purely streaming design. In such cases, a mixed control-driven/data-driven model can be used to create the application. Consider the following mixed_control_and_data_driven example from GitHub.
```cpp
void dut(int in[N], int out[N], int n) {
#pragma HLS dataflow
    hls_thread_local hls::split::round_robin<int, NP> split1;
    hls_thread_local hls::merge::round_robin<int, NP> merge1;

    read_in(in, n, split1.in);

    // Task-Channels
    hls_thread_local hls::task t[NP];
    for (int i = 0; i < NP; i++) {
#pragma HLS unroll
        t[i](worker, split1.out[i], merge1.in[i]);
    }

    write_out(merge1.out, out, n);
}
```
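The helper functions referenced by `dut()` are defined elsewhere in the GitHub example. The following is a minimal sketch of what they could look like; the bodies shown here (a pass-through reader/writer and a `+1` worker) are assumptions for illustration, not the actual example code:

```cpp
#include "hls_stream.h"
#include "hls_task.h"  // hls::task, hls::split, hls::merge

// Assumed producer: streams the input array into the split channel.
void read_in(int in[N], int n, hls::stream<int> &out) {
    for (int i = 0; i < n; i++)
        out.write(in[i]);
}

// Assumed worker body: hls::task bodies are driven by stream data and
// are implicitly re-invoked whenever input is available.
void worker(hls::stream<int> &in, hls::stream<int> &out) {
    out.write(in.read() + 1);
}

// Assumed consumer: drains the merge channel into the output array.
void write_out(hls::stream<int> &in, int out[N], int n) {
    for (int i = 0; i < n; i++)
        out[i] = in.read();
}
```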
In the above example, there are two distinct regions: a dataflow region containing the functions `read_in`/`write_out`, in which sequential semantics are preserved (i.e. `read_in` is executed before `write_out`), and a task-channel region containing the dynamic instantiation of 4 tasks (since NP = 4 in this example) along with special types of channels called split and merge channels. A split channel has a single input but multiple outputs; in this case, the split channel has 4 outputs, as described in HLS Split/Merge Library. Similarly, a merge channel has multiple inputs but only one output.
In addition to the ports, these channels also support an internal job scheduler. In the above example, both the merge and the split channels use a round-robin scheduler that assigns the incoming data to each of the 4 tasks, one by one, starting with `worker_U0`. If a load-balancing scheduler had been chosen, the incoming data would have been assigned to the first available worker task (and this would lead to a non-deterministic simulation, since the order might differ each time you run the simulation). Since this is a pure task-channel region, the 4 tasks execute in parallel as soon as there is data in their incoming streams. Refer to the merge_split example on GitHub for more examples of these concepts.
It is important to note that, although the code above may give the impression that each task is "called" in the loop, and connected to a potentially different pair of channels every time the loop body is executed, in reality this usage implies a static instantiation:

- Each `t[i](...)` call must be executed exactly once per execution of `dut()`.
- The loop over `i` must be fully unrolled, to infer a corresponding set of 4 instances in RTL.
- The `dut()` function must be called exactly once by the testbench.
- Each split output or merge input must be bound to exactly one `hls::task` instance.
While it is true that for `hls::task` objects the order of specification does not matter, for the control-driven dataflow network Vitis HLS must be able to see that there is a chain of processes, such as from `read_in` to `write_out`. To define this chain of processes, Vitis HLS uses the calling order, which for `hls::task` objects is also the declaration order. This means that the model must define an explicit order from the `read_in` function to the `hls::task` region and then finally to the `write_out` function in the dataflow region, as shown in the example above.
- If a control-based process (i.e. regular dataflow) produces a stream for an `hls::task`, then it must be called before the declaration of the tasks in the code.
- If a control-based process consumes a stream from an `hls::task`, then it must be called after the declaration of the tasks in the code.
Violation of the above rules can cause unexpected outcomes, since each of the NP `hls::task` instances is statically bound to the channels that are used in the first invocation of `t[i](...)`.
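As a hypothetical illustration of the first rule, the following variation of the body of `dut()` (invented for this sketch, not from the GitHub example) would be incorrect, because the producer is called after the tasks are declared:

```cpp
// INCORRECT ordering: the tasks are declared before the producer is
// called, so Vitis HLS cannot establish the read_in -> task chain.
hls_thread_local hls::task t[NP];
for (int i = 0; i < NP; i++) {
#pragma HLS unroll
    t[i](worker, split1.out[i], merge1.in[i]);
}
read_in(in, n, split1.in);  // too late: must precede the task declarations
```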
The following diagram shows the graph of this mixed task-channel and dataflow example: