Mixing Data-Driven and Control-Driven Models - 2023.1 English

Vitis High-Level Synthesis User Guide (UG1399)

Document ID: UG1399
Release Date: 2023-07-17
Version: 2023.1 English

The following table highlights factors of the HLS design that can help you determine when to apply control-driven task-level parallelism (TLP) or data-driven TLP.

Control-Driven TLP
  • HLS design requires control signals to start/stop the process
  • Design requires non-local memory access
  • Design requires interaction with an external software application
  • Designs with multiple processes running the same number of executions
  • Requires RTL simulation to model the effect of the parallelism

Data-Driven TLP
  • HLS design uses a completely data-driven approach that does not require control signals to start/stop the process
  • Design uses pure streaming data transfers
  • Designs with data-dependent multi-rate behavior
    • Producer writes data or consumer reads data at a rate that is data dependent
    • Easier to model for designs that require feedback between processes
  • Task-level parallelism is observable in C simulation as well as RTL simulation

As the above table indicates, the two forms of task-level parallelism have different use cases and advantages. However, it is sometimes not possible to build an entire application as purely data-driven TLP, even though some portion of the design can still be constructed as a purely streaming design. In this case, a mixed control-driven/data-driven model can be useful. Consider the following mixed_control_and_data_driven example from GitHub.

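// Requires the Vitis HLS headers hls_task.h and hls_np_channel.h.
void dut(int in[N], int out[N], int n) {
#pragma HLS dataflow
  // Split/merge channels with NP outputs/inputs and round-robin scheduling
  hls_thread_local hls::split::round_robin<int, NP> split1;
  hls_thread_local hls::merge::round_robin<int, NP> merge1;

  // Control-driven producer: feeds the single input of the split channel
  read_in(in, n, split1.in);

  // Task-Channels
  hls_thread_local hls::task t[NP];
  for (int i=0; i<NP; i++) {
#pragma HLS unroll
    t[i](worker, split1.out[i], merge1.in[i]);
  }

  // Control-driven consumer: drains the single output of the merge channel
  write_out(merge1.out, out, n);
}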

The above example contains two distinct regions. The first is a dataflow region containing the functions read_in and write_out, in which sequential semantics are preserved, i.e. read_in is executed before write_out. The second is a task-channel region that instantiates 4 tasks (since NP = 4 in this example) and connects them through special channels called split and merge channels. A split channel has a single input and multiple outputs; in this case, the split channel has 4 outputs, as described in HLS Split/Merge Library. Similarly, a merge channel has multiple inputs but only one output.
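The bodies of read_in, worker, and write_out are not shown in this section. As a rough sketch only, they could look like the following. To keep the sketch compilable without the Vitis HLS headers, it uses a hypothetical IntStream type as a stand-in for hls::stream<int>, and the arithmetic inside worker is an assumption made purely for illustration.

```cpp
#include <cassert>
#include <queue>

// Hypothetical stand-in for hls::stream<int>; only read(), write() and
// empty() are modelled. This is NOT the Vitis HLS stream class.
struct IntStream {
    std::queue<int> q;
    void write(int v) { q.push(v); }
    int read() {
        assert(!q.empty());  // a hardware stream would block here instead
        int v = q.front();
        q.pop();
        return v;
    }
    bool empty() const { return q.empty(); }
};

// Control-driven producer: copies n words from memory into a stream.
void read_in(const int* in, int n, IntStream& out) {
    for (int i = 0; i < n; i++) out.write(in[i]);
}

// Data-driven task body: consumes one word and produces one word per call.
// The computation (2 * x + 1) is an assumption, not taken from the example.
void worker(IntStream& in, IntStream& out) {
    out.write(in.read() * 2 + 1);
}

// Control-driven consumer: drains n words from a stream into memory.
void write_out(IntStream& in, int* out, int n) {
    for (int i = 0; i < n; i++) out[i] = in.read();
}
```

In this software model, worker has to be called once per data item by hand. In the HLS design, the hls::task wrapper is what keeps the task body running whenever data is available on its input stream.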

In addition to the ports, these channels also include an internal job scheduler. In the above example, both the split and the merge channels use a round-robin scheduler, which assigns the incoming data to each of the 4 tasks in turn, starting with worker_U0. If a load-balancing scheduler had been chosen, the incoming data would have been assigned to the first available worker task (and this would lead to a non-deterministic simulation, since the order might differ each time you run the simulation). Since this is a pure task-channel region, the 4 tasks execute in parallel as soon as there is data in their incoming streams. Refer to the merge_split example on GitHub for more examples of these concepts.
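The round-robin behavior described above can be illustrated with a small software model. The following is a minimal sketch in plain C++, not the HLS split/merge library: RoundRobinModel, its members, and the 2 * x + 1 worker computation are all hypothetical names and choices made for illustration. It shows why a round-robin split paired with a round-robin merge returns results in the original input order.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Plain C++ model of a round-robin split feeding np workers and a
// round-robin merge collecting their results (hypothetical, for illustration).
struct RoundRobinModel {
    std::vector<std::queue<int>> to_worker;    // models split.out[i]
    std::vector<std::queue<int>> from_worker;  // models merge.in[i]
    std::vector<int> handled_by;               // worker index chosen per item

    explicit RoundRobinModel(std::size_t np)
        : to_worker(np), from_worker(np) {}

    std::vector<int> run(const std::vector<int>& input) {
        const std::size_t np = to_worker.size();

        // Split: item k goes to worker k % np, starting with worker 0.
        for (std::size_t k = 0; k < input.size(); k++) {
            to_worker[k % np].push(input[k]);
            handled_by.push_back(static_cast<int>(k % np));
        }

        // Workers: each drains its own input queue (assumed body: 2 * x + 1).
        for (std::size_t w = 0; w < np; w++) {
            while (!to_worker[w].empty()) {
                from_worker[w].push(to_worker[w].front() * 2 + 1);
                to_worker[w].pop();
            }
        }

        // Merge: visit the workers in the same rotating order as the split,
        // so results come back in the original input order.
        std::vector<int> output;
        for (std::size_t k = 0; k < input.size(); k++) {
            std::queue<int>& q = from_worker[k % np];
            assert(!q.empty());
            output.push_back(q.front());
            q.pop();
        }
        return output;
    }
};
```

The model runs the workers one after another, whereas the hardware tasks run in parallel. The ordering argument is the same in both cases because the merge side always reads from a specific worker next. A load-balancing scheduler gives up that fixed rotation, which is why its output order can vary between runs.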

It is important to note that, although the code above may give the impression that each task is "called" in the loop, and connected to a potentially different pair of channels every time the loop body is executed, in reality, this usage implies a static instantiation, i.e.:

  • Each t[i](...) call must be executed exactly once per execution of dut().
  • The loop over i must be fully unrolled, to infer a corresponding set of 4 instances in RTL.
  • The dut() function must be called exactly once by the testbench.
  • Each split output or merge input must be bound to exactly one hls::task instance.

Although the order of specification does not matter for hls::task objects themselves, Vitis HLS must be able to see a chain of processes in the control-driven dataflow network, such as from read_in to write_out. To define this chain of processes, Vitis HLS uses the calling order, which for hls::task objects is also the declaration order. This means that the model must define an explicit order from the read_in function, to the hls::task region, and finally to the write_out function in the dataflow region, as shown in the example above.

Generally:
  • If a control-based process (i.e. regular dataflow) produces a stream for an hls::task, then it must be called before the declaration of the tasks in the code.
  • If a control-based process consumes a stream from an hls::task, then it must be called after the declaration of the tasks in the code.

Violation of the above rules can cause unexpected outcomes since each of the NP hls::task instances is statically bound to the channels that are used in the first invocation of t[i](...).

The following diagram shows the graph of this mixed task-channel and dataflow example:

Figure 1. Mixed Task-Channel and Dataflow