For use in dataflow processes, split/merge channels let you create one-to-many or many-to-one channels to distribute data to multiple tasks, or to aggregate data from multiple tasks. These channels have a built-in job scheduler using either a round-robin approach, in which data is sequentially distributed or gathered across the channels, or a load balancing approach, in which distribution is determined by channel availability. To use hls::split<> or hls::merge<> objects in your code, include the header file hls_np_channel.h as shown in the example below.
As shown in the figure below, data is read from an input stream, split through the round-robin scheduler mechanism, and distributed to the associated worker tasks. After a worker completes its task, it writes the output, which is merged, also using the round-robin scheduler, into a single stream.
A split channel has one producer and many consumers, and can typically be used to distribute tasks to a set of workers, abstracting the distribution logic and implementing it in RTL, thus leading to both better performance and fewer resources. The distribution of an input to one of the N outputs can be:
- Round-robin, where the consumers read the input data in a fixed rotating order, thus ensuring deterministic behavior, but not allowing load sharing with dynamically varying computational loads for the workers.
- Load balancing, where the first consumer to attempt a read will read the first input data, thus ensuring good load balancing, but with non-deterministic results.
A merge channel has many producers and a single consumer, and operates based on the reverse logic:
- Round-robin, where the producer output data is merged using a fixed rotating order, thus ensuring deterministic behavior, but not allowing load sharing with dynamically varying computational loads for the workers.
- Load balancing, where the first producer to complete its work writes first into the channel, with non-deterministic results.
The general idea of split and merge is that with the round_robin scheduler, data is distributed to workers for the split, and read from workers for the merge, in a deterministic fashion. So if all workers compute the same function, the result is the same as with a single worker, but the performance is better.
If the workers perform different functions, then your design must ensure that the correct data item is sent to the correct function in the round-robin order of workers, starting from out[0] or in[0] respectively.
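For example, the following sketch (hls::task usage as in the example later in this section; all function and variable names here are hypothetical) relies on the fixed round-robin order to send even-indexed items to one function and odd-indexed items to another:
#include "hls_stream.h"
#include "hls_np_channel.h"
#include "hls_task.h"

// Hypothetical worker functions: scale2 handles even-indexed inputs and add1
// handles odd-indexed inputs, purely by virtue of the fixed round-robin order.
void scale2(hls::stream<int> &in, hls::stream<int> &out) { out.write(2 * in.read()); }
void add1(hls::stream<int> &in, hls::stream<int> &out)   { out.write(in.read() + 1); }

void feed(int in[8], hls::stream<int> &s) {
  for (int i = 0; i < 8; i++)
    s.write(in[i]);      // even i goes to out[0] (scale2), odd i to out[1] (add1)
}

void drain(hls::stream<int> &s, int out[8]) {
  for (int i = 0; i < 8; i++)
    out[i] = s.read();   // results return in the same fixed round-robin order
}

void mixed_workers(int in[8], int out[8]) {
#pragma HLS dataflow
  hls::split::round_robin<int, 2> sp;
  hls::merge::round_robin<int, 2> mg;

  feed(in, sp.in);
  hls_thread_local hls::task ta(scale2, sp.out[0], mg.in[0]);
  hls_thread_local hls::task tb(add1, sp.out[1], mg.in[1]);
  drain(mg.out, out);
}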
Specification
hls::split::round_robin<DATATYPE, NUM_PORTS[, DEPTH, N_PORT_DEPTH]> name;
hls::split::load_balancing<DATATYPE, NUM_PORTS[, DEPTH]> name;
hls::merge::round_robin<DATATYPE, NUM_PORTS[, DEPTH, N_PORT_DEPTH]> name;
hls::merge::load_balancing<DATATYPE, NUM_PORTS[, DEPTH]> name;
- round_robin/load_balancing: Specifies the type of scheduler mechanism used for the channel.
- DATATYPE: Specifies the data type on the channel. This has the same restrictions as the standard hls::stream. The DATATYPE can be:
  - Any C++ native data type
  - A Vitis HLS arbitrary precision type (for example, ap_int<>, ap_ufixed<>)
  - A user-defined struct containing either of the above types
- NUM_PORTS: Indicates the number of write ports required for split (1:num) or read ports required for merge (num:1) operation.
- DEPTH: Optional argument specifying the depth of the main buffer, located before the split or after the merge. The default depth is 2 when not specified.
- N_PORT_DEPTH: Optional argument for round-robin, specifying the depth of the output buffers applied after the split, or before the merge. The default depth is 0 when not specified. Tip: To specify the optional N_PORT_DEPTH value, you must also specify DEPTH. Example declarations using these optional arguments are shown after this list.
- name: Indicates the name of the created channel object.
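For illustration only, the following declarations (the channel names and sizes are hypothetical, not taken from the example designs in this section) show the optional depth arguments in use:
#include "hls_np_channel.h"

// Round-robin split of int data to 4 consumers: main buffer DEPTH=8 and
// N_PORT_DEPTH=2 on each output port (DEPTH must be given when N_PORT_DEPTH is given).
hls::split::round_robin<int, 4, 8, 2> work_split;

// Load-balancing merge of 4 producers into one stream with main buffer DEPTH=6.
hls::merge::load_balancing<int, 4, 6> result_merge;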
#include "hls_np_channel.h"
const int N = 16;
const int NP = 4;
void dut(int in[N], int out[N], int n) {
#pragma HLS dataflow
hls::split::round_robin<int, NP> split1;
hls::merge::round_robin<int, NP> merge1;
read_in(in, n, split1.in);
// Task-Channels
hls_thread_local hls::task t[NP];
for (int i=0; i<NP; i++) {
#pragma HLS unroll
t[i](worker, split1.out[i], merge1.in[i]);
}
write_out(merge1.out, out, n);
}
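The read_in, worker, and write_out processes are not shown in the example; a minimal sketch of what they could look like (the worker body here is a placeholder increment, an assumption for illustration) is:
void read_in(int in[N], int n, hls::stream<int> &s) {
  for (int i = 0; i < n; i++)
    s.write(in[i]);          // stream the input array into the split channel
}

void worker(hls::stream<int> &in, hls::stream<int> &out) {
  out.write(in.read() + 1);  // one item per task invocation; placeholder computation
}

void write_out(hls::stream<int> &s, int out[N], int n) {
  for (int i = 0; i < n; i++)
    out[i] = s.read();       // drain the merge channel into the output array
}
In a compilable design these definitions (or their prototypes) must appear before dut.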
In the dut example above, the split channel outputs and the merge channel inputs are connected to hls::task objects. However, this is simply a feature of the example and not a requirement of split/merge channels.
Application of Split/Merge
The main use of split and merge is to support the instantiation of multiple compute engines to fully exploit the bandwidth of a DDR or HBM port. In this case, the producer is a load process that reads a burst of data from the M-AXI interface, and then passes the individual packets of data to a number of workers via the split channel for processing. Use the round-robin protocol if the workers take similar amounts of time, or load balancing if the execution time per input is variable. The consumer performs the reverse operation, writing data back into DRAM.
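A minimal sketch of the load and store ends of this pattern (hypothetical names; interface pragmas and burst inference depend on the actual kernel configuration), assuming the workers are connected through the channels as in the dut example above:
#include "hls_stream.h"

void load(const int *ddr_in, int n, hls::stream<int> &s) {
  // Burst-read packets from the DDR/HBM port and feed the split channel.
  for (int i = 0; i < n; i++) {
#pragma HLS pipeline II=1
    s.write(ddr_in[i]);
  }
}

void store(hls::stream<int> &s, int *ddr_out, int n) {
  // Drain the merge channel and burst-write the results back to DRAM.
  for (int i = 0; i < n; i++) {
#pragma HLS pipeline II=1
    ddr_out[i] = s.read();
  }
}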
These channels are modeled as hls::stream objects at both ends of the split or merge channel. This means that a split or merge channel end can be connected to any process that takes an hls::stream as an input or an output. The process does not need to be aware of the type of channel connection. Therefore, split/merge channels can be used both for standard dataflow processes and for hls::task objects.
The following example shows how a split channel can be used by a single producer and multiple consumers:
#include "hls_np_channel.h"
void producer(hls::stream<int> &s) {
s.write(xxx);
}
void consumer1(hls::stream<int> &s) {
... = s.read();
}
void consumer2(hls::stream<int> &s) {
... = s.read();
}
void top-func() {
#pragma HLS dataflow
hls::split::load_balancing<int, 4, 6> s; // NUM_PORTS=4, DEPTH=6
producer(s.in, ...);
consumer1(s.out[0], ...);
consumer2(s.out[1], ...);
consumer3(s.out[2], ...);
consumer4(s.out[3], ...);
}
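A merge channel is used symmetrically. The following sketch (hypothetical function and variable names, not from the documentation) shows multiple producers feeding a single consumer through a load-balancing merge:
#include "hls_np_channel.h"

void worker0(hls::stream<int> &s) {
  s.write(0 /* this worker's result */);
}

void worker1(hls::stream<int> &s) {
  s.write(1 /* this worker's result */);
}

void collector(hls::stream<int> &s) {
  for (int i = 0; i < 2; i++) {
    int v = s.read();   // results arrive in completion order (non-deterministic)
  }
}

void merge_top() {
#pragma HLS dataflow
  hls::merge::load_balancing<int, 2, 4> m; // NUM_PORTS=2, DEPTH=4

  worker0(m.in[0]);
  worker1(m.in[1]);
  collector(m.out);
}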