The performance of nested dataflow regions can degrade significantly if any non-dataflow region is present anywhere other than at the deepest level of the function/region hierarchy (that is, at the leaf nodes of the dataflow graph). This is because a sequential FSM does not allow overlapped execution of the regions it contains, even when those regions are themselves dataflow regions.
Nested dataflow pitfall example 1: dataflow pragma missing in intermediate function.
void proc1(...) {
#pragma HLS dataflow
    ...
}
void proc2(...) {
    ... // missing dataflow pragma
    proc1(...);
    ...
}
void top(...) {
#pragma HLS dataflow
    ...
    proc2(...);
    ...
}
Nested dataflow pitfall example 2: code outside a dataflow loop.
void proc1(...) {
#pragma HLS dataflow
    ...
}
void proc2(...) {
    a = x + y; // this code is NOT implemented as a dataflow process but as a sequential FSM
    for (int i = 0; i < N; i++) {
#pragma HLS dataflow
        proc1(a, ...);
        ...
    }
}
void top(...) {
#pragma HLS dataflow
    ...
    proc2(...);
    ...
}
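In both examples above, proc2 is not a dataflow region, so it is implemented as a sequential FSM whose ap_ready and ap_done are generated at the same time; that is, its II and latency are identical. The throughput advantage of dataflow is precisely an II that is smaller than the latency, so this drastically limits that advantage for proc1 and proc2: successive invocations of proc2 cannot overlap, even though top and proc1 are dataflow regions.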
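One possible way to restructure example 2 is sketched below: the sequential statement is wrapped in its own function so that it becomes a process, the loop is moved into a function that contains nothing else, and proc2 gets its own dataflow pragma. This is only an illustration under stated assumptions. The helper names compute_a, proc2_loop, scale, and offset, the argument lists, and the arithmetic are all hypothetical and not taken from the original listings. The code compiles as plain C++ (the pragmas are ignored by a standard compiler); whether the tool treats each region as a valid dataflow region should be confirmed in the synthesis report.

```cpp
constexpr int N = 4; // hypothetical trip count

// Hypothetical leaf processes called from proc1.
void scale(int in, int &tmp) { tmp = in * 2; }
void offset(int tmp, int a, int &out) { out = tmp + a; }

// Deepest dataflow region, as in the original examples.
void proc1(int a, int in, int &out) {
#pragma HLS dataflow
    int tmp; // channel between the two leaf processes
    scale(in, tmp);
    offset(tmp, a, out);
}

// The former sequential statement "a = x + y", wrapped as its own process.
void compute_a(int x, int y, int &a) { a = x + y; }

// The dataflow loop, isolated in a function with no code outside the loop.
void proc2_loop(int a, const int in[N], int out[N]) {
    for (int i = 0; i < N; i++) {
#pragma HLS dataflow
        proc1(a, in[i], out[i]);
    }
}

// proc2 now carries its own dataflow pragma and contains only
// a local channel variable and process calls.
void proc2(int x, int y, const int in[N], int out[N]) {
#pragma HLS dataflow
    int a;
    compute_a(x, y, a);
    proc2_loop(a, in, out);
}

// Top-level dataflow region; the other processes elided in the
// original ("...") are omitted here.
void top(int x, int y, const int in[N], int out[N]) {
#pragma HLS dataflow
    proc2(x, y, in, out);
}
```

The same idea applies to example 1, where the only change needed is adding the dataflow pragma to proc2, provided the rest of its body consists of process calls and channel declarations.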