The performance of nested dataflow regions is severely degraded if any non-dataflow region is present anywhere other than at the deepest level of the function/region hierarchy (that is, at the leaf nodes of the dataflow graph). This is because a non-dataflow region is implemented as a sequential FSM, and a sequential FSM does not allow overlapped execution of the dataflow regions that it contains.
Nested dataflow example 1: dataflow pragma missing in intermediate function.
void proc1(...) {
  #pragma HLS dataflow
  ...
}
void proc2(...) {
  ... // missing dataflow pragma: proc2 is implemented as a sequential FSM
  proc1(...);
  ...
}
void top(...) {
  #pragma HLS dataflow
  ...
  proc2(...);
  ...
}
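For comparison, the following is a minimal sketch of example 1 with the dataflow pragma present at every intermediate level, so that only the leaf functions contain sequential code. The leaf functions scale and offset, the buffer size N, and the use of plain arrays as channels are assumptions made for illustration; they are not part of the original example. An ordinary C++ compiler ignores the HLS pragmas, so the sketch can also be compiled and checked functionally in software.

```cpp
#include <cstddef>

constexpr std::size_t N = 8;  // hypothetical buffer size, for illustration only

// Hypothetical leaf processes: sequential code is acceptable at this level.
static void scale(const int in[N], int out[N]) {
    for (std::size_t i = 0; i < N; ++i) out[i] = in[i] * 2;
}

static void offset(const int in[N], int out[N]) {
    for (std::size_t i = 0; i < N; ++i) out[i] = in[i] + 1;
}

// Innermost dataflow region: contains only calls to leaf processes.
void proc1(const int in[N], int out[N]) {
#pragma HLS dataflow
    int tmp[N];  // single-producer, single-consumer channel
    scale(in, tmp);
    offset(tmp, out);
}

// Intermediate level: the pragma is now present, so proc2 is itself
// a dataflow region rather than a sequential FSM.
void proc2(const int in[N], int out[N]) {
#pragma HLS dataflow
    int tmp[N];
    proc1(in, tmp);
    offset(tmp, out);
}

// Top-level dataflow region.
void top(const int in[N], int out[N]) {
#pragma HLS dataflow
    int tmp[N];
    proc2(in, tmp);
    scale(tmp, out);
}
```

In this sketch every function between top and the leaves is a dataflow region, which matches the requirement stated above that non-dataflow code appear only at the deepest level of the hierarchy.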
Nested dataflow example 2: code outside a dataflow loop.
void proc1(...) {
  #pragma HLS dataflow
  ...
}
void proc2(...) {
  a = x + y; // this sequential code is implemented as a sequential FSM, not as a dataflow process
  for (int i = 0; i < N; i++) {
    #pragma HLS dataflow
    proc1(a, ...);
    ...
  }
}
void top(...) {
  #pragma HLS dataflow
  ...
  proc2(...);
  ...
}
In both examples above, proc2 is not a dataflow region, so its ap_ready and ap_done signals are asserted at the same time; that is, its II and latency are identical. This drastically limits the throughput advantage of dataflow, which comes from achieving an II shorter than the latency, for both proc1 and proc2.
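To give a rough sense of the cost, the sketch below uses a simple first-order model: n back-to-back invocations of a block take approximately latency + (n - 1) * II cycles. The function name total_cycles and the cycle counts are hypothetical and chosen only for illustration; real designs should be measured from the synthesis and co-simulation reports.

```cpp
#include <cstdint>

// First-order estimate of the cycles needed for n back-to-back invocations
// of a block with the given latency and initiation interval (II).
// This is an approximation for illustration, not a tool-reported figure.
constexpr std::uint64_t total_cycles(std::uint64_t latency,
                                     std::uint64_t ii,
                                     std::uint64_t n) {
    return n == 0 ? 0 : latency + (n - 1) * ii;
}

// Sequential FSM: II equals latency (hypothetical 100 cycles each).
static_assert(total_cycles(100, 100, 10) == 1000, "10 runs, no overlap");

// Dataflow region: same latency, but a new invocation can start
// every 25 cycles (hypothetical II).
static_assert(total_cycles(100, 25, 10) == 325, "10 runs, overlapped");
```

Under these assumed numbers, the region whose II equals its latency needs roughly three times as many cycles for ten invocations as the overlapped one.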