Description
The DATAFLOW pragma enables task-level pipelining as described in Exploiting Task Level Parallelism: Dataflow Optimization, allowing functions and loops to overlap in their operation, increasing the concurrency of the RTL implementation and increasing the overall throughput of the design.
All operations are performed sequentially in a C description. In the absence
of any directives that limit resources (such as pragma HLS
allocation
), the Vitis HLS tool seeks to
minimize latency and improve concurrency. However, data dependencies can limit this. For
example, functions or loops that access arrays must finish all read/write accesses to the
arrays before they complete. This prevents the next function or loop that consumes the data
from starting operation. The DATAFLOW optimization enables the operations in a function or
loop to start operation before the previous function or loop completes all its
operations.
When the DATAFLOW pragma is specified, the HLS tool analyzes the dataflow between sequential functions or loops and creates channels (based on ping pong RAMs or FIFOs) that allow consumer functions or loops to start operation before the producer functions or loops have completed. This allows functions or loops to operate in parallel, which decreases latency and improves the throughput of the RTL.
If no initiation interval (number of cycles between the start of one function or loop and the next) is specified, the HLS tool attempts to minimize the initiation interval and start operation as soon as data is available.
config_dataflow
command specifies the default memory channel and
FIFO depth used in dataflow optimization.For the DATAFLOW optimization to work, the data must flow through the design from one task to the next. The following coding styles prevent the HLS tool from performing the DATAFLOW optimization. Refer to Dataflow Optimization Limitations for additional details.
- Single-producer-consumer violations
- Feedback between tasks
- Conditional execution of tasks
- Loops with multiple exit conditions
Finally, the DATAFLOW optimization has no hierarchical implementation. If a sub-function or loop contains additional tasks that might benefit from the DATAFLOW optimization, you must apply the optimization to the loop, the sub-function, or inline the sub-function.
Syntax
Place the pragma in the C source within the boundaries of the region, function, or loop.
#pragma HLS dataflow [disable_start_propagation]
-
disable_start_propagation
: Optionally disables the creation of a start FIFO used to propagate a start token to an internal process. Such FIFOs can sometimes be a bottleneck for performance.
Example
Specifies DATAFLOW optimization within the loop wr_loop_j
.
wr_loop_j: for (int j = 0; j < TILE_PER_ROW; ++j) {
#pragma HLS DATAFLOW
wr_buf_loop_m: for (int m = 0; m < TILE_HEIGHT; ++m) {
wr_buf_loop_n: for (int n = 0; n < TILE_WIDTH; ++n) {
#pragma HLS PIPELINE
// should burst TILE_WIDTH in WORD beat
outFifo >> tile[m][n];
}
}
wr_loop_m: for (int m = 0; m < TILE_HEIGHT; ++m) {
wr_loop_n: for (int n = 0; n < TILE_WIDTH; ++n) {
#pragma HLS PIPELINE
outx[TILE_HEIGHT*TILE_PER_ROW*TILE_WIDTH*i+TILE_PER_ROW*TILE_WIDTH*m+TILE_WIDTH*j+n] = tile[m][n];
}
}