pragma HLS dataflow - 2025.2 English - UG1399

Vitis High-Level Synthesis User Guide (UG1399)

Document ID: UG1399
Release Date: 2026-01-22
Version: 2025.2 English

Description

The DATAFLOW pragma enables task-level pipelining, which lets functions and loops overlap in their operation. This increases the concurrency of the RTL implementation and the overall throughput of the design.

In a C description, all operations are performed sequentially. In the absence of any directives that limit resources (such as pragma HLS allocation), the Vitis HLS tool seeks to minimize latency and improve concurrency. However, data dependencies can limit this. For example, a function or loop that accesses an array must finish all read/write accesses to the array before it completes, which prevents the next function or loop that consumes the data from starting. The DATAFLOW optimization lets a function or loop start operating before the previous function or loop has completed all of its operations.

Figure 1. DATAFLOW Pragma

When the DATAFLOW pragma is specified, the HLS tool analyzes the dataflow between sequential functions or loops and creates channels (based on ping-pong RAMs or FIFOs) that allow consumer functions or loops to start operation before the producer functions or loops have completed. The functions or loops can then operate in parallel, which decreases latency and improves the throughput of the RTL.
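
As an illustration of the producer/consumer structure described above, the following is a minimal sketch of a dataflow region with three tasks connected by local arrays. The names load, scale, store, top, check_top, and the length N are hypothetical and chosen only for this sketch. Each local array has exactly one writer task and one reader task, which is the shape the tool needs in order to create a channel. A standard C++ compiler ignores the HLS pragmas, so the sketch also runs as plain software.

```cpp
#include <cassert>

// Hypothetical buffer length, chosen only for this sketch.
constexpr int N = 16;

// Producer task: the only writer of buf.
static void load(const int in[N], int buf[N]) {
  for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE
    buf[i] = in[i];
  }
}

// Middle task: the only reader of buf and the only writer of scaled.
static void scale(const int buf[N], int scaled[N]) {
  for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE
    scaled[i] = 2 * buf[i];
  }
}

// Final task: the only reader of scaled.
static void store(const int scaled[N], int out[N]) {
  for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE
    out[i] = scaled[i];
  }
}

// Dataflow region: buf and scaled are the local variables that sit
// between the tasks and are candidates for channels.
void top(const int in[N], int out[N]) {
#pragma HLS DATAFLOW
  int buf[N];
  int scaled[N];
  load(in, buf);
  scale(buf, scaled);
  store(scaled, out);
}

// Software-only functional check, in the style of a C simulation testbench.
void check_top() {
  int in[N];
  int out[N];
  for (int i = 0; i < N; ++i) in[i] = i;
  top(in, out);
  for (int i = 0; i < N; ++i) assert(out[i] == 2 * i);
}
```

The check only confirms functional behavior. Whether the tasks actually overlap, and which channel type is used, is determined by the tool during synthesis.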

Tip: The config_dataflow command specifies the default memory channel and FIFO depth used in dataflow optimization.

For the DATAFLOW optimization to work, the data must flow through the design from task to task. Pay special attention to the following coding styles. For more details, refer to Limitations of Control-Driven Task-Level Parallelism and Dataflow Region Coding Style.

  • Single-producer-consumer violations
  • Feedback between tasks
  • Conditional execution of tasks
  • Loops with multiple exit conditions
Important: If any of these coding styles are present, the HLS tool issues a message.
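
As an illustration of the first item in the list above, the following sketch shows one common way to avoid a single-producer-consumer violation: instead of letting two tasks read the same local array, a dedicated split task reads it once and writes a private copy for each downstream task. All names here (LEN, produce, split, add_one, square, top_split, check_top_split) are hypothetical, and this is a sketch under those assumptions rather than the only possible fix.

```cpp
#include <cassert>

// Hypothetical buffer length, chosen only for this sketch.
constexpr int LEN = 8;

// Producer task: the only writer of data.
static void produce(const int in[LEN], int data[LEN]) {
  for (int i = 0; i < LEN; ++i) data[i] = in[i];
}

// Split task: the only reader of data. It writes one private copy
// for each downstream task, so no array has two consumers.
static void split(const int data[LEN], int copy_a[LEN], int copy_b[LEN]) {
  for (int i = 0; i < LEN; ++i) {
    int v = data[i];
    copy_a[i] = v;
    copy_b[i] = v;
  }
}

// First consumer: reads only copy_a.
static void add_one(const int a[LEN], int out_a[LEN]) {
  for (int i = 0; i < LEN; ++i) out_a[i] = a[i] + 1;
}

// Second consumer: reads only copy_b.
static void square(const int b[LEN], int out_b[LEN]) {
  for (int i = 0; i < LEN; ++i) out_b[i] = b[i] * b[i];
}

void top_split(const int in[LEN], int out_a[LEN], int out_b[LEN]) {
#pragma HLS DATAFLOW
  int data[LEN];
  int copy_a[LEN];
  int copy_b[LEN];
  produce(in, data);
  split(data, copy_a, copy_b);  // data has one producer and one consumer
  add_one(copy_a, out_a);
  square(copy_b, out_b);
}

// Software-only functional check.
void check_top_split() {
  int in[LEN];
  int out_a[LEN];
  int out_b[LEN];
  for (int i = 0; i < LEN; ++i) in[i] = i;
  top_split(in, out_a, out_b);
  for (int i = 0; i < LEN; ++i) {
    assert(out_a[i] == i + 1);
    assert(out_b[i] == i * i);
  }
}
```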

Finally, the DATAFLOW optimization is not hierarchical by default. If a sub-function or loop contains additional tasks that might benefit from the DATAFLOW optimization, you must either apply the optimization to that loop or sub-function, or inline the sub-function.
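
The following sketch illustrates the first option: the pragma is repeated inside a sub-function that itself contains two tasks. The names (M, inc, dbl, negate, stage_pair, top_nested, check_top_nested) are hypothetical and serve only this illustration.

```cpp
#include <cassert>

// Hypothetical buffer length, chosen only for this sketch.
constexpr int M = 8;

static void inc(const int in[M], int out[M]) {
  for (int i = 0; i < M; ++i) out[i] = in[i] + 1;
}

static void dbl(const int in[M], int out[M]) {
  for (int i = 0; i < M; ++i) out[i] = 2 * in[i];
}

static void negate(const int in[M], int out[M]) {
  for (int i = 0; i < M; ++i) out[i] = -in[i];
}

// Sub-function with its own tasks. The pragma on the caller does not
// reach into this body, so it is applied again here.
static void stage_pair(const int in[M], int out[M]) {
#pragma HLS DATAFLOW
  int tmp[M];
  inc(in, tmp);
  dbl(tmp, out);
}

// Outer dataflow region: stage_pair and negate are its two tasks.
void top_nested(const int in[M], int out[M]) {
#pragma HLS DATAFLOW
  int mid[M];
  stage_pair(in, mid);
  negate(mid, out);
}

// Software-only functional check.
void check_top_nested() {
  int in[M];
  int out[M];
  for (int i = 0; i < M; ++i) in[i] = i;
  top_nested(in, out);
  for (int i = 0; i < M; ++i) assert(out[i] == -(2 * (i + 1)));
}
```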

Syntax

Place the pragma in the C source within the boundaries of the region, function, or loop.

#pragma HLS dataflow [disable_start_propagation]
  • disable_start_propagation: Optionally disables the creation of a start FIFO used to propagate a start token to an internal process. Such FIFOs can sometimes be a bottleneck for performance.

Example

The following example specifies the DATAFLOW optimization within the loop wr_loop_j.

#include "hls_stream.h"

// Fill one tile from the input stream, one element per read.
void read_fifo(int tile[TILE_HEIGHT][TILE_WIDTH], hls::stream<int> &inFifo) {
  for (int m = 0; m < TILE_HEIGHT; ++m) {
    for (int n = 0; n < TILE_WIDTH; ++n) {
      #pragma HLS PIPELINE
      tile[m][n] = inFifo.read();
    }
  }
}

// Copy tile (i, j) to its position in the row-major output buffer.
void write_out(int tile[TILE_HEIGHT][TILE_WIDTH], int *outx, int i, int j) {
  for (int m = 0; m < TILE_HEIGHT; ++m) {
    for (int n = 0; n < TILE_WIDTH; ++n) {
      #pragma HLS PIPELINE
      outx[TILE_HEIGHT*TILE_PER_ROW*TILE_WIDTH*i + TILE_PER_ROW*TILE_WIDTH*m + TILE_WIDTH*j + n] = tile[m][n];
    }
  }
}
....
wr_loop_i: 
for (int i = 0; i < TILE_PER_COL; ++i) {
  wr_loop_j: 
  for (int j = 0; j < TILE_PER_ROW; ++j) {
    #pragma HLS DATAFLOW
    int tile[TILE_HEIGHT][TILE_WIDTH]; 
    read_fifo(tile, inFifo);
    write_out(tile, outx, i, j);
  } 
}
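
To check the index arithmetic of the example in software, the following sketch wraps the loop nest in a hypothetical function write_tiles and replaces hls::stream<int> with a small stand-in type, SimStream, built on std::queue. Both names, the check function check_tiles, the helper constants IMG_WIDTH and IMG_SIZE, and the tile dimensions are assumptions made for this sketch only. The example above does not define them, and SimStream is not the Vitis class.

```cpp
#include <cassert>
#include <queue>

// Hypothetical tile geometry; the example does not define these values.
constexpr int TILE_HEIGHT = 2;
constexpr int TILE_WIDTH = 3;
constexpr int TILE_PER_ROW = 2;
constexpr int TILE_PER_COL = 2;
constexpr int IMG_WIDTH = TILE_PER_ROW * TILE_WIDTH;
constexpr int IMG_SIZE = TILE_PER_COL * TILE_HEIGHT * IMG_WIDTH;

// Software-only stand-in for hls::stream<int>. NOT the Vitis class.
struct SimStream {
  std::queue<int> q;
  void write(int v) { q.push(v); }
  int read() {
    int v = q.front();
    q.pop();
    return v;
  }
};

void read_fifo(int tile[TILE_HEIGHT][TILE_WIDTH], SimStream &inFifo) {
  for (int m = 0; m < TILE_HEIGHT; ++m) {
    for (int n = 0; n < TILE_WIDTH; ++n) {
#pragma HLS PIPELINE
      tile[m][n] = inFifo.read();
    }
  }
}

void write_out(int tile[TILE_HEIGHT][TILE_WIDTH], int *outx, int i, int j) {
  for (int m = 0; m < TILE_HEIGHT; ++m) {
    for (int n = 0; n < TILE_WIDTH; ++n) {
#pragma HLS PIPELINE
      outx[TILE_HEIGHT*TILE_PER_ROW*TILE_WIDTH*i + TILE_PER_ROW*TILE_WIDTH*m + TILE_WIDTH*j + n] = tile[m][n];
    }
  }
}

// Hypothetical wrapper around the loop nest from the example.
void write_tiles(SimStream &inFifo, int *outx) {
wr_loop_i:
  for (int i = 0; i < TILE_PER_COL; ++i) {
  wr_loop_j:
    for (int j = 0; j < TILE_PER_ROW; ++j) {
#pragma HLS DATAFLOW
      int tile[TILE_HEIGHT][TILE_WIDTH];
      read_fifo(tile, inFifo);
      write_out(tile, outx, i, j);
    }
  }
}

// Feed each element's row-major image index, tile by tile, and confirm
// that every value lands at exactly that index in the output buffer.
void check_tiles() {
  SimStream inFifo;
  for (int i = 0; i < TILE_PER_COL; ++i)
    for (int j = 0; j < TILE_PER_ROW; ++j)
      for (int m = 0; m < TILE_HEIGHT; ++m)
        for (int n = 0; n < TILE_WIDTH; ++n)
          inFifo.write((TILE_HEIGHT * i + m) * IMG_WIDTH + TILE_WIDTH * j + n);
  int outx[IMG_SIZE] = {0};
  write_tiles(inFifo, outx);
  for (int k = 0; k < IMG_SIZE; ++k) assert(outx[k] == k);
}
```

The check shows that write_out places tile (i, j) at image row TILE_HEIGHT*i + m and column TILE_WIDTH*j + n of a row-major buffer. It says nothing about the timing of the synthesized design.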