set_directive_loop_flatten - 2025.2 English - UG1399

Vitis High-Level Synthesis User Guide (UG1399)

Document ID: UG1399
Release Date: 2026-01-22
Version: 2025.2 English

Description

Collapses (flattens) nested loops into a single loop so that pipelining can be applied across all iterations of the nest, with the goal of reducing latency.

Only innermost loops (after any unrolling of the loops inside them) can be pipelined. Outer loops can only be dataflow or sequential. When the loop above a pipelined loop is sequential, its iterations execute one after another, and each iteration runs the inner loop to completion. In the RTL implementation, each outer iteration costs one clock cycle to move from the outer loop to the inner loop, the full latency of all inner-loop iterations, and one clock cycle to move back from the inner loop to the outer loop. In contrast, flattening nested loops allows them to be optimized and pipelined as a single loop, so that inner-loop iterations belonging to different outer-loop iterations can overlap. In general, flattening improves performance, but:

  • Flattening cannot always be achieved; the code must follow a particular coding style.
  • In some cases, timing and even the initiation interval (II) can degrade, depending on the operations and dependencies in the outer loop.
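The cost described above can be approximated with a first-order cycle model. The sketch below is illustrative only: the helper names are hypothetical, the formula (trip - 1) * II + depth is a common textbook estimate for a pipelined loop rather than the tool's scheduler, and it assumes II and pipeline depth are unchanged by flattening, which the caveats above say is not guaranteed.

```cpp
#include <cstdint>

// Hypothetical helper: estimated cycles for one pipelined loop with `trip`
// iterations, initiation interval `ii`, and pipeline depth `depth`.
constexpr std::uint64_t pipelined_cycles(std::uint64_t trip, std::uint64_t ii,
                                         std::uint64_t depth) {
    return trip == 0 ? 0 : (trip - 1) * ii + depth;
}

// Sequential outer loop around a pipelined inner loop: every outer iteration
// pays the full inner-loop latency plus one cycle to enter the inner loop
// and one cycle to return to the outer loop.
constexpr std::uint64_t nested_cycles(std::uint64_t outer, std::uint64_t inner,
                                      std::uint64_t ii, std::uint64_t depth) {
    return outer * (pipelined_cycles(inner, ii, depth) + 2);
}

// Flattened nest: a single pipelined loop with outer * inner iterations.
// Assumes ii and depth are the same as for the inner loop alone.
constexpr std::uint64_t flattened_cycles(std::uint64_t outer, std::uint64_t inner,
                                         std::uint64_t ii, std::uint64_t depth) {
    return pipelined_cycles(outer * inner, ii, depth);
}
```

With made-up numbers (100 outer iterations, 10 inner iterations, II = 1, depth = 4), this model gives 1500 cycles for the nested form and 1003 cycles for the flattened form. Actual results depend on the design and should be read from the synthesis report.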

Apply the set_directive_loop_flatten directive to the innermost loop in the loop hierarchy (the equivalent pragma is placed in the body of that loop). Only perfect and almost-perfect loop nests (after any preliminary function inlining or loop unrolling) can be flattened in this manner:

Perfect loop nests
  • The body of each non-innermost loop contains one and only one subloop, and no other instructions.
Almost-perfect loop nests
  • The body of each non-innermost loop contains one and only one subloop, and no other control flow.
  • The body of each non-innermost loop must contain no function call containing a loop.

For almost-perfect loop nests, the compiler automatically pushes any instructions that exist between the two loops into the innermost loop, so that the loops become perfectly nested.
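As an illustration of the difference, the following sketch shows an almost-perfect nest next to a hand-written perfect-nest equivalent. The function names, array sizes, and the guard on the first inner iteration are assumptions made for this example; the compiler's internal transformation may differ in detail.

```cpp
constexpr int ROWS = 4;
constexpr int COLS = 8;

// Almost-perfect nest: one instruction sits between the two loop headers,
// and the outer body contains no other control flow.
void scale_rows(const int in[ROWS][COLS], const int gain[ROWS],
                int out[ROWS][COLS]) {
row_loop:
    for (int i = 0; i < ROWS; ++i) {
        int g = gain[i];  // instruction between the loops
    col_loop:
        for (int j = 0; j < COLS; ++j) {
#pragma HLS loop_flatten
            out[i][j] = in[i][j] * g;
        }
    }
}

// Hand-written perfect nest with the same result: the instruction is moved
// into the innermost loop and guarded so it runs once per outer iteration.
void scale_rows_pushed(const int in[ROWS][COLS], const int gain[ROWS],
                       int out[ROWS][COLS]) {
    int g = 0;
    for (int i = 0; i < ROWS; ++i) {
        for (int j = 0; j < COLS; ++j) {
            if (j == 0) {
                g = gain[i];  // pushed into the innermost loop
            }
            out[i][j] = in[i][j] * g;
        }
    }
}
```

Both functions compute the same output; the second form only makes explicit what "perfectly nested" means for this code.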

In addition, the loops must meet the following flattenability requirements:

  • Each loop should be a for-loop, not a while-loop, and should not contain break statements.
  • The tripcount of each loop should be computable by the compiler before the loops to be flattened are entered (but it does not need to be a numerical constant). A typical coding style is:
    • A loop with a loop counter incremented by a numerical constant.
    • A lower bound and upper bound for the loop counter that do not depend on the loops to be flattened (they should be "loop-invariant").

An example of a non-flattenable nest is one in which the tripcount of the inner loop depends on the outer loop counter.
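A minimal sketch of such a nest is a triangular loop. The second function shows one possible manual rewrite with loop-invariant bounds and a guard in the body; that rewrite is an assumption of this example, not a transformation described in this guide, and it trades extra iterations (N * N instead of N * (N + 1) / 2) for bounds that meet the requirements above.

```cpp
constexpr int N = 6;

// Inner tripcount is i + 1, which depends on the outer loop counter,
// so the bounds are not loop-invariant.
int sum_lower_triangular(const int m[N][N]) {
    int acc = 0;
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j <= i; ++j) {
            acc += m[i][j];
        }
    }
    return acc;
}

// Rectangular rewrite: both bounds are constants, and a guard in the body
// selects the same elements as the triangular version.
int sum_lower_rectangular(const int m[N][N]) {
    int acc = 0;
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            if (j <= i) {
                acc += m[i][j];
            }
        }
    }
    return acc;
}
```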

Imperfect loop nests (that is, nests in which a loop contains more than one subloop or other control flow) cannot be flattened by the compiler. In this case, flattening must be done by hand: restructure the code, push instructions into the innermost loop, or unroll inner loops to create a perfect loop nest above them.
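One way to do this restructuring by hand is sketched below for a nest where an if statement wraps the subloop. The function names and sizes are hypothetical, and the rewrite shown is only one option; it moves the condition into the innermost loop so that the outer body contains nothing but the subloop.

```cpp
constexpr int ROWS = 4;
constexpr int COLS = 8;

// Imperfect nest: the outer body contains control flow (an if statement)
// around the subloop.
void masked_copy(const bool enable[ROWS], const int in[ROWS][COLS],
                 int out[ROWS][COLS]) {
    for (int i = 0; i < ROWS; ++i) {
        if (enable[i]) {
            for (int j = 0; j < COLS; ++j) {
                out[i][j] = in[i][j];
            }
        }
    }
}

// Restructured by hand: the condition is pushed into the innermost loop,
// leaving a perfect nest. Disabled rows are still iterated over, but their
// outputs are left untouched.
void masked_copy_restructured(const bool enable[ROWS], const int in[ROWS][COLS],
                              int out[ROWS][COLS]) {
    for (int i = 0; i < ROWS; ++i) {
        for (int j = 0; j < COLS; ++j) {
#pragma HLS loop_flatten
            if (enable[i]) {
                out[i][j] = in[i][j];
            }
        }
    }
}
```

The restructured version always executes ROWS * COLS iterations, so whether it is faster than the original depends on how often rows are disabled.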

Syntax

set_directive_loop_flatten [OPTIONS] <location>
  • <location> is the location of the innermost loop, in the format function[/label].

Options

-off or off=true

Optional keyword. Prevents the loop that contains loop_flatten off from being flattened with its subloops (if any).

Important: The presence of the LOOP_FLATTEN pragma or directive enables the optimization. The addition of -off disables it.

Examples

Flattens loop_1 in function foo and all (perfect or almost-perfect) loops above it in the loop hierarchy into a single loop. If you use the equivalent pragma instead, place it in the body of loop_1.

set_directive_loop_flatten foo/loop_1

Prevents loop flattening in loop_2 of function foo. If you use the equivalent pragma instead, place it in the body of loop_2.

set_directive_loop_flatten -off foo/loop_2
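To show how the function[/label] locations map onto source code, here is a hypothetical foo with labeled loops. The function body, the array sizes, and the labels loop_0 and loop_3 are invented for this sketch, and making loop_2 an outer loop with one subloop is an assumption based on the description of the off option. The pragmas are shown as the in-source alternative to the Tcl directives; a design would normally use one form or the other.

```cpp
constexpr int M = 3;
constexpr int N = 5;

int foo(const int a[M][N], const int b[M][N]) {
    int sum = 0;

    // Location foo/loop_1: the innermost loop of the first nest.
    // Tcl form: set_directive_loop_flatten foo/loop_1
loop_0:
    for (int i = 0; i < M; ++i) {
    loop_1:
        for (int j = 0; j < N; ++j) {
#pragma HLS loop_flatten
            sum += a[i][j];
        }
    }

    // Location foo/loop_2: a loop that should not be flattened with its
    // subloop loop_3.
    // Tcl form: set_directive_loop_flatten -off foo/loop_2
loop_2:
    for (int i = 0; i < M; ++i) {
#pragma HLS loop_flatten off
    loop_3:
        for (int j = 0; j < N; ++j) {
            sum += b[i][j];
        }
    }

    return sum;
}
```

A standard C++ compiler ignores the HLS pragmas and labels, so the function can still be exercised in a C simulation test bench.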

For more complete examples, refer to the corresponding pragma HLS loop_flatten section.