Description
Allows nested loops to be flattened into a single loop hierarchy with improved latency.
In the RTL implementation, moving from an outer loop to an inner loop, or from an inner loop back to an outer loop, takes one clock cycle. Flattening nested loops allows them to be optimized as a single loop. This removes those transition cycles and can allow greater optimization of the loop body logic.
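As a rough illustration of what flattening means, the sketch below writes a two-level nest by hand as a single loop. The names `sum_nested`, `sum_flat`, `ROWS`, and `COLS` are made up for this example, and the hand-written version is only a conceptual equivalent, not the transformation the tool performs internally. In the nested form the inner loop is entered and exited once per outer iteration, so by the one-cycle-per-transition rule above it spends on the order of 2 × ROWS cycles on transitions that the single-loop form does not have.

```c
#define ROWS 4
#define COLS 8

/* Nested form: the inner loop is entered and exited ROWS times. */
int sum_nested(int a[ROWS][COLS]) {
    int sum = 0;
    outer: for (int i = 0; i < ROWS; i++) {
        inner: for (int j = 0; j < COLS; j++) {
            sum += a[i][j];
        }
    }
    return sum;
}

/* Hand-flattened equivalent: one loop of ROWS * COLS iterations,
 * with the two original indices recovered from a single counter. */
int sum_flat(int a[ROWS][COLS]) {
    int sum = 0;
    flat: for (int k = 0; k < ROWS * COLS; k++) {
        sum += a[k / COLS][k % COLS];
    }
    return sum;
}
```

Both functions visit the same elements in the same order, so they return the same value. In practice the pragma is preferable to rewriting the loop by hand, because the source keeps its natural nested shape.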
Apply the LOOP_FLATTEN pragma to the loop body of the innermost loop in the loop hierarchy. Only perfect and semi-perfect loops can be flattened in this manner:
- Perfect loop nests
  - Only the innermost loop has loop body content.
  - There is no logic specified between the loop statements.
  - All loop bounds are constant.
- Semi-perfect loop nests
  - Only the innermost loop has loop body content.
  - There is no logic specified between the loop statements.
  - The outermost loop bound can be a variable.
- Imperfect loop nests
  - When the inner loop has variable bounds (or the loop body is not exclusively inside the inner loop), try to restructure the code, or unroll the loops in the loop body to create a perfect loop nest.
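The restructuring suggested for imperfect loop nests can be sketched as follows. This is a minimal example under assumed names (`row_sums_imperfect`, `row_sums_perfect`, `N`, `M` are hypothetical): the first function has logic between the loop statements, and the second moves that logic into the inner loop body behind conditions on the inner index, which leaves a perfect nest with constant bounds. Whether the tool actually flattens the result depends on the tool version and the rest of the design.

```c
#define N 4
#define M 8

/* Imperfect nest: the reset and the store sit between the loop statements. */
void row_sums_imperfect(int in[N][M], int out[N]) {
    row: for (int i = 0; i < N; i++) {
        int acc = 0;                      /* logic before the inner loop */
        col: for (int j = 0; j < M; j++) {
            acc += in[i][j];
        }
        out[i] = acc;                     /* logic after the inner loop */
    }
}

/* Perfect nest: only the innermost loop has body content,
 * and both loop bounds are constant. */
void row_sums_perfect(int in[N][M], int out[N]) {
    int acc = 0;
    row: for (int i = 0; i < N; i++) {
        col: for (int j = 0; j < M; j++) {
#pragma HLS loop_flatten
            if (j == 0) acc = 0;          /* reset at the start of each row */
            acc += in[i][j];
            if (j == M - 1) out[i] = acc; /* store at the end of each row */
        }
    }
}
```

The two functions compute the same row sums. The cost of the perfect form is two extra comparisons in the loop body, which is usually small next to the transition cycles removed by flattening. A standard C compiler ignores the unknown pragma, so the code can still be checked in ordinary C simulation.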
Syntax
Place the pragma in the C source within the boundaries of the nested loop.
#pragma HLS loop_flatten off
Where:
- off: Optional keyword. Prevents flattening from taking place, and can prevent some loops from being flattened while all others in the specified location are flattened.
Important: The presence of the LOOP_FLATTEN pragma or directive enables the optimization. The addition of off disables it.
Example 1
Flattens loop_1 in function foo, and all (perfect or semi-perfect) loops above it in the loop hierarchy, into a single loop. Place the pragma in the body of loop_1.
void foo (num_samples, ...) {
  int i;
  ...
  loop_1: for(i=0;i< num_samples;i++) {
    #pragma HLS loop_flatten
    ...
    result = a + b;
  }
}
Example 2
Prevents loop flattening in loop_1.
loop_1: for(i=0;i< num_samples;i++) {
#pragma HLS loop_flatten off
...