Description
You can unroll loops to create multiple independent operations rather than a single collection of operations. The UNROLL pragma transforms loops by creating multiple copies of the loop body in the RTL design, which allows some or all loop iterations to occur in parallel.
Loops in the C/C++ functions are kept rolled by default. When loops are
rolled, synthesis creates the logic for one iteration of the loop, and the RTL design
executes this logic for each iteration of the loop in sequence. A loop is executed for the
number of iterations specified by the loop induction variable. The number of iterations
might also be impacted by logic inside the loop body (for example, break
conditions or modifications to a loop exit variable). Using the UNROLL
pragma you can unroll loops to increase data access and throughput.
The UNROLL pragma allows the loop to be fully or partially unrolled. Fully unrolling the loop creates a copy of the loop body in the RTL for each loop iteration, so the entire loop can be run concurrently. Partially unrolling a loop lets you specify a factor N, to create N copies of the loop body and reduce the loop iterations accordingly.
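To make "reduce the loop iterations accordingly" concrete, the following is a minimal sketch of the arithmetic. The helper name unrolled_iterations is hypothetical and is not part of any Vitis HLS API; it only assumes that a partially unrolled loop still needs one final, partially filled iteration when the factor does not divide the trip count.

```cpp
// Hypothetical helper (not a Vitis HLS API): number of iterations the
// partially unrolled loop runs when the original loop runs trip_count times
// and the loop body is replicated `factor` times. The division rounds up,
// because a last, partially filled iteration is still needed when factor
// does not divide trip_count evenly.
constexpr int unrolled_iterations(int trip_count, int factor) {
    return (trip_count + factor - 1) / factor;
}

// 16 iterations unrolled by 4 leave 4 iterations of 4 body copies each.
static_assert(unrolled_iterations(16, 4) == 4, "even split");
// 10 iterations unrolled by 4 leave 3 iterations; the last one is partial.
static_assert(unrolled_iterations(10, 4) == 3, "rounds up");
```

When the factor equals the trip count, the helper returns 1, which corresponds to the fully unrolled case in which every copy of the body belongs to a single pass.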
Partial loop unrolling does not require N to be an integer factor of the maximum loop iteration count. The Vitis HLS tool adds an exit check to ensure that partially unrolled loops are functionally identical to the original loop. For example, given the following code:
for(int i = 0; i < X; i++) {
  #pragma HLS unroll factor=2
  a[i] = b[i] + c[i];
}
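Loop unrolling by a factor of 2 effectively transforms the code to look like the following, where the break construct is used to ensure the functionality remains the same and the loop exits at the appropriate point.
for(int i = 0; i < X; i += 2) {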
  a[i] = b[i] + c[i];
  if (i+1 >= X) break;
  a[i+1] = b[i+1] + c[i+1];
}
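The equivalence of the rolled loop and the form with the break can be checked in plain software. The sketch below is a host-side check only, not synthesizable HLS code: it uses std::vector for convenience, and the function names add_rolled and add_unrolled_by_2 are assumptions introduced for illustration.

```cpp
#include <cassert>
#include <vector>

// Original rolled loop: one addition per iteration.
std::vector<int> add_rolled(const std::vector<int>& b, const std::vector<int>& c) {
    assert(b.size() == c.size());
    const int X = static_cast<int>(b.size());
    std::vector<int> a(X);
    for (int i = 0; i < X; i++) {
        a[i] = b[i] + c[i];
    }
    return a;
}

// Hand-written form of the same loop unrolled by 2. The break is the exit
// check: it stops the second copy of the body from running past the end
// when X is odd.
std::vector<int> add_unrolled_by_2(const std::vector<int>& b, const std::vector<int>& c) {
    assert(b.size() == c.size());
    const int X = static_cast<int>(b.size());
    std::vector<int> a(X);
    for (int i = 0; i < X; i += 2) {
        a[i] = b[i] + c[i];
        if (i + 1 >= X) break;
        a[i + 1] = b[i + 1] + c[i + 1];
    }
    return a;
}
```

Calling both functions with the same inputs should give identical results for odd, even, and empty lengths, which is the property the exit check is there to preserve.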
In the example above, because the maximum iteration count, X, is a variable, the HLS tool might not be able to determine its value, so it adds an exit check and control logic to the partially unrolled loop. However, if you know that the specified unrolling factor, 2 in this example, is an integer factor of the maximum iteration count X, the skip_exit_check option lets you remove the exit check and associated logic. This helps minimize the area and simplify the control logic.
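For comparison, a hand-written form of the loop with the exit check removed might look like the sketch below. This is a host-side illustration rather than synthesizable code; the function name add_unrolled_by_2_no_check is hypothetical, and the assert on the length stands in for the guarantee that you, not the tool, must provide.

```cpp
#include <cassert>
#include <vector>

// Hand-written form of unrolling by 2 without the exit check.
// Precondition (assumed, enforced here only by assert): the length is even.
// With an odd length the second statement would access one element past
// the end on the last pass, which is why the guarantee matters.
std::vector<int> add_unrolled_by_2_no_check(const std::vector<int>& b,
                                            const std::vector<int>& c) {
    assert(b.size() == c.size());
    assert(b.size() % 2 == 0);
    const int X = static_cast<int>(b.size());
    std::vector<int> a(X);
    for (int i = 0; i < X; i += 2) {
        a[i] = b[i] + c[i];
        a[i + 1] = b[i + 1] + c[i + 1];  // no break before this copy
    }
    return a;
}
```

The loop body here has no conditional branch between the two copies, which is the source of the simpler control logic.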
See also the config_unroll command.
Syntax
Place the pragma in the C source within the body of the loop to unroll.
#pragma HLS unroll factor=<N> region skip_exit_check
Where:
- factor=<N>: Specifies a non-zero integer indicating that partial unrolling is requested. The loop body is repeated the specified number of times, and the iteration information is adjusted accordingly. If factor= is not specified, the loop is fully unrolled.
- skip_exit_check: Optional keyword that applies only if partial unrolling is specified with factor=. The elimination of the exit check depends on whether the loop iteration count is known or unknown:
  - Fixed bounds: No exit condition check is performed if the iteration count is a multiple of the factor. If the iteration count is not an integer multiple of the factor, the tool:
    - Prevents unrolling.
    - Issues a warning that the exit check must be performed to proceed.
  - Variable bounds: The exit condition check is removed. You must ensure that:
    - The variable bound is an integer multiple of the factor.
    - No exit check is in fact required.
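One way to guard the conditions above in your own source is sketched below. The sizes M and FACTOR, and the function names scale_by_two_fixed and scale_by_two_variable, are hypothetical and chosen only for illustration. A standard C++ compiler ignores the HLS pragmas (possibly with a warning), so the code can also be compiled and run on the host.

```cpp
#include <cassert>

// Hypothetical sizes chosen for illustration.
constexpr int M = 16;
constexpr int FACTOR = 4;  // must be kept in step with factor= in the pragmas

// Fixed bound: the divisibility condition is known at compile time, so the
// build can be stopped before skip_exit_check is relied upon.
static_assert(M % FACTOR == 0, "M must be an integer multiple of the unroll factor");

void scale_by_two_fixed(const int in[M], int out[M]) {
    for (int i = 0; i < M; i++) {
#pragma HLS unroll skip_exit_check factor=4
        out[i] = in[i] * 2;
    }
}

// Variable bound: the tool cannot verify the condition, so the caller must
// guarantee it. The assert documents that contract and catches violations
// when the function is run in software.
void scale_by_two_variable(const int* in, int* out, int n) {
    assert(n % FACTOR == 0);
    for (int i = 0; i < n; i++) {
#pragma HLS unroll skip_exit_check factor=4
        out[i] = in[i] * 2;
    }
}
```

The compile-time check fits the fixed-bounds case, while the run-time assertion is only a safety net for the variable-bounds case, where responsibility stays with the designer.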
Example 1
The following example fully unrolls loop_1 in function foo. Place the pragma in the body of loop_1 as shown.
loop_1: for(int i = 0; i < N; i++) {
  #pragma HLS unroll
  a[i] = b[i] + c[i];
}
Example 2
This example specifies an unroll factor of 4 to partially unroll loop_2 of function foo, and removes the exit check.
void foo (...) {
  int8 array1[M];
  int12 array2[N];
  ...
  loop_2: for(i = 0; i < M; i++) {
    #pragma HLS unroll skip_exit_check factor=4
    array1[i] = ...;
    array2[i] = ...;
    ...
  }
  ...
}
Example 3
The following example fully unrolls all loops inside loop_1 in function foo, but not loop_1 itself, because of the presence of the region keyword.
void foo(int data_in[N], int scale, int data_out1[N], int data_out2[N]) {
  int temp1[N];
  loop_1: for(int i = 0; i < N; i++) {
    #pragma HLS unroll region
    temp1[i] = data_in[i] * scale;
    loop_2: for(int j = 0; j < N; j++) {
      data_out1[j] = temp1[j] * 123;
    }
    loop_3: for(int k = 0; k < N; k++) {
      data_out2[k] = temp1[k] * 456;
    }
  }
}