vadd: for (int i = 0; i < 20; i++) {
#pragma HLS UNROLL
    c[i] = a[i] + b[i];
}
In the preceding example, pragma HLS UNROLL has been inserted into the body of the loop to instruct the compiler to unroll the loop completely. All 20 iterations of the loop can then execute in parallel, provided that the data dependencies permit it.
Partially Unrolled Loop
To completely unroll a loop, the loop must have a constant bound (20 in the example above). However, partial unrolling is possible for loops with a variable bound. In a partially unrolled loop, only a fixed number of loop iterations, set by the unroll factor, can execute in parallel.
array_sum: for (int i = 0; i < 4; i++) {
#pragma HLS UNROLL factor=2
    sum += arr[i];
}
In the above example, the UNROLL pragma is given a factor of 2. This is equivalent to manually duplicating the loop body so that each pass through the loop executes two copies of it, and the loop runs for half as many iterations. The following code shows how this would be written. This transformation allows two iterations of the original loop to execute in parallel.
array_sum_unrolled: for (int i = 0; i < 4; i += 2) {
    // Manual unroll by a factor of 2
    sum += arr[i];
    sum += arr[i+1];
}
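The manual form above relies on the trip count (4) being a multiple of the unroll factor. As a rough sketch of what partial unrolling implies for a loop with a variable bound, the version below adds an exit check between the two copies of the loop body, so that an odd trip count still produces the correct result. The function names and the run-time bound n are hypothetical, and the logic the tool actually generates may differ.

```cpp
// Sketch only: the function names and the run-time bound n are
// illustrative assumptions, not part of the original example.

// Rolled loop with a variable bound and a partial-unroll request.
int array_sum_var(const int *arr, int n) {
    int sum = 0;
    array_sum_var_loop: for (int i = 0; i < n; i++) {
#pragma HLS UNROLL factor=2
        sum += arr[i];
    }
    return sum;
}

// Hand-written equivalent of unrolling by a factor of 2.
int array_sum_var_manual(const int *arr, int n) {
    int sum = 0;
    for (int i = 0; i < n; i += 2) {
        sum += arr[i];
        if (i + 1 >= n) break;  // exit check: n may be odd
        sum += arr[i + 1];
    }
    return sum;
}
```

Both functions return the same value for any n >= 0. When the trip count is known to be a multiple of the factor, as in the fixed-bound example, the exit check is unnecessary.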
Just as data dependencies inside a loop affect the initiation interval of a pipelined loop, an unrolled loop performs operations in parallel only if data dependencies allow it. If operations in one iteration of the loop require the result from a previous iteration, they cannot execute in parallel. Instead, each one executes as soon as the data it needs from the previous iteration is available.
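The sum += arr[i] loop shown earlier is an instance of this: every addition needs the previous value of sum, so the unrolled copies of the body form a chain. One common way to loosen such a dependency, shown here as a sketch rather than a recommended rewrite, is to accumulate into independent partial sums and combine them after the loop. The function name and the variables sum_even and sum_odd are hypothetical.

```cpp
// Sketch only: array_sum_partial, sum_even and sum_odd are illustrative
// names, not part of the original example.
int array_sum_partial(const int arr[4]) {
    int sum_even = 0;  // accumulates arr[0], arr[2]
    int sum_odd  = 0;  // accumulates arr[1], arr[3]
    partial_sums: for (int i = 0; i < 4; i += 2) {
        // These two additions do not depend on each other,
        // so nothing forces them to execute one after the other.
        sum_even += arr[i];
        sum_odd  += arr[i + 1];
    }
    // Combine the independent partial results once, after the loop.
    return sum_even + sum_odd;
}
```

For integer data the result is identical to the single-accumulator loop. For floating-point data, reordering the additions can change rounding, so the rewrite is not always a drop-in replacement.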
PIPELINE loops first, and then UNROLL loops with small loop bodies and limited iterations to improve performance further.
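A minimal sketch of that ordering, assuming a hypothetical function offset_rows with made-up sizes ROWS and COLS: the long outer loop is given PIPELINE, and the short, fixed-bound inner loop is fully unrolled. The II=1 value is only a requested target; whether it is achieved depends on the design.

```cpp
// Sketch only: offset_rows, ROWS and COLS are illustrative assumptions.
const int ROWS = 16;  // many iterations: candidate for PIPELINE
const int COLS = 4;   // few iterations, small body: candidate for UNROLL

void offset_rows(const int in[ROWS][COLS], const int offset[COLS],
                 int out[ROWS][COLS]) {
    rows: for (int r = 0; r < ROWS; r++) {
#pragma HLS PIPELINE II=1
        cols: for (int c = 0; c < COLS; c++) {
#pragma HLS UNROLL
            // Each column is independent of the others, so the
            // unrolled copies carry no dependency between them.
            out[r][c] = in[r][c] + offset[c];
        }
    }
}
```

A standard C++ compiler ignores the HLS pragmas, so the function can be checked in ordinary software simulation before synthesis.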