Some of the optimizations that Vitis HLS can apply are prevented when the loop has variable bounds.
In the following code example, the loop bound is determined by the variable width, which is driven from a top-level input. In this case, the loop is considered to have a variable bound, because Vitis HLS cannot know when the loop will complete.
#include "ap_int.h"
#define N 32
typedef ap_int<8> din_t;
typedef ap_int<13> dout_t;
typedef ap_uint<5> dsel_t;
dout_t code028(din_t A[N], dsel_t width) {
    dout_t out_accum = 0;
    dsel_t x;
    // The exit condition depends on the input width, so the trip count is not known statically.
    LOOP_X: for (x = 0; x < width; x++) {
        out_accum += A[x];
    }
    return out_accum;
}
Attempting to optimize the design in the example above reveals the issues created by variable loop bounds. The first issue is that they prevent Vitis HLS from determining the latency of the loop. Vitis HLS can determine the latency to complete one iteration of the loop, but because it cannot statically determine the value of the variable width, it does not know how many iterations are performed and thus cannot report the loop latency (the number of cycles to completely execute all iterations of the loop).
When variable loop bounds are present, Vitis HLS reports the latency as a question mark (?) instead of using exact values. The following shows the result after synthesis of the previous example.
+ Summary of overall latency (clock cycles):
* Best-case latency: ?
* Worst-case latency: ?
+ Summary of loop latency (clock cycles):
+ LOOP_X:
* Trip count: ?
* Latency: ?
The way to overcome this issue is to use pragma HLS loop_tripcount or set_directive_loop_tripcount. The tripcount directive allows a minimum and/or maximum tripcount to be specified for the loop. The tripcount is the number of loop iterations. If a maximum tripcount of 32 is applied to LOOP_X in the first example, the report is updated to the following:
+ Summary of overall latency (clock cycles):
* Best-case latency: 2
* Worst-case latency: 34
+ Summary of loop latency (clock cycles):
+ LOOP_X:
* Trip count: 0 ~ 32
* Latency: 0 ~ 32
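The tripcount directive described above can also be written as a pragma inside the loop body. The following is a minimal sketch, not the reference implementation: the function name code028_tripcount is hypothetical, and plain integer types stand in for the ap_int typedefs (an assumption made so that the sketch compiles without the Vitis HLS headers). A standard C++ compiler ignores the pragma.

```cpp
#include <cstdint>

constexpr int N = 32;

// Assumption: fixed-width standard types replace ap_int<8>, ap_int<13>
// and ap_uint<5> so the sketch builds outside Vitis HLS.
typedef int8_t  din_t;
typedef int16_t dout_t;
typedef uint8_t dsel_t;

// Hypothetical variant of the first example with a tripcount hint.
// The caller is expected to keep width <= N.
dout_t code028_tripcount(const din_t A[N], dsel_t width) {
    dout_t out_accum = 0;
    LOOP_X: for (dsel_t x = 0; x < width; x++) {
        // Reporting hint only: tells the tool the loop runs 0 to 32 times.
        // It does not change the loop bound or the generated hardware.
#pragma HLS loop_tripcount min=0 max=32
        out_accum += A[x];
    }
    return out_accum;
}
```

The loop bound itself is still the runtime value of width; only the numbers shown in the report change.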
The user-provided values for the tripcount directive are used only for reporting. The tripcount value allows Vitis HLS to report numeric values instead of question marks, so that the reports from different solutions can be compared. To have this same loop-bound information used for synthesis, the C/C++ code must be updated with asserts, which do impact synthesis. Use them carefully, because the assert condition is assumed to be true.
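As an illustration of the assert approach, the following sketch places an assert on the loop limit ahead of the loop. It is a sketch under stated assumptions: the function name code028_assert is hypothetical, plain integer types stand in for the ap_int typedefs so that the code compiles without the Vitis HLS headers, and the chosen limit (width <= N) is an assumption taken from the array size.

```cpp
#include <cassert>
#include <cstdint>

constexpr int N = 32;

// Assumption: standard fixed-width types replace the ap_int typedefs.
typedef int8_t  din_t;
typedef int16_t dout_t;
typedef uint8_t dsel_t;

// Hypothetical variant of the first example with an assert on the bound.
dout_t code028_assert(const din_t A[N], dsel_t width) {
    // The condition is assumed to be true during synthesis, so the
    // caller must guarantee that width never exceeds N.
    assert(width <= N);

    dout_t out_accum = 0;
    LOOP_X: for (dsel_t x = 0; x < width; x++) {
        out_accum += A[x];
    }
    return out_accum;
}
```

In C simulation the assert aborts when the condition is violated, which helps confirm that the assumption holds for the test vectors before it is relied on.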
The next steps in optimizing the first example for a lower initiation interval are:
- Unroll the loop and allow the accumulations to occur in parallel.
- Partition the array input, or the parallel accumulations are limited by a single memory port.
If these code transformations are applied, the output from Vitis HLS highlights the most significant issue with variable bound loops:
@W [XFORM-503] Cannot unroll loop 'LOOP_X' in function 'code028': cannot completely
unroll a loop with a variable trip count.
Because variable-bound loops cannot be unrolled, they not only prevent the unroll directive from being applied but also prevent pipelining of the levels above the loop, since pipelining a level of the hierarchy requires every loop below it to be fully unrolled.
The solution to loops with variable bounds is to make the number of loop iterations a fixed value and to execute the loop body conditionally. The code from the variable loop bounds example can be rewritten as shown in the following code example. Here, the loop bound is explicitly set to the maximum value of the variable width, and the loop body is conditionally executed:
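#include "ap_int.h"
#define N 32
typedef ap_int<8> din_t;
typedef ap_int<13> dout_t;
typedef ap_uint<5> dsel_t;
dout_t loop_max_bounds(din_t A[N], dsel_t width) {
    dout_t out_accum = 0;
    // The counter is an int: a 5-bit dsel_t counter wraps from 31 to 0 and never reaches N.
    int x;
    // Fixed bound: the loop always runs N times.
    LOOP_X: for (x = 0; x < N; x++) {
        // Only the first width iterations contribute to the sum.
        if (x < width) {
            out_accum += A[x];
        }
    }
    return out_accum;
}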
The for-loop (LOOP_X) in the example above can be unrolled. Because the loop has a fixed upper bound, Vitis HLS knows how much hardware to create. There are N (32) copies of the loop body in the RTL design. Each copy of the loop body has conditional logic associated with it and is executed depending on the value of the variable width. Refer to Vitis-HLS-Introductory-Examples/Modeling/variable_bound_loops on GitHub for an example.
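The fixed-bound rewrite can be combined with the unroll and array partition optimizations listed earlier, and checked against the variable-bound form in C simulation. The following is a sketch only: the function names sum_variable and sum_fixed_unrolled are hypothetical, plain integer types stand in for the ap_int typedefs so that the code compiles without the Vitis HLS headers, and the type=complete spelling of the array_partition option is assumed from recent Vitis HLS releases and may differ in older versions. A standard C++ compiler ignores both pragmas.

```cpp
#include <cstdint>

constexpr int N = 32;

// Assumption: standard fixed-width types replace the ap_int typedefs.
typedef int8_t  din_t;
typedef int16_t dout_t;
typedef uint8_t dsel_t;

// Variable-bound form, kept only as a reference for comparison.
// The caller is expected to keep width <= N.
dout_t sum_variable(const din_t A[N], dsel_t width) {
    dout_t out_accum = 0;
    for (int x = 0; x < width; x++) {
        out_accum += A[x];
    }
    return out_accum;
}

// Fixed-bound form with the optimization directives applied.
dout_t sum_fixed_unrolled(const din_t A[N], dsel_t width) {
    // Split A into individual elements so that all reads can occur in
    // parallel instead of being limited by a single memory port.
#pragma HLS array_partition variable=A type=complete
    dout_t out_accum = 0;
    LOOP_X: for (int x = 0; x < N; x++) {
        // Legal here because the trip count is the constant N.
#pragma HLS unroll
        if (x < width) {
            out_accum += A[x];
        }
    }
    return out_accum;
}
```

Comparing the two functions over every legal value of width in a test bench is a simple way to confirm that the rewrite preserves the original behavior.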