Vitis HLS schedules logic and functions as early as possible to reduce latency while keeping the estimated clock period below the user-specified period. To do this, it schedules as many logic operations and functions as possible in parallel. It does not, however, schedule loops to execute in parallel.
If the following code example is synthesized, loop SUM_X is scheduled first and loop SUM_Y is scheduled after it. Even though loop SUM_Y does not need to wait for loop SUM_X to complete before it can begin its operation, it is still scheduled after SUM_X.
#include "loop_sequential.h"

void loop_sequential(din_t A[N], din_t B[N], dout_t X[N], dout_t Y[N],
                     dsel_t xlimit, dsel_t ylimit) {
  dout_t X_accum = 0;
  dout_t Y_accum = 0;
  int i, j;

  SUM_X: for (i = 0; i < xlimit; i++) {
    X_accum += A[i];
    X[i] = X_accum;
  }

  SUM_Y: for (i = 0; i < ylimit; i++) {
    Y_accum += B[i];
    Y[i] = Y_accum;
  }
}
Because the loops have different bounds (xlimit and ylimit), they cannot be merged. By placing the loops in separate functions, as shown in the following code example, the identical functionality can be achieved and both loops (inside the functions) can be scheduled in parallel.
#include "loop_functions.h"

void sub_func(din_t I[N], dout_t O[N], dsel_t limit) {
  int i;
  dout_t accum = 0;

  SUM: for (i = 0; i < limit; i++) {
    accum += I[i];
    O[i] = accum;
  }
}

void loop_functions(din_t A[N], din_t B[N], dout_t X[N], dout_t Y[N],
                    dsel_t xlimit, dsel_t ylimit) {
  sub_func(A, X, xlimit);
  sub_func(B, Y, ylimit);
}
If the previous example is synthesized, the latency is roughly half that of the sequential loops example when xlimit and ylimit are similar, because the loops (as functions) can now execute in parallel and the overall latency is set by the longer of the two.
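The "half" figure can be reasoned about with a rough cycle count. The sketch below is only a back-of-the-envelope model, not tool output: it assumes each loop iteration costs one cycle and ignores loop entry/exit and function-call overhead. The helper names sequential_latency and parallel_latency are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>

// Rough cycle model, NOT Vitis HLS output. Assumption: each loop iteration
// costs one cycle; loop entry/exit and function-call overhead are ignored.

// Sequential loops: SUM_Y starts only after SUM_X has finished,
// so the trip counts add up.
constexpr int sequential_latency(int xlimit, int ylimit) {
    return xlimit + ylimit;
}

// Loops wrapped in functions: both run concurrently,
// so the longer loop determines the total.
constexpr int parallel_latency(int xlimit, int ylimit) {
    return std::max(xlimit, ylimit);
}
```

Under this model, xlimit = ylimit = 32 gives 64 cycles sequentially and 32 in parallel, which is exactly half. With xlimit = 32 and ylimit = 8 the figures are 40 and 32, so the saving shrinks as the two limits diverge.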
The dataflow optimization could also be used in the sequential loops example. The principle of capturing loops in functions to exploit parallelism is presented here for cases in which dataflow optimization cannot be used. For example, in a larger design, dataflow optimization is applied to all loops and functions at the top level, and memories are placed between every top-level loop and function.
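For comparison, a minimal sketch of the dataflow alternative applied to the sequential loops example is shown below. It makes several assumptions: plain int stand-ins replace din_t, dout_t, and dsel_t, N is set to 32 because the real header is not shown here, and the function name loop_sequential_dataflow is hypothetical. Each loop also declares its own counter so the two loops share no variables. Whether the tool accepts this exact form without dataflow warnings has not been verified here, so treat it as an illustration of where the pragma goes rather than a tested design.

```cpp
// Assumed stand-ins; the real loop_sequential.h is not shown in this section.
constexpr int N = 32;
typedef int din_t;
typedef int dout_t;
typedef int dsel_t;

// Hypothetical variant of loop_sequential with the dataflow pragma applied.
// An ordinary C++ compiler ignores the pragma, so the function still runs
// as plain software and produces the same running sums.
void loop_sequential_dataflow(din_t A[N], din_t B[N], dout_t X[N], dout_t Y[N],
                              dsel_t xlimit, dsel_t ylimit) {
#pragma HLS dataflow
    dout_t X_accum = 0;
    dout_t Y_accum = 0;

    // Each loop uses its own counter so the loops share no state.
    SUM_X: for (int i = 0; i < xlimit; i++) {
        X_accum += A[i];
        X[i] = X_accum;
    }

    SUM_Y: for (int i = 0; i < ylimit; i++) {
        Y_accum += B[i];
        Y[i] = Y_accum;
    }
}
```

Because the pragma applies to the whole function scope, this approach fits a small design like this one. The function-wrapping technique above remains the option when dataflow cannot be applied at that level.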