void adder(unsigned int *in, unsigned int *out, int inc, int size) {
    // MAX_SIZE is assumed to be defined elsewhere (for example, in a shared
    // header) and sized to hold the largest expected transfer.
    unsigned int in_internal[MAX_SIZE];
    unsigned int out_internal[MAX_SIZE];

    // Read the input vector "in" from the memory interface into internal storage
mem_rd: for (int i = 0; i < size; i++) {
#pragma HLS PIPELINE
        in_internal[i] = in[i];
    }

    // Increment each element and store the result in a second internal buffer
compute: for (int i = 0; i < size; i++) {
#pragma HLS PIPELINE
        out_internal[i] = in_internal[i] + inc;
    }

    // Write the results from internal storage back to the memory interface
mem_wr: for (int i = 0; i < size; i++) {
#pragma HLS PIPELINE
        out[i] = out_internal[i];
    }
}
In the previous example, three sequential loops are shown: mem_rd, compute, and mem_wr.
- The mem_rd loop reads input vector data from the memory interface and stores it in internal storage.
- The main compute loop reads from the internal storage, performs an increment operation, and saves the result to another internal storage buffer.
- The mem_wr loop writes the data back to memory from the internal storage.
This code example uses two separate loops for reading from and writing to the memory input/output interfaces so that burst reads and writes can be inferred.
By default, these loops are executed sequentially without any overlap. First, the mem_rd loop finishes reading all the input data before the compute loop starts its operation. Similarly, the compute loop finishes processing the data before the mem_wr loop starts to write the data. However, the execution of these loops can be overlapped, allowing the compute (or mem_wr) loop to start as soon as there is enough data available to feed its operation, before the mem_rd (or compute) loop has finished processing its data.
The loop execution can be overlapped using dataflow optimization as described in Dataflow Optimization.
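As a minimal sketch of that overlap, the kernel below moves each loop into its own function and connects the functions with hls::stream FIFOs inside a DATAFLOW region. The DATAFLOW pragma and the hls::stream API are standard Vitis HLS constructs; the helper function names (read_input, compute_add, write_result) and the stream names are illustrative assumptions, not the exact code from the Dataflow Optimization section.
#include <hls_stream.h>

// Illustrative helper: burst-read the input and push it into a FIFO
static void read_input(unsigned int *in, hls::stream<unsigned int> &in_stream, int size) {
mem_rd: for (int i = 0; i < size; i++) {
#pragma HLS PIPELINE
        in_stream << in[i];
    }
}

// Illustrative helper: consume values as they arrive and add the increment
static void compute_add(hls::stream<unsigned int> &in_stream,
                        hls::stream<unsigned int> &out_stream, int inc, int size) {
compute: for (int i = 0; i < size; i++) {
#pragma HLS PIPELINE
        out_stream << (in_stream.read() + inc);
    }
}

// Illustrative helper: burst-write the results back to memory
static void write_result(unsigned int *out, hls::stream<unsigned int> &out_stream, int size) {
mem_wr: for (int i = 0; i < size; i++) {
#pragma HLS PIPELINE
        out[i] = out_stream.read();
    }
}

void adder(unsigned int *in, unsigned int *out, int inc, int size) {
#pragma HLS DATAFLOW
    hls::stream<unsigned int> in_stream("in_stream");
    hls::stream<unsigned int> out_stream("out_stream");
    // The three functions execute as concurrent processes connected by FIFOs,
    // so compute_add and write_result start as soon as data becomes available.
    read_input(in, in_stream, size);
    compute_add(in_stream, out_stream, inc, size);
    write_result(out, out_stream, size);
}
In this structure, the tool builds a task-level pipeline that matches the overlapped behavior described above; the FIFO depth can be increased with the STREAM pragma if the producer and consumer rates differ.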