Now that we’ve made a couple of optimizations to the code, let’s compare their results. If you have been creating a new HLS component for each optimization, you can pull up the performance numbers of each component from its existing synthesis report. Otherwise, you can refer to this table:
Metric | Unoptimized | Pipeline L1 | Reshape RX | Reshape Beam |
---|---|---|---|---|
SLACK (NS) | 0.060 | -0.054 | 0.014 | 0.014 |
INTERVAL (CYCLES) | 3147502 | 40116 | 7626 | 2625 |
II of Loop L1 | N/A | 16 | 3 | 1 |
DSP | 16 | 18 | 96 | 288 |
FF | 2803 | 16274 | 27465 | 47456 |
LUT | 3888 | 8550 | 12551 | 11454 |
It’s easy to see that each optimization had the desired effect of improving performance, as measured by the Interval of the top-level hardware function. Each step removes a bottleneck that restricts the II of Loop L1. With only the pipeline pragma applied, Loop L1 needs 16 cycles per iteration for its I/O to complete. Reshaping RX lets Vitis HLS bring the II down to 3, and reshaping Beam as well brings it down to 1. Notably, the unroll pragmas on the lower-level loops, which affect the complex multiply operation that uses the DSP primitives, were unchanged in each case, yet DSP usage grows from 18 to 96 to 288 across the three optimized designs. What we are seeing is that once Vitis HLS identifies the II violation, the compiler determines that interface bandwidth is the limiting factor. C Synthesis then culls some of the resource-intensive loop unrolling, because allocating those user-requested resources would yield no performance gain.
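To make the bandwidth argument concrete, here is a minimal sketch of the pattern being described: a pipelined outer loop whose unrolled inner loop needs several array elements in the same cycle. The function name, array names, and sizes are hypothetical and are not taken from the tutorial source. The pragma spelling follows recent Vitis HLS releases (older releases omit the `type=` keyword), so check it against your tool version.

```cpp
#include <cstdint>

// Hypothetical sizes, chosen only for illustration.
constexpr int SAMPLES  = 8;  // trip count of the outer loop L1
constexpr int CHANNELS = 4;  // elements consumed per L1 iteration

// Sketch of a loop whose II is limited by how many array elements
// can be read per cycle. rx is laid out as [sample][channel].
void weighted_sum(const int32_t rx[SAMPLES][CHANNELS],
                  const int32_t weights[CHANNELS],
                  int32_t out[SAMPLES]) {
// Reshape so that all CHANNELS elements of a row form one wide word.
// A single read then feeds every multiplier in the unrolled loop.
#pragma HLS ARRAY_RESHAPE variable=rx type=complete dim=2
#pragma HLS ARRAY_RESHAPE variable=weights type=complete dim=1
L1:
    for (int s = 0; s < SAMPLES; ++s) {
#pragma HLS PIPELINE II=1
        int32_t acc = 0;
    L2:
        for (int c = 0; c < CHANNELS; ++c) {
#pragma HLS UNROLL
            acc += rx[s][c] * weights[c];
        }
        out[s] = acc;
    }
}
```

Without the reshape pragmas, the unrolled inner loop would compete for a limited number of memory ports, which is the kind of bottleneck that keeps the II above 1. A standard C++ compiler ignores the `HLS` pragmas, so the function can still be checked in software.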
We should also note that the interval of the optimized code at 2625 cycles easily meets our initial target requirement of 3000 cycles.
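The arithmetic behind these comparisons can be sketched in a few lines of C++. The interval and II values are copied from the table. The helper names are hypothetical, and the model `interval ~= II * trip_count + overhead` is an assumption used only to estimate the trip count of Loop L1, which the table does not state.

```cpp
#include <cstdint>

// Interval (cycles) and II of Loop L1, copied from the table.
struct Result {
    std::uint64_t interval;
    std::uint64_t ii;  // 0 where the report lists N/A
};

constexpr Result kUnoptimized{3147502, 0};
constexpr Result kPipelineL1{40116, 16};
constexpr Result kReshapeRx{7626, 3};
constexpr Result kReshapeBeam{2625, 1};

constexpr std::uint64_t kTargetInterval = 3000;

// Ratio of intervals: how many times faster `after` is than `before`.
constexpr double speedup(const Result& before, const Result& after) {
    return static_cast<double>(before.interval) /
           static_cast<double>(after.interval);
}

constexpr bool meets_target(const Result& r) {
    return r.interval <= kTargetInterval;
}

// Assumed model: interval ~= II * trip_count + fixed overhead.
// Subtracting two designs cancels the overhead, if it is similar in both.
constexpr double estimated_trip_count(const Result& a, const Result& b) {
    return (static_cast<double>(a.interval) - static_cast<double>(b.interval)) /
           (static_cast<double>(a.ii) - static_cast<double>(b.ii));
}
```

With these numbers, the step-by-step speedups come out to roughly 78x, 5.3x, and 2.9x, for about 1199x overall. Under the assumed model, both pairs of pipelined designs suggest a trip count near 2500, which fits an interval of 2625 cycles at an II of 1.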