The component Version0 is the golden reference: it has not been refactored and serves as our baseline version.
Let’s use it to learn more about the code structure and see what Code Analyzer can show us for this design.
Let’s begin by simulating the Version0 code.
Select Run under C Simulation in the Flow pane.
A standard off-the-shelf C compiler then runs, confirming that the C test bench compiles and runs. Monitor the output to ensure this completes without errors, and look for the line
C-simulation finished successfully
A green check mark icon also appears next to the Run command, confirming that the C Simulation was successful. If a red X icon is shown instead, check the output for the source of the error.
The next thing we want to do is enable the HLS Code Analyzer. The Code Analyzer is enabled from the HLS Component Settings window, so navigate to that window and scroll down until you find the code_analyzer option:
In the Vitis Components pane, expand Version0, then expand Settings and select hls_config.cfg.
On the right-hand side of the Settings window that appears, select C Simulation. Then, check the box next to Enable Code Analyzer.
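For reference, the checkbox corresponds to a single entry in hls_config.cfg. The fragment below is only a sketch: it assumes the option is spelled csim.code_analyzer and sits in the [hls] section, so compare it against the file the GUI actually writes for your Vitis version.

```
[hls]
# Assumed spelling of the option toggled by "Enable Code Analyzer"
csim.code_analyzer=1
```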
Now that the Code Analyzer is enabled, it runs as part of every C simulation. When the simulation completes, the Code Analyzer view becomes available under the C simulation reports.
Select Run under C Simulation in the Flow pane again.
When it completes, expand the Reports dropdown in the Flow pane and click Code Analyzer.
The Code Analyzer graph appears and shows the top-level hardware functions for synthesis. Here, the top function polyvec_ntt contains only one extracted process, called polyvec_ntt_loop.
The view for this top-level function is not very interesting because there is only one extracted process; its transaction interval is shown on the right-hand side of the process.
The estimated transaction interval of the top-level function is 51.5 million clock cycles.
You can expand the dropdown arrow on the left side of the process to reveal its contents: it contains one loop labelled polyvec_ntt_loop (which became the process name) that calls a function named poly_ntt.
Let’s keep drilling into the subfunction poly_ntt. Either:
Use the arrows on the left of the expanded process window, or
Use the function dropdown at the top of the Code Analyzer window, and select the function poly_ntt.
Using either method, we again see little information, as poly_ntt only calls ntt.
Note: The screenshot for the poly_ntt function is not shown.
The function ntt is where the bulk of the Number Theoretic Transform code is contained. Let’s see what is inside.
Open the function ntt and wait for HLS to analyze this code.
Upon selecting ntt, the HLS tool processes the function and displays a detailed graph representing its internal operations and their interdependencies.
The graph illustrates the processes that make up the ntt function, and we can see the 3 nested loops and their dependencies in the channel table.
In our function ntt, the body of the nested loops implements the different sequential stages of the algorithm.
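To make the hierarchy concrete, here is a minimal C++ sketch of the call chain and the triple loop nest. It is not the tutorial's source code: the sizes N and L, the function signatures, the returned counters, and the placeholder butterfly (plain add and subtract, with no modular arithmetic or twiddle factors) are all assumptions made for illustration. Only the function names and the polyvec_ntt_loop label come from the text above.

```cpp
#include <cstdint>

constexpr int N = 256;  // assumed polynomial length (not given in the text)
constexpr int L = 4;    // assumed number of polynomials per vector (hypothetical)

// Placeholder butterfly: plain add/subtract instead of the real modular
// multiply by a twiddle factor. It only marks where the work happens.
static void butterfly(int32_t a[N], int j, int len) {
    int32_t t = a[j + len];
    a[j + len] = a[j] - t;
    a[j] = a[j] + t;
}

// Three nested loops. Returns the number of butterflies executed so the
// loop structure can be checked; the real function would not need this.
int ntt(int32_t a[N]) {
    int count = 0;
    for (int len = N / 2; len > 0; len >>= 1) {              // one pass per stage
        for (int start = 0; start < N; start += 2 * len) {   // blocks within a stage
            for (int j = start; j < start + len; ++j) {      // butterflies in a block
                butterfly(a, j, len);
                ++count;
            }
        }
    }
    return count;
}

// Thin wrapper, mirroring the observation that poly_ntt only calls ntt.
int poly_ntt(int32_t a[N]) { return ntt(a); }

// Top function: a single labelled loop, which is what shows up as the
// one extracted process in the top-level view.
int polyvec_ntt(int32_t v[L][N]) {
    int count = 0;
polyvec_ntt_loop:
    for (int i = 0; i < L; ++i) {
        count += poly_ntt(v[i]);
    }
    return count;
}
```

With the assumed N = 256 there are 8 stages of 128 butterflies each, so one call to ntt performs 1024 butterflies. Each stage reads the values written by the previous one, which is why the stages run sequentially.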
The estimated transaction interval of the ntt function is 402 thousand clock cycles. Let’s record this metric in a table:
| Version | polyvec_ntt TI (clock cycles) | ntt TI (clock cycles) |
|---|---|---|
| Version0: baseline code | 51.5M | 402k |
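As a quick sanity check on these two numbers, the sketch below computes how many ntt-sized intervals fit into one top-level interval. The constant and function names are hypothetical; the values are the rounded figures from the table.

```cpp
// Transaction intervals quoted in the table, in clock cycles (rounded).
constexpr double kPolyvecNttTI = 51.5e6;
constexpr double kNttTI = 402e3;

// How many inner intervals fit into one outer interval.
constexpr double ti_ratio(double outer_ti, double inner_ti) {
    return outer_ti / inner_ti;
}
```

Calling ti_ratio(kPolyvecNttTI, kNttTI) gives roughly 128. One possible reading, which the reports quoted here do not confirm, is that the top-level interval is dominated by repeated, non-overlapping calls to ntt, so any improvement to ntt should carry through to the top level.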
Furthermore, at the bottom of the screen, by selecting “Channels” you can view an additional aspect of the analysis:
The “Channels” pane lists every data transaction that was analyzed. A variable might show up in the list twice if multiple transactions occurred on that variable. The chart also provides data on each transaction, like the bitwidth, throughput, and total volume of data being passed.
It is clear that the folded triple-nested loop contains the bulk of the computation, and that to extract parallelism we need to unroll the outer loop.
As a first attempt to extract parallelism, we have created Version1, based on the same code and test bench as Version0. Let’s now switch to this new component.