Version0 - 2025.1 English - XD261

Vitis Tutorials: Vitis HLS (XD261)

Document ID: XD261
Release Date: 2025-06-17
Version: 2025.1 English

The Version0 component is the golden reference: it has not been refactored and serves as our baseline version. Let’s use it to learn more about the code structure and see what Code Analyzer can show us for this design.

Let’s begin by simulating the Version0 code.

  1. Select Run under C Simulation in the Flow pane.

    A standard off-the-shelf C compiler then runs, confirming that the C test bench compiles and executes. Monitor the output to ensure that it completes without errors. Look for the line C-simulation finished successfully. A green check mark icon also appears next to the Run command, confirming that C Simulation was successful. If a red X icon is shown instead, check the output for the source of the error.

    The next step is to enable the HLS Code Analyzer. It is enabled from the HLS Component Settings window, so navigate to that window, then scroll down until you find the code_analyzer option.

  2. In the Vitis Components pane, expand Version0, then expand Settings and select hls_config.cfg.

  3. On the right-hand side of the Settings window that appears, select C Simulation. Then, check the box next to Enable Code Analyzer.

    Enable Code Analyzer

    Now that the Code Analyzer is enabled, it runs whenever C simulation is run. When it completes, the Code Analyzer view becomes available under the C simulation reports.

  4. Select Run under C Simulation in the Flow pane again.

  5. When it completes, expand the Reports dropdown in the Flow pane and click Code Analyzer.

    The Code Analyzer graph appears and shows the top-level hardware functions for synthesis. Here, the top function polyvec_ntt contains only one extracted process, called polyvec_ntt_loop.

    version0_polyvec_ntt

    The view for this top-level function is not very interesting, as there is only one extracted process. The transaction interval of this process is shown on the right-hand side of the process.

    The estimated transaction interval of the top-level function is 51.5 million clock cycles.

    You can expand the dropdown arrow on the left side of the process to reveal the contents of the process: it contains one loop labelled polyvec_ntt_loop (which became the process name) that calls a function named poly_ntt.

    version0_polyvec_ntt

    Let’s keep drilling into the subfunction poly_ntt:

    • Use the arrows on the left of the expanded process window, or

    • Use the function dropdown at the top of the Code Analyzer window, and select the function poly_ntt.

    Using either method, we again see little information, as poly_ntt only calls ntt.

    Note: The screenshot for the poly_ntt function is not shown.

    The function ntt is where the bulk of the Number Theoretic Transform code is contained. Let’s see what is inside.

  6. Open the function ntt and wait for HLS to analyze this code.

Upon selecting ntt, the HLS tool will process and display a detailed graph representing the function’s internal operations and their interdependencies:

Code Analyzer view of ntt

The graph illustrates the processes that make up the ntt function. We can see the three nested loops, and the dependencies between them are listed in the channel table.

In our function ntt, the body of the nested loops implements the different sequential stages of the algorithm.
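To make the hierarchy seen in Code Analyzer concrete, here is a minimal C++ sketch of a polyvec_ntt, poly_ntt, ntt call chain ending in a triple-nested loop. It is not the tutorial's source code: the sizes N and K, the modulus Q, the twiddle and mulmod helpers, and the butterfly_count counter are all assumptions introduced only for illustration.

```cpp
#include <cstdint>

// Hypothetical sizes, chosen only for illustration (not taken from the tutorial).
constexpr unsigned N = 256;   // coefficients per polynomial
constexpr unsigned K = 4;     // polynomials per vector
constexpr int32_t  Q = 3329;  // toy modulus

struct poly    { int16_t coeffs[N]; };
struct polyvec { poly vec[K]; };

// Illustration-only counter of butterfly operations.
static unsigned long butterfly_count = 0;

// Placeholder twiddle factor; a real design would read a precomputed table.
static int16_t twiddle(unsigned k) {
    return static_cast<int16_t>((17u * k + 1u) % static_cast<unsigned>(Q));
}

// Plain modular multiply standing in for the design's reduction routine.
static int16_t mulmod(int16_t a, int16_t b) {
    int32_t p = (static_cast<int32_t>(a) * b) % Q;
    if (p < 0) p += Q;
    return static_cast<int16_t>(p);
}

// Triple-nested loop: each iteration of the outer loop is one sequential stage.
void ntt(int16_t r[N]) {
    unsigned k = 1;
    for (unsigned len = N / 2; len >= 2; len >>= 1) {            // stage loop
        for (unsigned start = 0; start < N; start += 2 * len) {  // block loop
            const int16_t zeta = twiddle(k++);
            for (unsigned j = start; j < start + len; ++j) {     // butterfly loop
                const int16_t t = mulmod(zeta, r[j + len]);
                r[j + len] = static_cast<int16_t>((r[j] - t + Q) % Q);
                r[j]       = static_cast<int16_t>((r[j] + t) % Q);
                ++butterfly_count;
            }
        }
    }
}

// poly_ntt only forwards to ntt, which is why its graph shows little.
void poly_ntt(poly* p) { ntt(p->coeffs); }

// The single loop here corresponds to the process named polyvec_ntt_loop.
void polyvec_ntt(polyvec* v) {
    for (unsigned i = 0; i < K; ++i) {
        poly_ntt(&v->vec[i]);
    }
}
```

With these assumed sizes, each call to ntt runs 7 stages of 128 butterflies, and every stage reads the values written by the previous one. That stage-to-stage dependency is what makes the outer loop sequential as written.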

The estimated transaction interval of the ntt function is 402 thousand clock cycles. Let’s record this metric in a table:

Version                  | polyvec_ntt TI | ntt TI
Version0: baseline code  | 51.5M          | 402k

Furthermore, at the bottom of the screen, by selecting “Channels” you can view an additional aspect of the analysis:

Code Analyzer view of ntt's channels

The “Channels” pane lists every data transaction that was analyzed. A variable can appear in the list more than once if multiple transactions occurred on it. The table also provides data on each transaction, such as the bitwidth, throughput, and total volume of data passed.

The folded triple-nested loop clearly contains the bulk of the computation. To extract parallelism, we need to unroll the outer loop.
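As a rough illustration of what unrolling the outer loop means, the sketch below writes each iteration of the stage loop as its own function call and compares the result with the looped form. It is a sketch under stated assumptions, not the code of Version1: the names ntt_stage, ntt_unrolled, ntt_looped, twiddle, and mulmod, the size N, and the modulus Q are all hypothetical. In Vitis HLS the same intent might instead be expressed with an unroll pragma on the loop.

```cpp
#include <cstdint>

constexpr unsigned N = 256;   // hypothetical polynomial size
constexpr int32_t  Q = 3329;  // toy modulus

// Placeholder twiddle factor; a real design would read a precomputed table.
static int16_t twiddle(unsigned k) {
    return static_cast<int16_t>((17u * k + 1u) % static_cast<unsigned>(Q));
}

// Plain modular multiply standing in for the design's reduction routine.
static int16_t mulmod(int16_t a, int16_t b) {
    int32_t p = (static_cast<int32_t>(a) * b) % Q;
    if (p < 0) p += Q;
    return static_cast<int16_t>(p);
}

// One stage: the two inner loops of the triple nest for a fixed LEN.
// It reads from `in` and writes every element of `out`.
template <unsigned LEN>
void ntt_stage(const int16_t in[N], int16_t out[N]) {
    unsigned k = (N / 2) / LEN;  // first twiddle index used by this stage
    for (unsigned start = 0; start < N; start += 2 * LEN) {
        const int16_t zeta = twiddle(k++);
        for (unsigned j = start; j < start + LEN; ++j) {
            const int16_t t = mulmod(zeta, in[j + LEN]);
            out[j + LEN] = static_cast<int16_t>((in[j] - t + Q) % Q);
            out[j]       = static_cast<int16_t>((in[j] + t) % Q);
        }
    }
}

// Outer loop unrolled by hand: seven stage calls linked by intermediate buffers.
void ntt_unrolled(int16_t r[N]) {
    int16_t s1[N], s2[N], s3[N], s4[N], s5[N], s6[N];
    ntt_stage<128>(r,  s1);
    ntt_stage<64>(s1, s2);
    ntt_stage<32>(s2, s3);
    ntt_stage<16>(s3, s4);
    ntt_stage<8>(s4,  s5);
    ntt_stage<4>(s5,  s6);
    ntt_stage<2>(s6,  r);
}

// Reference form with the original triple-nested loop, updating r in place.
void ntt_looped(int16_t r[N]) {
    unsigned k = 1;
    for (unsigned len = N / 2; len >= 2; len >>= 1) {
        for (unsigned start = 0; start < N; start += 2 * len) {
            const int16_t zeta = twiddle(k++);
            for (unsigned j = start; j < start + len; ++j) {
                const int16_t t = mulmod(zeta, r[j + len]);
                r[j + len] = static_cast<int16_t>((r[j] - t + Q) % Q);
                r[j]       = static_cast<int16_t>((r[j] + t) % Q);
            }
        }
    }
}
```

Once every stage is a separate call with its own input and output buffer, the stages appear as distinct units connected by explicit data transfers, which is the kind of structure a tool can overlap. Whether the overlap is actually achieved depends on how the buffers are accessed.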

As a first attempt to extract parallelism, we have created Version1 based on the same code and testbench as Version0; now let’s switch to this new component.