Using Code Analyzer - 2024.2 English - UG1399

Vitis High-Level Synthesis User Guide (UG1399)

Document ID: UG1399
Release Date: 2024-11-13
Version: 2024.2 English

To achieve the best results in high-level synthesis (HLS), code changes are often required to improve the macro architecture of the design. To assist you with this effort, Vitis HLS Code Analyzer provides features that let you visualize the potential for task-level parallelism and understand the architectural changes needed to optimize performance.

You can run Code Analyzer as part of the C Simulation step by enabling the csim.code_analyzer command in the HLS configuration file as described in C-Simulation Configuration. After running it, the Code Analyzer report becomes available under the Reports of the C Simulation step in the Flow Navigator, or in the Analysis view of the Vitis unified IDE. The features of Code Analyzer include:

Dataflow Graph Extraction
The Code Analyzer report extracts a dataflow graph (DFG) in which top-level statements become dataflow processes (DFG nodes), and the data dependencies of these processes become dataflow channels (DFG edges). The graph can be generated from a function or loop body even when they are not dataflow, helping you better determine how the code might be rewritten in a dataflow form, as described in Abstract Parallel Programming Model for HLS.
Performance Metrics
Code Analyzer determines performance metrics such as the volume of data, transaction intervals (TI), and throughput. The volume and access mode of data are determined from the C test bench based on profiling information. For example, the tool can determine that variable A infers a port 32 bits wide, and that 8 Kb of data is written by process 1 before being read by process 2. Because it analyzes the design prior to synthesis, Code Analyzer estimates the transaction interval (the time it takes for a data transfer to complete) and the throughput of the channel. However, because the estimate is made prior to synthesis, it is less accurate than a value calculated from the performance of the synthesized RTL.
Performance Guidance
Performance Guidance identifies major performance blockers in the HLS component source code, such as cyclic dependencies and memory port contention. It helps you understand code structures that might limit design performance, and identifies which metrics can help you understand the level of performance your design can achieve.
Graph Transformations
Based on the dataflow graph decomposition and the measured and estimated metrics, you might determine that the graph is not ideal as shown. You can modify the graph by merging processes to perform what-if type design exploration. When the graph is modified, new performance metrics will be determined from the new architecture. Iterations of this design process could result in a blueprint for an ideally architected solution which you can use as the basis for refactoring your source code.
Important: The code can be merged and split in the Code Analyzer report, but any changes you want to carry forward will need to be manually reimplemented in the original design source code.
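
As an illustration of the Dataflow Graph Extraction feature described above, the following sketch shows code that is not yet in dataflow form. The function `scale_and_sum`, its arrays, and the size `N` are hypothetical, and the process and channel names in the comments are an assumption about how such code might be decomposed, not actual tool output.

```cpp
#include <cassert>
#include <cstdint>

constexpr int N = 256; // hypothetical problem size

// Hypothetical top-level function with no DATAFLOW pragma. Each top-level
// loop is a candidate dataflow process (DFG node), and each array passed
// from one loop to the next is a candidate channel (DFG edge).
void scale_and_sum(const int32_t in[N], int32_t out[N], int32_t *sum) {
    assert(sum != nullptr);

    int32_t scaled[N];  // written by loop 1, read by loop 2
    int32_t shifted[N]; // written by loop 2, read by loop 3

    // Candidate process 1: reads `in`, writes `scaled`.
    for (int i = 0; i < N; ++i) {
        scaled[i] = in[i] * 3;
    }

    // Candidate process 2: reads `scaled`, writes `shifted`.
    for (int i = 0; i < N; ++i) {
        shifted[i] = scaled[i] + 1;
    }

    // Candidate process 3: reads `shifted`, writes `out` and `*sum`.
    int32_t acc = 0;
    for (int i = 0; i < N; ++i) {
        out[i] = shifted[i];
        acc += shifted[i];
    }
    *sum = acc;
}
```

In this sketch each intermediate array has exactly one writer and one reader, so the three loops would form a simple chain of processes between the Start and End nodes.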

Using the Code Analyzer Report

After running the C Simulation command with Code Analyzer enabled, the Code Analyzer report is generated and available to view in the Vitis unified IDE either in the Analysis view, or under the Reports header in the Flow Navigator. The Code Analyzer report initially displays the graph of the processes and channels defined by the top-level function of the component, as shown in the example below. You can change the scope of the report by using the Function selector in the toolbar menu, or by clicking the right arrow in an expanded process in the graph.

Figure 1. Code Analyzer Report

Features of the Code Analyzer report include the following:

Graph
The Graph view in the report shows the processes and channels of the design in a dataflow graph. The graph is drawn as though the DATAFLOW pragma or directive were present, even if it does not yet exist in the source code. Each process shows the Transaction Interval and Performance Guidance for the element, and includes the source code for that element, which can be viewed by expanding the Code view. In the example below you can see the performance as estimated during the pre-synthesis analysis. The dataflow processes (the graph nodes) have their TI displayed in the yellow/red boxes on the top right.
The function calls and loops that form the dataflow processes also show performance metrics immediately after the call or loop header. They might also include a Details link with additional guidance that identifies the factors blocking the performance of that loop or function call.
Figure 2. Graph View

In addition, you can right-click in the process header and select the Goto Source command to open and highlight the source code in its source file.
Table
Beneath the graph the Code Analyzer report displays a table with two tabs: Processes and Channels. It provides a quick summary of the different elements so you can review the analysis in one table.
  • Processes displays the processes of the graph, and also includes the estimated pre-synthesis Transaction Interval (TI), and provides any design Guidance that the analysis might generate.
  • Channels displays the dataflow channels going into and out of each Process in a separate row. Channels are named after the variable defining them, with details of the variable declaration such as bitwidth, the data volume expected to be delivered over the channel, expected throughput, the access mode, and the Producer and Consumer tasks or processes.
Toolbar
The toolbar menu of the Code Analyzer report provides a number of commands to help configure and view the reports.

The preceding figure displays the following commands starting from the left:
  • Zoom In/Zoom Out/Zoom Fit: Zoom into the graph diagram as needed.
  • Toggle Table: Displays or hides the table of Processes and Channels. This can free up space for the graph if needed.
  • Collapse All: Closes any expanded processes in the graph.
  • Group All/Ungroup All: Groups or ungroups channels with the same source and destination.
  • Function: Provides the context for the current graph. The context can be changed by selecting a new function from the list, or by clicking on the arrows next to loops and function calls in the code of processes.
  • Heat Map: Specifies the reported data for the graph as either the Transaction Interval (TI) for loops or processes, or Performance Guidance messages. In the case of TI, the largest values are highlighted in red to indicate the slowest processes, which limit the performance of the overall dataflow region.
  • Properties: Shows or hides the panel displaying the performance bottlenecks. The panel content is set by clicking on a Details link in the code of a process.
  • Info: Provides information related to the use of the tool, returned metrics, or common design caveats. It is worth reviewing periodically.
  • Settings: Specify the throughput units and edge labels in the graph. The available units are (Bits or Bytes) per (Cycle or Second). You can also specify a lower-limit of data volume by setting the Channel Volume Filter to grey out lower data signals (control signals for instance) and let you focus on the high-data channels.
Overview
The Overview is a miniature representation of the whole graph that provides a reference to the portion of the graph that is displayed when zoomed into the graph. You can use the Overview to manage the view of the graph by manipulating the boundary. You can also close the Overview to free up space on the graph if desired.

Working with the Graph

When starting the design, you need to understand the HLS component source code in depth, identifying the main processes and the dependencies between them. Code Analyzer supports this by showing your code as a dataflow graph as an output of C simulation.

The Code Analyzer report displays the Transaction Interval (TI) for each process, and displays the highest TI using a red background in the heat map. The red indicates the problem areas of your design. However, the pre-synthesis estimates used in the graph do not offer the same fidelity as the post-synthesis or post-implementation metrics. Code Analyzer lets you quickly determine performance potential, identify issues, and resolve them. Synthesis and implementation should be used when more precise information is needed.

You can merge consecutive processes in the source code to explore different dataflow structures of your design, and then split the code back into separate processes as needed. Simply drag and drop one process onto the second process to merge them. Remember, the processes must be sequential in the original source code. The following figure shows two processes that were merged, and now can be split apart again by clicking on the SPLIT line in the code.

Figure 3. Working with the Graph

If you identify a large bottleneck in the current design and would like to turn it into a dataflow region, you can refocus the graph on the function or loop body and continue your analysis of the design. Work to resolve the issues in a methodical manner to make the best use of Code Analyzer.

Ultimately, the code must be rewritten in a dataflow form to reflect the results of Code Analyzer. Typically, you need to outline the processes into their own functions and add a dataflow pragma to the function, loop, or region. You can accelerate this process by selecting Goto Source on each process in the graph, right-clicking the selected source code, and selecting Refactor.
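
A minimal sketch of what such a rewrite could look like is shown below, assuming a design with three sequential processes connected by arrays. The function names (`scale`, `shift`, `store`, `top`) and the size `N` are hypothetical. `#pragma HLS dataflow` is the Vitis HLS dataflow pragma; an ordinary C++ compiler ignores it, possibly with an unknown-pragma warning.

```cpp
#include <cstdint>

constexpr int N = 256; // hypothetical problem size

// Each process is outlined into its own function.
static void scale(const int32_t in[N], int32_t scaled[N]) {
    for (int i = 0; i < N; ++i) {
        scaled[i] = in[i] * 3;
    }
}

static void shift(const int32_t scaled[N], int32_t shifted[N]) {
    for (int i = 0; i < N; ++i) {
        shifted[i] = scaled[i] + 1;
    }
}

static void store(const int32_t shifted[N], int32_t out[N]) {
    for (int i = 0; i < N; ++i) {
        out[i] = shifted[i];
    }
}

// The dataflow region contains only local channel declarations and one
// function call per process, with plain variables as arguments.
void top(const int32_t in[N], int32_t out[N]) {
#pragma HLS dataflow
    int32_t scaled[N];  // channel: scale -> shift
    int32_t shifted[N]; // channel: shift -> store

    scale(in, scaled);
    shift(scaled, shifted);
    store(shifted, out);
}
```

Arrays are used as channels here only to keep the sketch compilable without vendor headers; a streaming channel type could be a better fit when data is produced and consumed strictly in order.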

Use Cases

Legality

Code Analyzer lets you identify legality issues in dataflow designs, as described in Canonical Body and Canonical Forms, before synthesis is attempted. The key issues that can be identified are:

  • Interfaces that are both read and written can be found through "R+W" accesses on the channels originating at the Start node or destined for the End node.
  • Multiple producer/consumer violations can be identified from the table by sorting by channel name and looking for multiple channels with the same variable. Accesses to the Start and End nodes can generally be dismissed.
  • Feedback loops can be found in the table with channel accesses of the mode "R → W" or, potentially, "R+W → W". Checking the type of the channel helps distinguish legal from illegal feedback channels.
  • Non-outlined processes can be identified from the process code: aim to have a single function call per process in the top-level dataflow region and, as much as possible, have this call use variables or constants as arguments.

These issues can be fixed directly in the code and a new run of C Simulation will then refresh the graph with updated metrics and structure.
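
As a sketch of one of the issues listed above, the following example shows a possible fix for a multiple-consumer violation, where one intermediate variable would otherwise be read by two processes. All function and variable names are hypothetical, and duplicating the data in a dedicated process is only one possible remedy.

```cpp
#include <cstdint>

constexpr int N = 64; // hypothetical problem size

static void produce(const int32_t in[N], int32_t tmp[N]) {
    for (int i = 0; i < N; ++i) tmp[i] = in[i] * 2;
}

// Duplicates the data so that every channel has a single consumer.
static void split(const int32_t tmp[N], int32_t a[N], int32_t b[N]) {
    for (int i = 0; i < N; ++i) {
        const int32_t v = tmp[i];
        a[i] = v;
        b[i] = v;
    }
}

static void add_one(const int32_t a[N], int32_t out_a[N]) {
    for (int i = 0; i < N; ++i) out_a[i] = a[i] + 1;
}

static void negate(const int32_t b[N], int32_t out_b[N]) {
    for (int i = 0; i < N; ++i) out_b[i] = -b[i];
}

// Problematic form (shown as a comment only): `tmp` has two consumers.
//     produce(in, tmp);
//     add_one(tmp, out_a);
//     negate(tmp, out_b);
//
// Reworked form: `tmp`, `a`, and `b` each have one producer and one consumer.
void top_fixed(const int32_t in[N], int32_t out_a[N], int32_t out_b[N]) {
#pragma HLS dataflow
    int32_t tmp[N];
    int32_t a[N];
    int32_t b[N];

    produce(in, tmp);
    split(tmp, a, b);
    add_one(a, out_a);
    negate(b, out_b);
}
```

In the problematic form, sorting the Channels table by name would be expected to show two rows for `tmp` with different consumers, which is the pattern described in the list above.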

Improve Performance

One of the key components of the performance of a dataflow region is the TI of the processes that constitute the region. Code Analyzer can be used to efficiently improve the performance of dataflow processes without running HLS synthesis.

Select Performance Guidance in the Heat Map selector of the toolbar menu, and you can identify processes with a performance issue using the issues badge present on graph nodes. Expanding the process code presents the details of the particular problems identified in processes. You can investigate these issues and decide to address them or not depending on the feasibility, location of these problems, and ultimate performance objective. For instance, if you want II=1 in the inner loop of some specific processes, you will need to rewrite your code to fix all the problems presented on that particular loop nest.
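
To illustrate the kind of rewrite this can involve, the sketch below shows a loop that reads the same array three times per iteration, a pattern that can cause memory port contention, next to a version that reads it once per iteration. The function names and sizes are hypothetical. That the array maps to a memory with at most two ports, and that the tool would report this particular loop, are both assumptions.

```cpp
#include <cstdint>

constexpr int N = 128; // hypothetical input length

// Original form: three reads of `in` in every iteration. If `in` is
// implemented as a memory with two ports, the reads cannot all be issued
// in the same cycle, which can prevent the loop from reaching II=1.
void sum3_contended(const int32_t in[N], int32_t out[N - 2]) {
    for (int i = 0; i < N - 2; ++i) {
        out[i] = in[i] + in[i + 1] + in[i + 2];
    }
}

// Reworked form: a sliding window held in local variables, so that each
// iteration performs a single read of `in`.
void sum3_window(const int32_t in[N], int32_t out[N - 2]) {
    int32_t w0 = in[0];
    int32_t w1 = in[1];
    for (int i = 2; i < N; ++i) {
        const int32_t w2 = in[i]; // the only read of `in` in this iteration
        out[i - 2] = w0 + w1 + w2;
        w0 = w1;
        w1 = w2;
    }
}
```

Both functions compute the same result, so the C test bench can compare them directly before the design is analyzed again.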

In a related use case, you might want to understand how the TI was computed for a particular process, whether for educational or verification purposes. The TI and II annotations next to function calls and loops can be explored inline with the process source code for this purpose.

Throughput Analysis

Code Analyzer presents estimates of throughput on channels. To complement the analysis and better understand the design performance, you can also access the channel width and its volume (total number of accesses per execution of the region). However, you should validate the throughput estimates by synthesizing the design when possible. Code Analyzer relies on pre-synthesis estimates that have a lower fidelity than post-synthesis and post-implementation metrics.
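
As a rough aid for sanity-checking these numbers by hand, the sketch below relates channel width, volume, and TI to a throughput figure. The formula used (total bits divided by the TI in cycles) is an assumption made for illustration, not the documented computation of the tool, and the type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record of the values shown for one channel.
struct ChannelEstimate {
    uint64_t accesses;   // volume: number of accesses per execution
    uint32_t width_bits; // channel width in bits
    uint64_t ti_cycles;  // transaction interval in clock cycles
};

// Total number of bits transferred per execution of the region.
uint64_t total_bits(const ChannelEstimate &c) {
    return c.accesses * c.width_bits;
}

// Assumed relation: throughput in bits per cycle = total bits / TI.
double bits_per_cycle(const ChannelEstimate &c) {
    assert(c.ti_cycles > 0);
    return static_cast<double>(total_bits(c)) / static_cast<double>(c.ti_cycles);
}

// Converts to bits per second for a given clock frequency in Hz.
double bits_per_second(const ChannelEstimate &c, double clock_hz) {
    return bits_per_cycle(c) * clock_hz;
}
```

For example, 256 accesses on a 32-bit channel amount to 8192 bits. With an assumed TI of 512 cycles this gives 16 bits per cycle, or 1.6 Gbit/s at an assumed 100 MHz clock.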