Optimizing Techniques and Troubleshooting Tips - 2023.2 English

Vitis High-Level Synthesis User Guide (UG1399)

This section outlines the various optimization techniques you can use to direct AMD Vitis™ HLS to produce a micro-architecture that satisfies the desired performance and area goals. Using Vitis HLS, you can apply different optimization directives to the design, including:

  • Pipelining tasks, allowing the next execution of the task to begin before the current execution is complete.
  • Specifying a target latency for the completion of functions, loops, and regions.
  • Specifying a limit on the number of resources used.
  • Overriding the inherent or implied dependencies in the code to permit specific operations. For example, if it is acceptable to discard or ignore the initial data values, such as in a video stream, allow a memory read before write if it results in better performance.
  • Specifying the I/O protocol to ensure function arguments can be connected to other hardware blocks with the same I/O protocol.
    Note: Vitis HLS automatically determines the I/O protocol used by any sub-functions. You cannot control these ports except to specify whether the port is registered.
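
As a minimal sketch of the first item in the list, the following C++ function pipelines a loop with a pragma. The function name `scale`, the constant `MAX_SAMPLES`, and the arguments are hypothetical and chosen only for illustration. Because the loop bound is known only at run time, a LOOP_TRIPCOUNT pragma is included so that latency can be reported; it does not change the generated hardware.

```cpp
// Hypothetical kernel, not taken from the user guide:
// multiply the first n input samples by a gain factor.
constexpr int MAX_SAMPLES = 1024; // assumed upper bound on n

void scale(const int in[MAX_SAMPLES], int out[MAX_SAMPLES], int n, int gain) {
    for (int i = 0; i < n; ++i) {
// Request that a new iteration starts every clock cycle
// (an initiation interval of 1).
#pragma HLS PIPELINE II=1
// n is a run-time value, so give the tool an iteration range to use
// in its reports. This affects reporting only, not synthesis.
#pragma HLS LOOP_TRIPCOUNT min=1 max=1024
        out[i] = in[i] * gain;
    }
}
```

A standard C++ compiler ignores the HLS pragmas, so the same source can be compiled and checked in a software test bench before synthesis. Whether the tool actually achieves the requested interval depends on the target device and clock constraint.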

It helps to understand the process used to synthesize an RTL hardware description from C/C++ source code. The section Understanding High-Level Synthesis Scheduling and Binding describes some of the important details of this process to help you better understand how to optimize for it.

You can add optimization directives directly to the source code as HLS pragmas, or you can apply them as configuration commands, as discussed in Adding Pragmas and Directives. The following table lists the optimization directives provided by the HLS compiler, each available as either a pragma or a configuration command.

Table 1. Vitis HLS Optimization Directives
Directive Description
AGGREGATE The AGGREGATE pragma is used for grouping all the elements of a struct into a single wide vector to allow all members of the struct to be read and written to simultaneously.
ALIAS The ALIAS pragma enables data dependence analysis in Vitis HLS by defining the distance between multiple pointers accessing the same DRAM buffer.
ALLOCATION Specify a limit for the number of operations, implementations, or functions used. This can force the sharing of hardware resources and may increase latency.
ARRAY_PARTITION Partitions large arrays into multiple smaller arrays or into individual registers, to improve access to data and remove block RAM bottlenecks.
ARRAY_RESHAPE Reshape an array from one with many elements to one with greater word-width. Useful for improving block RAM accesses without using more block RAM.
BIND_OP Define a specific implementation for an operation in the RTL.
BIND_STORAGE Define a specific implementation for a storage element, or memory, in the RTL.
DATAFLOW Enables task-level pipelining, allowing functions and loops to execute concurrently. Used to optimize throughput and/or latency.
DEPENDENCE Used to provide additional information that can overcome loop-carried dependencies and allow loops to be pipelined (or pipelined with lower intervals).
DISAGGREGATE Break a struct down into its individual elements.
EXPRESSION_BALANCE Allows automatic expression balancing to be turned off.
INLINE Inlines a function, removing function hierarchy at this level. Used to enable logic optimization across function boundaries and improve latency/interval by reducing function call overhead.
INTERFACE Specifies how RTL ports are created from the function description.
LATENCY Allows a minimum and maximum latency constraint to be specified.
LOOP_FLATTEN Allows nested loops to be collapsed into a single loop with improved latency.
LOOP_MERGE Merge consecutive loops to reduce overall latency, increase sharing and improve logic optimization.
LOOP_TRIPCOUNT Used for loops that have variable bounds. Provides an estimate for the loop iteration count. This has no impact on synthesis, only on reporting.
OCCURRENCE Used when pipelining functions or loops, to specify that the code in a location is executed at a lesser rate than the code in the enclosing function or loop.
PERFORMANCE Specify the desired transaction interval for a loop and let the tool determine the best way to achieve the result.
PIPELINE Reduces the initiation interval by allowing the overlapped execution of operations within a loop or function.
PROTOCOL This command specifies a region of code, a protocol region, in which no clock operations will be inserted by Vitis HLS unless explicitly specified in the code.
RESET This directive is used to add or remove reset on a specific state variable (global or static).
STABLE Indicates that a variable input or output of a dataflow region can be ignored when generating the synchronizations at entry and exit of the dataflow region.
STREAM Specifies that a specific array is to be implemented as a FIFO or RAM memory channel during dataflow optimization. When using hls::stream, the STREAM optimization directive is used to override the configuration of the hls::stream.
TOP The top-level function for synthesis is specified in the project settings. This directive can be used to designate any function as the top-level function for synthesis, which allows different solutions within the same project to use different top-level functions without creating a new project.
UNROLL Unroll for-loops to create multiple instances of the loop body and its instructions that can then be scheduled independently.
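
To illustrate how several of the directives in this table can be combined, the following sketch applies ARRAY_PARTITION, PIPELINE, and UNROLL to a small multiply-accumulate function. The function name `fir_sample`, the constant `TAPS`, and the local arrays are hypothetical. The pragma spellings use the `type=` keyword form, which is assumed here for recent releases and should be checked against the pragma reference for your release.

```cpp
// Hypothetical kernel, not taken from the user guide:
// compute one output sample of a TAPS-tap FIR filter.
constexpr int TAPS = 8; // assumed filter length

int fir_sample(const int window[TAPS], const int coeff[TAPS]) {
    int w[TAPS];
    int c[TAPS];
// Split both local arrays into individual registers so that every
// element can be read in the same clock cycle.
#pragma HLS ARRAY_PARTITION variable=w type=complete dim=1
#pragma HLS ARRAY_PARTITION variable=c type=complete dim=1

    // Copy the inputs into the partitioned local arrays.
    for (int i = 0; i < TAPS; ++i) {
#pragma HLS PIPELINE II=1
        w[i] = window[i];
        c[i] = coeff[i];
    }

    int acc = 0;
    for (int i = 0; i < TAPS; ++i) {
// Create TAPS copies of the loop body so the multiplications can be
// scheduled independently of one another.
#pragma HLS UNROLL
        acc += w[i] * c[i];
    }
    return acc;
}
```

Partitioning and unrolling are paired here because an unrolled loop can only run its copies in parallel if the memory it reads has enough ports. The trade-off is area: full unrolling replicates the multiplier for each tap.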

In addition to the optimization directives, Vitis HLS provides a number of configuration commands that can influence the performance of synthesis results. Details on using configuration commands can be found in HLS Config File Commands. The following table lists some of these commands.

Table 2. Vitis HLS Configurations
GUI Directive Description
Array Partition Configuration Determines how arrays are partitioned, including global arrays and if the partitioning impacts array ports.
Compile Options Controls synthesis-specific optimizations such as automatic loop pipelining and floating-point math optimizations.
Dataflow Configuration Specifies the default memory channel and FIFO depth in dataflow optimization.
Interface Configuration Controls I/O ports not associated with the top-level function arguments and allows unused ports to be eliminated from the final RTL.
Operator Configuration Configures the default latency and implementation of specified operations.
RTL Configuration Provides control over the output RTL including file and module naming, and reset controls.
Schedule Setting Determines the effort level to use during the synthesis scheduling phase and the verbosity of the output messages.
Storage Configuration Configures the default latency and implementation of specified storage types.
Unroll Setting Configures the default tripcount threshold for unrolling loops.