This section outlines the various optimization techniques you can use to direct AMD Vitis™ HLS to produce a micro-architecture that satisfies the desired performance and area goals. Using Vitis HLS, you can apply different optimization directives to the design, including:
- Pipelining tasks, allowing the next execution of the task to begin before the current execution is complete.
- Specifying a target latency for the completion of functions, loops, and regions.
- Specifying a limit on the number of resources used.
- Overriding the inherent or implied dependencies in the code to permit specific operations. For example, if it is acceptable to discard or ignore the initial data values, such as in a video stream, allow a memory read before write if it results in better performance.
- Specifying the I/O protocol to ensure function arguments can be connected to other hardware blocks with the same I/O protocol.
Note: Vitis HLS automatically determines the I/O protocol used by any sub-functions. You cannot control these ports except to specify whether the port is registered.
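As a minimal sketch of how such directives appear in source code, the following function pipelines a loop and specifies an I/O protocol for its array argument. The function name `scaled_sum`, the size `N`, and the choice of the `ap_memory` protocol are assumptions made for illustration, not taken from this guide. A standard C++ compiler ignores the `#pragma HLS` lines, so the function can still be checked in a software test bench.

```cpp
#include <cstdint>

// Hypothetical array size chosen for this sketch.
constexpr int N = 64;

// Hypothetical example function: multiplies each sample by a gain and
// accumulates the products.
int32_t scaled_sum(const int16_t in[N], int16_t gain) {
// Assumed I/O protocol for the array argument; the right choice depends
// on the hardware block this port connects to.
#pragma HLS INTERFACE mode=ap_memory port=in
    int32_t acc = 0;
    for (int i = 0; i < N; ++i) {
// Request that a new loop iteration starts every clock cycle (II=1).
// The tool reports the achieved interval if it cannot meet this target.
#pragma HLS PIPELINE II=1
        acc += static_cast<int32_t>(in[i]) * gain;
    }
    return acc;
}
```

The pragmas change only the generated hardware, not the value the function computes, which is why the same source can be used for both C simulation and synthesis.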
It helps to understand the process used to synthesize an RTL hardware description from C/C++ source code. The section Understanding High-Level Synthesis Scheduling and Binding describes some of the important details of this process to help you better understand how you can optimize for it.
You can add optimization directives directly into the source code as HLS pragmas, or you can apply them as configuration commands, as discussed in Adding Pragmas and Directives. The following table lists the optimization directives provided by the HLS compiler as either pragmas or configuration commands.
Directive | Description |
---|---|
AGGREGATE | The AGGREGATE pragma is used for grouping all the elements of a struct into a single wide vector to allow all members of the struct to be read and written to simultaneously. |
ALIAS | The ALIAS pragma enables data dependence analysis in Vitis HLS by defining the distance between multiple pointers accessing the same DRAM buffer. |
ALLOCATION | Specify a limit for the number of operations, implementations, or functions used. This can force the sharing of hardware resources and may increase latency. |
ARRAY_PARTITION | Partitions large arrays into multiple smaller arrays or into individual registers, to improve access to data and remove block RAM bottlenecks. |
ARRAY_RESHAPE | Reshape an array from one with many elements to one with greater word-width. Useful for improving block RAM accesses without using more block RAM. |
BIND_OP | Define a specific implementation for an operation in the RTL. |
BIND_STORAGE | Define a specific implementation for a storage element, or memory, in the RTL. |
DATAFLOW | Enables task level pipelining, allowing functions and loops to execute concurrently. Used to optimize throughput and/or latency. |
DEPENDENCE | Used to provide additional information that can overcome loop-carried dependencies and allow loops to be pipelined (or pipelined with lower intervals). |
DISAGGREGATE | Break a struct down into its individual elements. |
EXPRESSION_BALANCE | Allows automatic expression balancing to be turned off. |
INLINE | Inlines a function, removing function hierarchy at this level. Used to enable logic optimization across function boundaries and improve latency/interval by reducing function call overhead. |
INTERFACE | Specifies how RTL ports are created from the function description. |
LATENCY | Allows a minimum and maximum latency constraint to be specified. |
LOOP_FLATTEN | Allows nested loops to be collapsed into a single loop with improved latency. |
LOOP_MERGE | Merge consecutive loops to reduce overall latency, increase sharing and improve logic optimization. |
LOOP_TRIPCOUNT | Used for loops that have variable bounds. Provides an estimate for the loop iteration count. This has no impact on synthesis, only on reporting. |
OCCURRENCE | Used when pipelining functions or loops, to specify that the code in a location is executed at a lesser rate than the code in the enclosing function or loop. |
PERFORMANCE | Specify the desired transaction interval for a loop and let the tool determine the best way to achieve the result. |
PIPELINE | Reduces the initiation interval by allowing the overlapped execution of operations within a loop or function. |
PROTOCOL | This command specifies a region of code, a protocol region, in which no clock operations will be inserted by Vitis HLS unless explicitly specified in the code. |
RESET | This directive is used to add or remove reset on a specific state variable (global or static). |
STABLE | Indicates that a variable input or output of a dataflow region can be ignored when generating the synchronizations at entry and exit of the dataflow region. |
STREAM | Specifies that a specific array is to be implemented as a FIFO or RAM memory channel during dataflow optimization. When using hls::stream, the STREAM optimization directive is used to override the configuration of the hls::stream. |
TOP | The top-level function for synthesis is specified in the project settings. This directive may be used to specify any function as the top-level for synthesis. This allows different solutions within the same project to use different top-level functions for synthesis without needing to create a new project. |
UNROLL | Unroll for-loops to create multiple instances of the loop body and its instructions that can then be scheduled independently. |
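The sketch below shows how several of the directives in the table might be combined on one loop. It is an illustration under stated assumptions, not a recommended configuration: the function name `dot_product`, the bound `MAX_LEN`, and the factor of 4 are hypothetical. The loop has a variable bound, so LOOP_TRIPCOUNT supplies an estimate for reporting, UNROLL creates four copies of the loop body, and ARRAY_PARTITION splits each array cyclically so that the four copies do not compete for a single memory port.

```cpp
// Hypothetical maximum array length chosen for this sketch.
constexpr int MAX_LEN = 64;

// Hypothetical example function: dot product of the first `len` elements.
// The caller is assumed to pass 0 <= len <= MAX_LEN.
int dot_product(const int a[MAX_LEN], const int b[MAX_LEN], int len) {
// Split each array into 4 interleaved banks, matching the unroll factor,
// so that 4 consecutive elements can be read in the same cycle.
#pragma HLS ARRAY_PARTITION variable=a type=cyclic factor=4 dim=1
#pragma HLS ARRAY_PARTITION variable=b type=cyclic factor=4 dim=1
    int acc = 0;
    for (int i = 0; i < len; ++i) {
// The bound is not known at compile time, so give the tool a range to use
// in its latency reports. This does not change the generated hardware.
#pragma HLS LOOP_TRIPCOUNT min=1 max=64
// Create 4 instances of the loop body that can be scheduled independently.
#pragma HLS UNROLL factor=4
        acc += a[i] * b[i];
    }
    return acc;
}
```

Matching the partition factor to the unroll factor is a design choice for this sketch. A larger factor trades more memory banks and multipliers for fewer loop iterations, and the ALLOCATION directive could be added to cap the number of multipliers if area matters more than latency.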
In addition to the optimization directives, Vitis HLS provides a number of configuration commands that can influence the performance of synthesis results. Details on using configuration commands can be found in HLS Config File Commands. The following table lists some of these commands.
GUI Directive | Description |
---|---|
Array Partition Configuration | Determines how arrays are partitioned, including global arrays and if the partitioning impacts array ports. |
Compile Options | Controls synthesis specific optimizations such as the automatic loop pipelining and floating point math optimizations. |
Dataflow Configuration | Specifies the default memory channel and FIFO depth in dataflow optimization. |
Interface Configuration | Controls I/O ports not associated with the top-level function arguments and allows unused ports to be eliminated from the final RTL. |
Operator Configuration | Configures the default latency and implementation of specified operations. |
RTL Configuration | Provides control over the output RTL including file and module naming, and reset controls. |
Schedule Setting | Determines the effort level to use during the synthesis scheduling phase and the verbosity of the output messages. |
Storage Configuration | Configures the default latency and implementation of specified storage types. |
Unroll Setting | Configures the default tripcount threshold for unrolling loops. |