pragma HLS performance - 2025.1 English - UG1399

Vitis High-Level Synthesis User Guide (UG1399)

Document ID
UG1399
Release Date
2025-05-29
Version
2025.1 English

Description

Tip: The top-level PERFORMANCE pragma applies globally to the entire Design Under Test (DUT) top function, whereas the loop-level PERFORMANCE pragma targets specific loops or loop nests within the design.

The PERFORMANCE pragma in Vitis HLS provides a mechanism for defining high-level performance goals for your design, guiding the synthesis tool's optimization efforts. It can be applied at two distinct scopes: the top-level of the design (Kernel/IP) and specific loop levels within the design.

Top-Level PERFORMANCE Pragma

Applying the PERFORMANCE pragma at the top-level function of your Kernel or IP allows you to specify an overall performance target, primarily focused on achieving a desired throughput. When applied at the kernel's top-level function, this pragma triggers the HLS tool to conduct a design-wide performance analysis. This analysis evaluates the entire design structure to determine the feasibility of meeting the specified performance goal. Based on the overall goal and this analysis, the tool automatically infers appropriate loop-level PERFORMANCE pragmas for individual loops and loop nests within the design hierarchy.

Loop-Level PERFORMANCE Pragma

Applying the PERFORMANCE pragma directly to a specific loop or loop nest lets you define performance goals for that particular loop or loop nest. The loop-level PERFORMANCE pragma or directive lets you specify a high-level constraint, target_ti, which defines the number of clock cycles between successive starts of a loop, and lets the tool infer the lower-level UNROLL, PIPELINE, ARRAY_PARTITION, and INLINE directives needed to achieve the desired result. The PERFORMANCE pragma or directive does not guarantee that the specified value is achieved, so it is only a target.

The PERFORMANCE pragma, applicable at both the top level (entire function/DUT) and the loop level, uses the target_ti parameter to guide optimization toward specific performance goals.

target_ti (Target Initiation Interval): This defines the desired number of clock cycles between successive starts of the DUT or loop.

Top-Level Performance Pragma:

  • When applied at the top level, target_ti specifies the target interval between the start of one function execution and the start of the next. For example, consider an image processing design targeting a frame rate of 60 frames per second (FPS).
  • Target TI = 1/FPS = 1/60 s ≈ 16.67 milliseconds.
  • The function/DUT must therefore be ready to restart and read a new frame within 16.67 milliseconds to sustain the 60 FPS target.
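The arithmetic above yields a time, whereas target_ti is by default counted in clock cycles. The following is a minimal sketch of that conversion. The helper name `target_ti_cycles` and the 300 MHz clock frequency are assumptions made for illustration; neither comes from this guide.

```cpp
#include <cstdint>

// Hypothetical helper: convert a target frame rate into a transaction
// interval expressed in clock cycles. clock_hz is an assumed kernel clock
// frequency. The result is truncated toward zero.
constexpr std::uint64_t target_ti_cycles(double fps, double clock_hz) {
    return static_cast<std::uint64_t>(clock_hz / fps);
}

// With an assumed 300 MHz clock, 60 FPS leaves 300e6 / 60 = 5,000,000
// cycles between successive starts of the top-level function.
constexpr std::uint64_t kTargetTi60Fps = target_ti_cycles(60.0, 300e6);
static_assert(kTargetTi60Fps == 5000000, "60 FPS at 300 MHz is 5e6 cycles");
```

Under these assumptions, the resulting cycle count is the value that would be supplied as target_ti when the unit is clock cycles.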

Loop-Level Performance Pragma:

  • When applied at the loop level, target_ti specifies the target interval between the start of the first iteration of the loop in successive outer loop iterations. In the following example, target_ti=T on loop L2 targets an interval of T cycles between the start of loop L2 execution for one iteration of L1 and the start of loop L2 for the subsequent iteration of L1.
  • const int T = 100;
    L1: for (int i = 0; i < N; i++)
      L2: for (int j = 0; j < M; j++) {
    #pragma HLS performance target_ti=T
        ...
      }
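The nested-loop snippet above can be fleshed out into a complete function for experimentation. This is only a sketch: the function name `add_two`, the trip counts, and the loop body are illustrative assumptions. A standard C++ compiler ignores the unknown pragma (possibly with a warning), so the function can also be run in a software testbench.

```cpp
constexpr int N = 4;    // outer trip count (illustrative assumption)
constexpr int M = 8;    // inner trip count (illustrative assumption)
constexpr int T = 100;  // target interval in clock cycles, as in the text

// Hypothetical kernel: adds 2 to every element of a flattened N x M array.
void add_two(const int in[N * M], int out[N * M]) {
L1:
    for (int i = 0; i < N; i++) {
    L2:
        for (int j = 0; j < M; j++) {
// Targets T cycles between the start of L2 in one L1 iteration and the
// start of L2 in the next L1 iteration.
#pragma HLS performance target_ti=T
            out[i * M + j] = in[i * M + j] + 2;
        }
    }
}
```

The pragma sits inside the body of L2, the loop it constrains, which mirrors the placement in the snippet above.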

Syntax

#pragma HLS performance target_ti=<value> unit=[sec | cycle]

Where:

target_ti=<value>
Specifies a target transaction interval defined as the number of clock cycles for the function, loop, or region of code to complete an iteration. The <value> can be specified as an integer, floating point, or constant expression that is resolved by the tool as an integer.
Note: A warning is returned if truncation occurs.
unit=[sec | cycle]
Specifies the unit associated with the target_ti value. The unit can be specified either as seconds or as clock cycles. When the unit is specified as seconds, a suffix can be added to the value to indicate nanoseconds (ns), picoseconds (ps), or microseconds (us).

Example 1

A target_ti of 100 us, which equals 1e-4 seconds (target_ti=1e-4 unit=sec), can be written in the following ways:

  • #pragma HLS performance target_ti=100 unit=us
  • #pragma HLS performance target_ti=100us unit=sec
  • #pragma HLS performance target_ti=100e-6 unit=sec
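As a sketch of how the seconds-based form might appear in source, the following places the third variant at the top of a top-level function body. The function name `scale_frame`, the frame size, and the loop body are hypothetical and chosen only for illustration. A standard C++ compiler ignores the unknown pragma, so the function still runs in a software testbench.

```cpp
#include <cstddef>

constexpr std::size_t kPixels = 64;  // hypothetical frame size

// Hypothetical top-level function (DUT): doubles every input sample.
void scale_frame(const int in[kPixels], int out[kPixels]) {
// Requests a 100 us interval between successive starts of the function.
#pragma HLS performance target_ti=100e-6 unit=sec
    for (std::size_t i = 0; i < kPixels; ++i) {
        out[i] = in[i] * 2;
    }
}
```

Whether the tool can meet the request depends on the clock period and the design itself; as noted earlier, the value is a target, not a guarantee.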

Example 2

The outer loop is specified to have a target transaction interval of 1000 clock cycles.

  for (int i =0; i < 1000; ++i) {
#pragma HLS performance target_ti=1000
    for (int j = 0; j < 8; ++j) {
      int tmp = b_buf[j].read();
      b[i * 8 + j] = tmp + 2;
    }
  }