pragma HLS performance - 2024.1 English

Vitis High-Level Synthesis User Guide (UG1399)

Document ID
UG1399
Release Date
2024-07-03
Version
2024.1 English

Description

Tip: The PERFORMANCE pragma or directive applies to loops and loop nests, and requires a known loop tripcount to determine the performance. If your loop has a variable tripcount, you must also specify the LOOP_TRIPCOUNT pragma or directive.
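As a minimal sketch of the tip above, the following loop has a bound that is only known at run time, so a LOOP_TRIPCOUNT pragma is placed alongside the PERFORMANCE pragma. The function name scale_add, the bound MAX_N, and the tripcount and target values are all hypothetical and chosen only for illustration.

```cpp
// Hypothetical kernel: names, bounds, and target values are illustrative only.
constexpr int MAX_N = 1024;

void scale_add(const int in[MAX_N], int out[MAX_N], int n) {
  // n is a run-time value, so the loop has a variable tripcount.
  for (int i = 0; i < n; ++i) {
// Give the tool a tripcount range to use in its performance estimate.
#pragma HLS loop_tripcount min=1 max=1024 avg=512
// Request (not guarantee) 1024 cycles between successive starts of the loop.
#pragma HLS performance target_ti=1024
    out[i] = in[i] * 2 + 1;
  }
}
```

A standard C++ compiler ignores both pragmas, so the function can still be checked in software simulation before synthesis.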

The PERFORMANCE pragma or directive lets you specify a high-level constraint, target_ti or target_tl, defining the number of clock cycles between successive starts of a loop, and lets the tool infer lower-level UNROLL, PIPELINE, ARRAY_PARTITION, and INLINE directives needed to achieve the desired result. The PERFORMANCE pragma or directive does not guarantee the specified value will be achieved, and so it is only a target.

The target_ti is the interval between successive starts of the loop, or between the start of the first iteration of the loop, and the next start of the first iteration of the loop. In the following code example, a target_ti=T would mean the target interval for the start of loop L2 between two consecutive iterations of L1 should be 100 cycles.

const int T = 100;
L1: for (int i=0; i<N; i++)
  L2: for (int j=0; j<M; j++){
#pragma HLS performance target_ti=T
   ...
   }

The target_tl is the interval between start of the loop and end of the loop, or between the start of the first iteration of the loop and the completion of the last iteration of the loop. For example, in the preceding code example a target_tl=T means the target completion of loop L2 for a single iteration of L1 should be 100 cycles.

Note: The INLINE pragma is applied automatically to functions inside any pipelined loop that has II=1 to improve throughput. If you apply the PERFORMANCE pragma or directive that infers a pipeline with II=1, it will also trigger the auto-inline optimization. You can disable this for specific functions by using #pragma HLS INLINE off.
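The note above can be sketched as follows. The helper saturate and the kernel clamp_all are hypothetical names, and the array size and target value are assumptions. With 64 iterations, target_ti=64 corresponds to an II of 1 by the formula target_ti = II * loop tripcount, so this is a case where auto-inlining could be triggered; whether the tool actually infers a pipeline with II=1 is not guaranteed.

```cpp
// Hypothetical helper: INLINE off asks the tool to keep it as a separate function.
static int saturate(int x) {
#pragma HLS INLINE off
  return x > 255 ? 255 : (x < 0 ? 0 : x);
}

// Hypothetical kernel: 64 iterations with target_ti=64 implies an II of 1.
void clamp_all(const int in[64], int out[64]) {
  for (int i = 0; i < 64; ++i) {
#pragma HLS performance target_ti=64
    out[i] = saturate(in[i]);
  }
}
```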

The transaction interval is the initiation interval (II) of the loop times the number of iterations, or tripcount: target_ti = II * loop tripcount. Equivalently, when starting from a throughput goal, target_ti = FreqHz / Operations per second.

For example, assuming an image processing function that processes a single frame per invocation with a throughput goal of 60 fps, then the target throughput for the function is 60 invocations per second. If the clock frequency is 180 MHz, then target_ti is 180M/60, or 3 million clock cycles per function invocation.
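The two formulas above can be checked with a few lines of arithmetic. This is only a sketch: the helper names are hypothetical and are not part of any tool API. The first check reproduces the 60 fps example, and the second applies target_ti = II * loop tripcount to a loop with 1000 iterations and an II of 1.

```cpp
#include <cstdint>

// target_ti = FreqHz / operations per second.
// Integer division truncates any fractional cycle count.
constexpr std::uint64_t target_ti_from_throughput(std::uint64_t freq_hz,
                                                  std::uint64_t ops_per_sec) {
  return freq_hz / ops_per_sec;
}

// target_ti = II * loop tripcount.
constexpr std::uint64_t target_ti_from_ii(std::uint64_t ii,
                                          std::uint64_t tripcount) {
  return ii * tripcount;
}

// 180 MHz clock, 60 frames per second: 3 million cycles per invocation.
static_assert(target_ti_from_throughput(180000000ULL, 60ULL) == 3000000ULL,
              "60 fps at 180 MHz");

// A loop with 1000 iterations and II = 1 has a transaction interval of 1000 cycles.
static_assert(target_ti_from_ii(1ULL, 1000ULL) == 1000ULL,
              "II = 1 over 1000 iterations");
```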

Syntax

Place the pragma within the boundary of a loop, or the outer loop of a loop nest.

#pragma HLS performance target_ti=<value> target_tl=<value> unit=[sec|cycle]

Where:

target_ti=<value>
Specifies a target transaction interval defined as the number of clock cycles for the function, loop, or region of code to complete an iteration. The <value> can be specified as an integer, floating point, or constant expression that is resolved by the tool as an integer.
Note: A warning will be returned if truncation occurs.
target_tl=<value>
Specifies a target latency defined as the number of clock cycles for the loop to complete all iterations. The transaction latency is defined as the interval between the start of the first iteration of the loop, and the completion of the last iteration of the loop. The <value> can be specified as an integer, floating point, or constant expression that is resolved by the tool as an integer.

unit=[sec | cycle]
Specifies the unit associated with the target_ti or target_tl values. The unit can be specified as either seconds or clock cycles. When the unit is specified as seconds, a time unit can be specified with the value to indicate nanoseconds (ns), picoseconds (ps), or microseconds (us).

Example 1

The outer loop is specified to have a target transaction interval of 1000 clock cycles.

  for (int i = 0; i < 1000; ++i) {
#pragma HLS performance target_ti=1000
    for (int j = 0; j < 8; ++j) {
      int tmp = b_buf[j].read();
      b[i * 8 + j] = tmp + 2;
    }
  }