Description
The PERFORMANCE pragma lets you specify a high-level constraint (target_ti) defining the number of clock cycles between
successive starts of a loop, and lets the tool infer lower-level UNROLL, PIPELINE,
ARRAY_PARTITION, and INLINE pragmas needed to achieve the desired result. The PERFORMANCE
pragma does not guarantee the specified value will
be achieved, and so it is only a target.
#pragma HLS INLINE OFF
The target transaction interval (target_ti) specifies a performance target for loops, where a transaction is a
complete set of loop iterations (tripcount) and the interval is the time between the
start of one transaction and the start of the next.
- Target Transaction Interval (target_ti) - Specifies the number of clock cycles between successive starts of the loop. In other words, the clock cycles from the start of one transaction of a loop, or loop nest, to the start of the next transaction of that loop.
The transaction interval is the initiation interval (II) of the loop times
the number of iterations, or tripcount: target_ti = II * loop tripcount.
Equivalently, when starting from a throughput goal, target_ti = FreqHz / operations per second.
For example, assuming an image processing function that processes a single
frame per invocation with a throughput goal of 60 fps, then the target throughput for the
function is 60 invocations per second. If the clock frequency is 180 MHz, then target_ti
is 180M/60, or 3 million clock cycles per function
invocation.
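The two relations above can be checked with a few lines of host-side C++. This is only a sketch for working out a value to pass to the pragma; the helper names `target_ti_from_throughput` and `target_ti_from_ii` are hypothetical and are not part of any HLS library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: target_ti = FreqHz / operations per second.
// Integer division truncates any fractional cycle count.
std::uint64_t target_ti_from_throughput(std::uint64_t freq_hz,
                                        std::uint64_t ops_per_second) {
    assert(ops_per_second > 0);  // avoid division by zero
    return freq_hz / ops_per_second;
}

// Hypothetical helper: target_ti = II * loop tripcount.
std::uint64_t target_ti_from_ii(std::uint64_t ii, std::uint64_t tripcount) {
    return ii * tripcount;
}
```

With the numbers from the example, `target_ti_from_throughput(180000000, 60)` evaluates to 3000000 clock cycles per invocation.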
Syntax
Place the pragma within the boundary of a loop, or of the outer loop of a loop nest.
#pragma HLS performance target_ti=<value>
Where:
- target_ti=<value> - Specifies a target transaction interval defined as the number of clock
cycles for the function, loop, or region of code to complete an iteration. The
<value> can be specified as an integer, floating point, or constant expression
that is resolved by the tool as an integer.
Note: A warning will be returned if truncation occurs.
Example 1
The outer loop is specified to have a target transaction interval of 1000 clock cycles.
for (int i = 0; i < 1000; ++i) {
#pragma HLS performance target_ti=1000
    for (int j = 0; j < 8; ++j) {
        int tmp = b_buf[j].read();
        b[i * 8 + j] = tmp + 2;
    }
}
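To see what this target asks of the tool, the relation target_ti = II * loop tripcount can be rearranged to give the initiation interval implied for the outer loop. The sketch below is host-side arithmetic only, and `implied_ii` is a hypothetical helper name, not a tool API. Whether the tool actually reaches this II is not guaranteed, since target_ti is only a target.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: II implied by target_ti = II * loop tripcount.
// Returned as a double because the quotient need not be a whole number.
double implied_ii(std::uint64_t target_ti, std::uint64_t tripcount) {
    assert(tripcount > 0);  // a known, non-zero tripcount is assumed
    return static_cast<double>(target_ti) / static_cast<double>(tripcount);
}
```

For Example 1, `implied_ii(1000, 1000)` gives 1.0, that is, one new outer-loop iteration per clock cycle. Each outer iteration contains the 8-iteration inner loop, so meeting this target would presumably require the tool to infer lower-level optimizations such as unrolling the inner loop.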