Overview of Arbitrary Precision Fixed-Point Data Types - 2023.2 English

Vitis High-Level Synthesis User Guide (UG1399)

Fixed-point data types model data as integer and fraction bits in two's complement, using the format ap_fixed<W,I,[Q,O,N]>, where the parameters are described in Table 1. In the following example, the Vitis HLS ap_fixed type defines an 18-bit variable with 6 bits (including the sign bit) representing the integer part above the binary point, and the remaining 12 bits representing the fractional part below the binary point. The variable is signed and the quantization mode is set to round to plus infinity (AP_RND). Because the overflow mode is not specified, the default wrap-around mode (AP_WRAP) is used for overflow.

#include <ap_fixed.h>
ap_fixed<18,6,AP_RND> t1 = 1.5; // internally represented as 0b00'0001.1000'0000'0000 (0x01800)
ap_fixed<18,6,AP_RND> t2 = -1.5; // 0b11'1110.1000'0000'0000 (0x3e800)
Tip: The Integer value (I) of ap_fixed specifies the number of integer bits to the left of the binary point, including the sign bit.
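The bit patterns in the comments above can be checked with ordinary integer arithmetic. The following is a minimal sketch in standard C++ that mimics the layout of ap_fixed<18,6>. It does not use the Vitis HLS headers, the helper names to_raw and from_raw are hypothetical, and it assumes the value is exactly representable so that no quantization is involved.

```cpp
#include <cstdint>

// Hypothetical helpers mirroring the layout of ap_fixed<18,6>.
// They are not part of the Vitis HLS headers.
constexpr int kWidth = 18;     // W: total word length
constexpr int kFracBits = 12;  // W - I = 18 - 6 fractional bits

// Encode an exactly representable value as an 18-bit two's-complement pattern:
// scale by 2^12, then keep the low 18 bits.
constexpr std::uint32_t to_raw(double value) {
    return static_cast<std::uint32_t>(
        static_cast<std::uint64_t>(
            static_cast<std::int64_t>(value * (1 << kFracBits))) &
        ((std::uint64_t{1} << kWidth) - 1));
}

// Decode an 18-bit pattern: sign-extend from bit 17, then scale by 2^-12.
constexpr double from_raw(std::uint32_t raw) {
    return static_cast<double>(
               (raw & (1u << (kWidth - 1)))
                   ? static_cast<std::int64_t>(raw) - (std::int64_t{1} << kWidth)
                   : static_cast<std::int64_t>(raw)) /
           (1 << kFracBits);
}

// Compile-time checks against the values in the example above.
static_assert(to_raw(1.5) == 0x01800, "1.5 encodes as 0x01800");
static_assert(to_raw(-1.5) == 0x3e800, "-1.5 encodes as 0x3e800");
static_assert(from_raw(0x3e800) == -1.5, "0x3e800 decodes to -1.5");
```

Because the helpers are constexpr, the static_assert lines verify the encodings when the file is compiled. No program needs to be run.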

When performing calculations where the variables have different numbers of bits or different precision, the binary point is automatically aligned. For example, when performing division with fixed-point variables of different sizes, the quotient has no more fractional bits than the dividend. To preserve the fractional part of the quotient, you can cast the result to the new variable width before assignment.
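The effect of binary point alignment can be illustrated with raw integers, where each value is stored as the real number multiplied by 2 to the power of its fractional bit count. The sketch below is plain C++ and does not reproduce the result-width rules of ap_fixed. The helper align is a hypothetical name, and the division example widens the dividend before dividing, which is the plain-integer way of keeping fractional bits rather than the cast described above.

```cpp
#include <cstdint>

// Hypothetical helper: re-express a raw integer that has `from_frac`
// fractional bits with `to_frac` fractional bits (to_frac >= from_frac).
constexpr std::int64_t align(std::int64_t raw, int from_frac, int to_frac) {
    return raw * (std::int64_t{1} << (to_frac - from_frac));
}

// Addition: bring both operands to the larger number of fractional bits.
// a = 2.75 with 2 fractional bits (raw 11), b = 0.3125 with 4 (raw 5).
constexpr std::int64_t sum_raw = align(11, 2, 4) + 5;  // 4 fractional bits
static_assert(sum_raw == 49, "49 / 2^4 = 3.0625 = 2.75 + 0.3125");

// Division: a raw integer quotient has (dividend fractional bits minus
// divisor fractional bits) fractional bits, so a narrow dividend loses
// most of the fraction.
// 1.0 with 4 fractional bits (raw 16) divided by 3 (0 fractional bits):
constexpr std::int64_t q_narrow = 16 / 3;              // 5, that is 5/16
// Widening the dividend to 12 fractional bits first keeps more of 1/3:
constexpr std::int64_t q_wide = align(16, 4, 12) / 3;  // 1365, that is 1365/4096

static_assert(q_narrow == 5, "0.3125 with 4 fractional bits");
static_assert(q_wide == 1365, "about 0.33325 with 12 fractional bits");
```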

The behavior of C++ simulations performed using fixed-point types matches the resulting hardware. This allows you to analyze bit-accurate quantization and overflow behavior using fast C-level simulation.

Fixed-point types are a useful replacement for floating-point types, which require many clock cycles to complete. Unless the entire range of the floating-point type is required, the same accuracy can often be achieved with a fixed-point type, resulting in smaller and faster hardware.

A summary of the ap_fixed type identifiers is provided in the following table.

Table 1. Fixed-Point Identifier Summary

Identifier  Description

W           Word length in bits.

I           The number of bits used to represent the integer value, that is, the number of integer bits to the left of the binary point, including the sign bit.

            When I is negative, its magnitude gives the number of implicit sign bits (for signed representation) or the number of implicit zero bits (for unsigned representation) to the right of the binary point. For example:

            ap_fixed<2, 0> a = -0.5;    // 2-bit representation. a can be -0.5, -0.25, 0, or 0.25

            ap_ufixed<1, 0> x = 0.5;    // 1-bit representation. x can be 0 or 0.5
            ap_ufixed<1, -1> y = 0.25;  // 1-bit representation. y can be 0 or 0.25
            const ap_fixed<1, -7> z = 1.0/256;  // 1-bit representation for z = 2^-8

Q           Quantization mode: This dictates the behavior when the result has greater precision than can be represented by the smallest fractional bit of the variable used to store it.

            ap_fixed Types   Description
            AP_RND           Round to plus infinity
            AP_RND_ZERO      Round to zero
            AP_RND_MIN_INF   Round to minus infinity
            AP_RND_INF       Round to infinity
            AP_RND_CONV      Convergent rounding
            AP_TRN           Truncation to minus infinity (default)
            AP_TRN_ZERO      Truncation to zero

O           Overflow mode: This dictates the behavior when the result of an operation exceeds the maximum (or minimum, in the case of negative numbers) value that can be stored in the variable used to store the result.

            ap_fixed Types   Description
            AP_SAT [1]       Saturation
            AP_SAT_ZERO [1]  Saturation to zero
            AP_SAT_SYM [1]   Symmetrical saturation
            AP_WRAP          Wrap around (default)
            AP_WRAP_SM       Sign magnitude wrap around

N           This defines the number of saturation bits in overflow wrap modes.

  1. Using the AP_SAT* modes can result in higher resource usage because extra logic is needed to perform saturation. This extra cost can be as high as 20% additional LUT usage.
  2. Fixed-point math functions from the hls_math library do not support the ap_[u]fixed template parameters Q, O, and N, for quantization mode, overflow mode, and the number of saturation bits, respectively. The quantization and overflow modes are only effective when an ap_[u]fixed variable is on the left-hand side of an assignment or is being initialized, but not during the calculation.
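
To make the difference between some of the quantization and overflow modes in Table 1 concrete, the following sketch emulates AP_TRN, AP_RND, AP_TRN_ZERO, AP_WRAP, and AP_SAT on raw integers in standard C++. It is an illustration based on the mode descriptions in the table, not the ap_fixed implementation. All function names are hypothetical, and the inputs are raw integers, that is, the real value multiplied by 2 to the power of its fractional bit count.

```cpp
#include <cstdint>

// Hypothetical helpers; integer emulation only, not the ap_fixed implementation.

// floor(x / 2^shift), written without right-shifting negative values.
constexpr std::int64_t floor_shift(std::int64_t x, int shift) {
    return x >= 0 ? x / (std::int64_t{1} << shift)
                  : -((-x + (std::int64_t{1} << shift) - 1) /
                      (std::int64_t{1} << shift));
}

// AP_TRN-like: drop `shift` low bits, truncating toward minus infinity.
constexpr std::int64_t quant_trn(std::int64_t x, int shift) {
    return floor_shift(x, shift);
}

// AP_RND-like: add half an output LSB, then truncate (shift >= 1).
// Ties therefore go toward plus infinity.
constexpr std::int64_t quant_rnd(std::int64_t x, int shift) {
    return floor_shift(x + (std::int64_t{1} << (shift - 1)), shift);
}

// AP_TRN_ZERO-like: truncate toward zero (C++ integer division).
constexpr std::int64_t quant_trn_zero(std::int64_t x, int shift) {
    return x / (std::int64_t{1} << shift);
}

// Low w bits of x as a non-negative number.
constexpr std::int64_t low_bits(std::int64_t x, int w) {
    return static_cast<std::int64_t>(static_cast<std::uint64_t>(x) &
                                     ((std::uint64_t{1} << w) - 1));
}

// AP_WRAP-like: keep the low w bits and reinterpret them as two's complement.
constexpr std::int64_t ovf_wrap(std::int64_t x, int w) {
    return low_bits(x, w) >= (std::int64_t{1} << (w - 1))
               ? low_bits(x, w) - (std::int64_t{1} << w)
               : low_bits(x, w);
}

// AP_SAT-like: clamp to the signed w-bit range [-2^(w-1), 2^(w-1) - 1].
constexpr std::int64_t ovf_sat(std::int64_t x, int w) {
    return x > (std::int64_t{1} << (w - 1)) - 1
               ? (std::int64_t{1} << (w - 1)) - 1
               : (x < -(std::int64_t{1} << (w - 1))
                      ? -(std::int64_t{1} << (w - 1))
                      : x);
}

// Quantization: +/-1.25 held with 2 fractional bits (raw +/-5),
// stored with 1 fractional bit (results are in units of 0.5).
static_assert(quant_trn(5, 1) == 2, "1.25 -> 1.0");
static_assert(quant_trn(-5, 1) == -3, "-1.25 -> -1.5");
static_assert(quant_rnd(5, 1) == 3, "1.25 -> 1.5");
static_assert(quant_rnd(-5, 1) == -2, "-1.25 -> -1.0");
static_assert(quant_trn_zero(-5, 1) == -2, "-1.25 -> -1.0");

// Overflow: 4-bit signed integer, range [-8, 7].
static_assert(ovf_wrap(31, 4) == -1, "31 wraps to -1");
static_assert(ovf_wrap(-19, 4) == -3, "-19 wraps to -3");
static_assert(ovf_sat(19, 4) == 7, "19 saturates to 7");
static_assert(ovf_sat(-19, 4) == -8, "-19 saturates to -8");
```

The remaining modes (AP_RND_ZERO, AP_RND_INF, AP_RND_MIN_INF, AP_RND_CONV, AP_SAT_ZERO, AP_SAT_SYM, and AP_WRAP_SM) differ in how ties and out-of-range values are handled and are not covered by this sketch.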

The default maximum width allowed for ap_[u]fixed data types is 1024 bits. This default may be overridden by defining the macro AP_INT_MAX_W with a positive integer value less than or equal to 4096 before inclusion of the ap_int.h header file.

Important: ROM synthesis can take a long time when using ap_[u]fixed. Changing the array element type to int results in quicker synthesis. For example:
static ap_fixed<32,0> a[32][depth] = 

Can be changed to:

static int a[32][depth] =