Vitis HLS supports float and double types for synthesis. Both data types are synthesized with partial compliance to the IEEE-754 standard (see Floating-Point Operator LogiCORE IP Product Guide (PG060)).
- Single-precision 32-bit
  - 24-bit fraction
  - 8-bit exponent
- Double-precision 64-bit
  - 53-bit fraction
  - 11-bit exponent
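The widths listed above can be checked on the host compiler used for C simulation. The following is a minimal sketch, assuming the host toolchain maps float and double to the IEEE-754 binary32 and binary64 formats (common on x86-64 and Arm hosts, but not mandated by the C standard). The helper function names are illustrative and are not part of Vitis HLS. The 24-bit and 53-bit figures count the implicit leading bit, so 23 and 52 fraction bits are actually stored.

```c
#include <float.h>
#include <limits.h>

/* Total storage width of each type, in bits. */
int float_bits(void)  { return (int)(sizeof(float)  * CHAR_BIT); }
int double_bits(void) { return (int)(sizeof(double) * CHAR_BIT); }

/* Significand width in bits, including the implicit leading 1.
 * FLT_MANT_DIG and DBL_MANT_DIG are standard <float.h> macros and
 * count base-FLT_RADIX digits (radix 2 on the assumed hosts). */
int float_fraction_bits(void)  { return FLT_MANT_DIG; }
int double_fraction_bits(void) { return DBL_MANT_DIG; }
```

On a host that meets the assumption, these helpers return 32, 64, 24, and 53 respectively.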
In addition to standard arithmetic operations (such as +, -, and *), floats and doubles are commonly used with the functions in math.h (cmath in C++). This section discusses support for standard operators.
The following code example shows the header file used with Standard Types updated to define the data types to be double and float types.
#include <stdio.h>
#include <stdint.h>
#include <math.h>
#define N 9
typedef double din_A;
typedef double din_B;
typedef double din_C;
typedef float din_D;
typedef double dout_1;
typedef double dout_2;
typedef double dout_3;
typedef float dout_4;
void types_float_double(din_A inA, din_B inB, din_C inC, din_D inD,
    dout_1 *out1, dout_2 *out2, dout_3 *out3, dout_4 *out4);
This updated header file is used with the following code example where a sqrtf() function is used.
#include "types_float_double.h"
void types_float_double(
    din_A inA,
    din_B inB,
    din_C inC,
    din_D inD,
    dout_1 *out1,
    dout_2 *out2,
    dout_3 *out3,
    dout_4 *out4
) {
    // Basic arithmetic & math.h sqrtf()
    *out1 = inA * inB;
    *out2 = inB + inA;
    *out3 = inC / inA;
    *out4 = sqrtf(inD);
}
When the example above is synthesized, it results in 64-bit double-precision multiplier, adder, and divider operators. These operators are implemented by the appropriate floating-point AMD IP catalog cores.
The square-root function used, sqrtf(), is implemented using a 32-bit single-precision floating-point core.
If the double-precision square-root function sqrt() were used instead, it would result in additional logic to cast to and from the 32-bit single-precision float types used for inD and out4: sqrt() is a double-precision (double) function, while sqrtf() is a single-precision (float) function.
In C functions, be careful when mixing float and double types as float-to-double and double-to-float conversion units are inferred in the hardware.
float foo_f = 3.1459;
float var_f = sqrt(foo_f);
The above code results in the following hardware:
wire (foo_f)
-> Float-to-Double Converter unit
-> Double-Precision Square Root unit
-> Double-to-Float Converter unit
-> wire (var_f)
Using a sqrtf() function:
- Removes the need for the type converters in hardware
- Saves area
- Improves timing
When synthesizing float and double types, Vitis HLS maintains the order of operations performed in the C code to ensure that the results are the same as the C simulation. Due to saturation and truncation, the following are not guaranteed to be the same in single and double precision operations:
A=B*C;     A=B*F;
D=E*F;     D=E*C;
O1=A*D;    O2=A*D;
With float and double types, O1 and O2 are not guaranteed to be the same.
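A host-side sketch of the O1 and O2 orderings shows how far the two results can diverge. It assumes IEEE-754 single-precision arithmetic on the host; the function names and the input values are chosen for illustration only. The intermediates are declared volatile so that each product is rounded to float before it is used, whatever evaluation precision the compiler would otherwise pick.

```c
/* O1 ordering: A = B*C, D = E*F, O1 = A*D */
float product_order1(float B, float C, float E, float F)
{
    volatile float A = B * C;
    volatile float D = E * F;
    return A * D;
}

/* O2 ordering: A = B*F, D = E*C, O2 = A*D */
float product_order2(float B, float C, float E, float F)
{
    volatile float A = B * F;
    volatile float D = E * C;
    return A * D;
}

/* NaN is the only value that compares unequal to itself. */
int is_nan_f(float x)
{
    return x != x;
}
```

With B = C = 1e30f and E = F = 1e-30f, the first ordering overflows B*C to infinity and underflows E*F to zero, so O1 is NaN, while the second ordering keeps both intermediates near 1 and O2 is approximately 1.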
The strict ordering of floating-point operations described above can be overridden using config_compile -unsafe_math_optimizations. For C++ designs, Vitis HLS provides a bit-approximate implementation of the most commonly used math functions.
Floating-Point Accumulator and MAC
Floating-point accumulators (facc), multiply and accumulate (fmacc), and multiply and add (fmadd) can be enabled using the config_op command shown below:
config_op <facc|fmacc|fmadd> -impl <none|auto> -precision <low|standard|high>
Vitis HLS supports different levels of precision for these operators that trade off between performance, area, and precision on both Versal and non-Versal devices.
- Low-precision accumulation is suitable for high-throughput, low-precision floating-point accumulation and multiply-accumulation. This mode is only available on non-Versal devices.
  - It uses an integer accumulator with a pre-scaler and a post-scaler (to convert input and output to single-precision or double-precision floating point).
  - It uses a 60-bit accumulator for single-precision inputs and a 100-bit accumulator for double-precision inputs.
  - It can cause cosim mismatches due to insufficient precision with respect to C++ simulation.
  - It can always be pipelined with an II=1 without source code changes.
  - It uses approximately 3X the resources of standard-precision floating-point accumulation. Standard-precision accumulation achieves an II that is typically between 3 and 5, depending on clock frequency and target device.
With low-precision mode, accumulation of floats and doubles is supported with an initiation interval (II) of 1 on all devices. This means that the following code can be pipelined with an II of 1 without any additional coding:
float foo(float A[10], float B[10]) {
    float sum = 0.0;
    for (int i = 0; i < 10; i++) {
        sum += A[i] * B[i];
    }
    return sum;
}
- Standard-precision accumulation and multiply-add is suitable for most uses of floating point, and is available on Versal and non-Versal devices.
  - It always uses a true floating-point accumulator.
  - It can be pipelined with an II=1 on Versal devices.
  - It can be pipelined with an II that is typically between 3 and 5 (depending on clock frequency and target device) on non-Versal devices. The standard-precision mode is more efficient on Versal devices than on non-Versal devices.
- High-precision fused multiply-add is suitable for high-precision applications and is available on Versal devices.
  - It uses one extra bit of precision.
  - It always uses a single fused multiply-add, with a single rounding at the end, although it uses more resources than the unfused multiply-add.
  - It can cause cosim mismatches due to the extra precision with respect to C++ simulation.