Vitis HLS supports float and double types for synthesis. Both data types are synthesized with partial compliance with the IEEE-754 standard (see Floating-Point Operator LogiCORE IP Product Guide (PG060)).
- Single-precision 32-bit
  - 24-bit fraction
  - 8-bit exponent
- Double-precision 64-bit
  - 53-bit fraction
  - 11-bit exponent
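The field widths listed above can be checked on a host compiler. The following is a minimal sketch, assuming the host uses IEEE-754 binary32 and binary64 for float and double; the helper function names are hypothetical and not part of Vitis HLS. Note that the 24-bit and 53-bit fraction figures count the implicit leading bit, so the stored fraction fields are 23 and 52 bits, with one further bit used for the sign.

```c
#include <float.h>
#include <limits.h>

/* Total storage width of each type in bits. */
int float_bits(void)  { return (int)(sizeof(float)  * CHAR_BIT); }
int double_bits(void) { return (int)(sizeof(double) * CHAR_BIT); }

/* Fraction precision, counting the implicit leading 1 bit. */
int float_fraction_bits(void)  { return FLT_MANT_DIG; }
int double_fraction_bits(void) { return DBL_MANT_DIG; }

/* Exponent width = total width - sign bit - stored fraction bits. */
int float_exponent_bits(void) {
    return float_bits() - 1 - (FLT_MANT_DIG - 1);
}
int double_exponent_bits(void) {
    return double_bits() - 1 - (DBL_MANT_DIG - 1);
}
```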
In addition to standard arithmetic operations (such as +, -, and *), float and double types are commonly used with the functions declared in math.h (cmath for C++). This section discusses support for standard operators.
The following code example shows the header file used with Standard Types, updated to define the data types to be double and float types.
#include <stdio.h>
#include <stdint.h>
#include <math.h>
#define N 9
typedef double din_A;
typedef double din_B;
typedef double din_C;
typedef float din_D;
typedef double dout_1;
typedef double dout_2;
typedef double dout_3;
typedef float dout_4;
void types_float_double(din_A inA, din_B inB, din_C inC, din_D inD,
    dout_1 *out1, dout_2 *out2, dout_3 *out3, dout_4 *out4);
This updated header file is used with the following code example, where a sqrtf() function is used.
#include "types_float_double.h"
void types_float_double(
din_A inA,
din_B inB,
din_C inC,
din_D inD,
dout_1 *out1,
dout_2 *out2,
dout_3 *out3,
dout_4 *out4
) {
// Basic arithmetic & math.h sqrtf()
*out1 = inA * inB;
*out2 = inB + inA;
*out3 = inC / inA;
*out4 = sqrtf(inD);
}
When the example above is synthesized, it results in 64-bit double-precision multiplier, adder, and divider operators. These operators are implemented by the appropriate floating-point Xilinx IP catalog cores.
The square-root function used, sqrtf(), is implemented using a 32-bit single-precision floating-point core.
If the double-precision square-root function sqrt() were used, it would result in additional logic to cast to and from the 32-bit single-precision float types used for inD and out4: sqrt() is a double-precision (double) function, while sqrtf() is a single-precision (float) function.
In C functions, be careful when mixing float and double types as float-to-double and double-to-float conversion units are inferred in the hardware.
float foo_f = 3.1459;
float var_f = sqrt(foo_f);
The above code results in the following hardware:
wire (foo_f)
-> Float-to-Double Converter unit
-> Double-Precision Square Root unit
-> Double-to-Float Converter unit
-> wire (var_f)
Using a sqrtf() function:
- Removes the need for the type converters in hardware
- Saves area
- Improves timing
When synthesizing float and double types, Vitis HLS maintains the order of operations performed in the C code to ensure that the results are the same as the C simulation. Due to saturation and truncation, the following are not guaranteed to be the same in single and double precision operations:
A=B*C;      A=B*F;
D=E*F;      D=E*C;
O1=A*D;     O2=A*D;
With float and double types, O1 and O2 are not guaranteed to be the same.
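The effect can be reproduced in plain C. The following is a minimal sketch, assuming a host that evaluates float expressions in IEEE-754 single precision; the function names are hypothetical, and the volatile qualifiers are only there to force each intermediate to be rounded to float.

```c
/* Left column: O1 = (B*C) * (E*F) */
float product_order_1(float B, float C, float E, float F) {
    volatile float A = B * C;
    volatile float D = E * F;
    return A * D;
}

/* Right column: O2 = (B*F) * (E*C) */
float product_order_2(float B, float C, float E, float F) {
    volatile float A = B * F;
    volatile float D = E * C;
    return A * D;
}
```

With B = C = 1e30f and E = F = 1e-30f, the first ordering overflows B*C to infinity and underflows E*F to zero, so O1 is NaN, while the second ordering keeps both intermediates close to 1 and O2 is approximately 1.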
The strict order of floating-point operations can be relaxed using config_compile -unsafe_math_optimizations. For C++ designs, Vitis HLS provides a bit-approximate implementation of the most commonly used math functions.
Floating-Point Accumulator and MAC
float foo(float A[10], float B[10]) {
float sum = 0.0;
for (int i = 0; i < 10; i++) {
sum += A[i] * B[i];
}
return sum;
}
# Enable or disable double precision accumulation (true by default)
::common::set_param hls.enable_float_acc_inference false
# Enable or disable double precision MAC on Versal devices (true by default)
::common::set_param hls.enable_float_mul_acc_inference false
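The precision of the running sum can change the result of a loop like foo(). The following is a host-side C sketch of that arithmetic effect only; it is not a model of the core that Vitis HLS infers, and the names LEN, dot_float_acc, and dot_double_acc are hypothetical.

```c
#define LEN 10

/* Same loop as foo(): products and running sum are kept in float. */
float dot_float_acc(const float A[LEN], const float B[LEN]) {
    float sum = 0.0f;
    for (int i = 0; i < LEN; i++) {
        sum += A[i] * B[i];
    }
    return sum;
}

/* Assumed variant: accumulate in double and round to float once,
 * when the final value is returned. */
float dot_double_acc(const float A[LEN], const float B[LEN]) {
    double sum = 0.0;
    for (int i = 0; i < LEN; i++) {
        sum += (double)A[i] * (double)B[i];
    }
    return (float)sum;
}
```

For example, with A[0] = 1e8f, every other element of A set to 1, and every element of B set to 1, the float loop returns 100000000 because each added 1 is lost to rounding, while the double variant returns 100000008 on a host that evaluates float expressions in single precision.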