The AMD FFT LogiCORE IP can be called within a C++ design using
the library hls_fft.h. This section explains how the
FFT can be configured in your C++ code. An example FFT application can be found in
Vitis-HLS-Introductory-Examples on
GitHub. Starting with v2025.2, a new class-based parameterization replaces the
previous template parameterization method.
The FFT library supports floating-point data types when targeting
Versal devices. The
hls::fft function can be called with different parameter lists through
C++ function overloading and template specialization. Each overloaded version of
the fft function has its own function signature.
Super Sample Rate (SSR) parameters are supported for both fixed-point and floating-point data types.
To use the FFT in your C++ code:
- Include the hls_fft.h library.
- Set the default parameters using the library's predefined struct hls::ip_fft::params_t, or copy it as your own configuration struct (see the code snippet below).
  Tip: The Vivado LogiCORE IP dialog wizard detects and prevents incompatible settings. Check your own configuration struct parameters against these Vivado IP dialogs to avoid unsupported or faulty configurations.
- Define the runtime configuration.
- Call the FFT function.
  Tip: Because the FFT uses the DATAFLOW pragma or directive, pipelined execution of an FFT must be in a dataflow loop and not a pipelined loop.
- Optionally, check the runtime status.
Each of these steps is discussed in more detail below with code snippets:
First, include the FFT library in the source code. This header file resides in the include directory of the Vitis HLS installation area, which is searched automatically when Vitis HLS executes.
#include "hls_fft.h"
Define the static parameters of the FFT, such as the input width,
number of channels, and architecture type, which do not change dynamically. The FFT
library includes a parameterization struct, hls::ip_fft::params_t, that initializes all static parameters with default
values.
In the C++ code below, a new user-defined struct
config1 copies the library struct
params_t to inherit all its default
values. Then specific struct parameter values such as output ordering and the widths
of the configuration are overridden. The OVERRIDE macro ensures that the name of the
parameter is not misspelled: a compile error flags any mismatch. For
example:
struct config1 : hls::ip_fft::params_t {
OVERRIDE(config_width) = 24; // works as expected
OVERRIDE(config_witdh) = 24; // errors out and suggests correct spelling: "no member named 'config_witdh' in 'hls::ip_fft::params_t'; did you mean 'config_width'?"
config_with = 24; // errors out with confusing message: "a type specifier is required for all declarations"
static const unsigned config_widt = 24; // does not error out and simply ignores the misspelled parameter. Not recommended!
};
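The inherit-and-override pattern above can be illustrated without the HLS headers. The following is a minimal sketch, not the library's implementation: mock_params_t, mock_config, and transform_length are hypothetical names introduced here, and the sketch does not reproduce the OVERRIDE macro. It only shows how a member redeclared in a derived struct hides the inherited default, and how templated code reads the resulting value at compile time.

```cpp
#include <cassert>

// Hypothetical stand-in for the library defaults (NOT the real hls::ip_fft::params_t).
struct mock_params_t {
    static constexpr unsigned input_width = 16;
    static constexpr unsigned log2_transform_length = 10;
    static constexpr unsigned channels = 1;
};

// A user configuration inherits every default and redeclares only what changes.
// The redeclared member hides the base-class member of the same name.
struct mock_config : mock_params_t {
    static constexpr unsigned log2_transform_length = 8;
};

// Code templated on the configuration reads its values at compile time,
// in the same way the fft signatures size their arrays with
// 1 << _CONFIG_T::log2_transform_length.
template <typename CONFIG_T>
constexpr unsigned transform_length() {
    return 1u << CONFIG_T::log2_transform_length;
}

static_assert(transform_length<mock_params_t>() == 1024, "default length");
static_assert(transform_length<mock_config>() == 256, "overridden length");
static_assert(mock_config::input_width == 16, "inherited default is kept");
```

A misspelled member in mock_config would compile silently and leave the default in place, which is the failure mode the OVERRIDE macro is described as guarding against.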
Define types and variables for both the runtime configuration and runtime status. These values can be dynamic: they are defined as variables in the C++ code, can change at run time, and are accessed through APIs.
typedef hls::ip_fft::config_t<config1> config_t;
typedef hls::ip_fft::status_t<config1> status_t;
Next, set the following runtime configuration values:
- The direction of the FFT (not applicable for native floating point; the static configuration parameter systolicfft_inv must be used instead).
- The scaling schedule (not applicable for native floating point).
- The cyclic prefix length (required for cyclic_prefix_insertion==1).
Call the FFT function using the hls namespace.
The function parameters are as follows for various streaming configurations, single channel, with or without runtime configurable transform length:
// streaming 1 channel no run_time_configurable_transform_length
template<typename _CONFIG_T = hls::ip_fft::params_t,
typename _DATA_IN_T,
typename _DATA_OUT_T>
void fft(hls::stream<std::complex<_DATA_IN_T>>& in,
hls::stream<std::complex<_DATA_OUT_T>>& out,
bool fwd_inv = false, // direction
int scale_sch = -1,
int cp_len = -1,
bool *ovflo = 0,
unsigned *blk_exp = 0);
// streaming 1 channel run_time_configurable_transform_length
template<typename _CONFIG_T = hls::ip_fft::params_t,
typename _DATA_IN_T,
typename _DATA_OUT_T>
void fft(unsigned log2_transform_length,
hls::stream<std::complex<_DATA_IN_T>>& in,
hls::stream<std::complex<_DATA_OUT_T>>& out,
bool fwd_inv = false, // direction
int scale_sch = -1,
int cp_len = -1,
bool *ovflo = 0,
unsigned *blk_exp = 0);
// streaming SSR>1 1 channel no run_time_configurable_transform_length
template<typename _CONFIG_T = hls::ip_fft::params_t,
typename _DATA_IN_T,
typename _DATA_OUT_T>
void fft(hls::stream<hls::vector<std::complex<_DATA_IN_T>, _CONFIG_T::super_sample_rate>>& in,
hls::stream<hls::vector<std::complex<_DATA_OUT_T>, _CONFIG_T::super_sample_rate>>& out,
bool fwd_inv = false, // direction
int scale_sch = -1,
int cp_len = -1,
bool *ovflo = 0,
unsigned *blk_exp = 0);
// streaming SSR>1 1 channel run_time_configurable_transform_length
template<typename _CONFIG_T = hls::ip_fft::params_t,
typename _DATA_IN_T,
typename _DATA_OUT_T>
void fft(unsigned log2_transform_length,
hls::stream<hls::vector<std::complex<_DATA_IN_T>, _CONFIG_T::super_sample_rate>>& in,
hls::stream<hls::vector<std::complex<_DATA_OUT_T>, _CONFIG_T::super_sample_rate>>& out,
bool fwd_inv = false, // direction
int scale_sch = -1,
int cp_len = -1,
bool *ovflo = 0,
unsigned *blk_exp = 0);
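In the SSR>1 signatures above, each stream element carries super_sample_rate complex samples, so one frame of N samples occupies N / SSR stream words. The sketch below shows that grouping on the host side. It is an illustration under stated assumptions: std::array stands in for hls::vector so the code builds without the HLS headers, pack_frame is a hypothetical helper, and the lane order (consecutive samples placed in consecutive lanes of one word) is an assumption that should be checked against the IP documentation.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Group a frame of samples into words of SSR samples each.
// Assumption: sample i goes to word i / SSR, lane i % SSR.
template <typename T, std::size_t SSR>
std::vector<std::array<std::complex<T>, SSR>>
pack_frame(const std::vector<std::complex<T>>& samples) {
    // The frame length must be a whole number of words.
    assert(samples.size() % SSR == 0);
    std::vector<std::array<std::complex<T>, SSR>> words(samples.size() / SSR);
    for (std::size_t i = 0; i < samples.size(); ++i) {
        words[i / SSR][i % SSR] = samples[i];
    }
    return words;
}
```

For example, a 16-sample frame packed with SSR = 4 yields 4 words, and sample 6 lands in word 1, lane 2.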
The function parameters are as follows for various array configurations, single or multi-channel, with or without runtime configurable transform length:
// array 1 channel no run_time_configurable_transform_length
template<typename _CONFIG_T = hls::ip_fft::params_t,
typename _DATA_IN_T,
typename _DATA_OUT_T>
void fft(std::complex<_DATA_IN_T> in[1 << _CONFIG_T::log2_transform_length],
std::complex<_DATA_OUT_T> out[1 << _CONFIG_T::log2_transform_length],
bool fwd_inv = false, // direction
int scale_sch = -1,
int cp_len = -1,
bool *ovflo = 0,
unsigned *blk_exp = 0);
// array 1 channel run_time_configurable_transform_length
template<typename _CONFIG_T = hls::ip_fft::params_t,
typename _DATA_IN_T,
typename _DATA_OUT_T>
void fft(unsigned log2_transform_length,
std::complex<_DATA_IN_T> in[1 << _CONFIG_T::log2_transform_length],
std::complex<_DATA_OUT_T> out[1 << _CONFIG_T::log2_transform_length],
bool fwd_inv = false, // direction
int scale_sch = -1,
int cp_len = -1,
bool *ovflo = 0,
unsigned *blk_exp = 0);
// array N channel no run_time_configurable_transform_length
template<typename _CONFIG_T = hls::ip_fft::params_t,
typename _DATA_IN_T,
typename _DATA_OUT_T>
void fft(std::complex<_DATA_IN_T> in[1 << _CONFIG_T::log2_transform_length][_CONFIG_T::channels],
std::complex<_DATA_OUT_T> out[1 << _CONFIG_T::log2_transform_length][_CONFIG_T::channels],
hls::vector<bool, _CONFIG_T::channels> &fwd_inv,
hls::vector<int, _CONFIG_T::channels> &scale_sch, // used only if enabled in config
int cp_len = -1,
hls::vector<bool, _CONFIG_T::channels> *ovflo = 0,
hls::vector<unsigned, _CONFIG_T::channels> *blk_exp = 0);
// array N channel run_time_configurable_transform_length
template<typename _CONFIG_T = hls::ip_fft::params_t,
typename _DATA_IN_T,
typename _DATA_OUT_T>
void fft(unsigned log2_transform_length,
std::complex<_DATA_IN_T> in[1 << _CONFIG_T::log2_transform_length][_CONFIG_T::channels],
std::complex<_DATA_OUT_T> out[1 << _CONFIG_T::log2_transform_length][_CONFIG_T::channels],
hls::vector<bool, _CONFIG_T::channels> &fwd_inv,
hls::vector<int, _CONFIG_T::channels> &scale_sch, // used only if enabled in config
int cp_len = -1,
hls::vector<bool, _CONFIG_T::channels> *ovflo = 0,
hls::vector<unsigned, _CONFIG_T::channels> *blk_exp = 0);
Arguments of the struct types (taking as input hls::ip_fft::config_t and producing as output hls::ip_fft::status_t) are no longer required.
For example, this is a call of an FFT with fixed transform length
(run_time_configurable_transform_length==0),
no cyclic prefix length, and no output block exponent.
hls::fft<config1>(xn, xk, dir, scaling, -1, ovflo);
Finally and optionally, check the output statuses, namely the overflow (applicable only for non-native floating point) and the output block exponent (applicable for non-native floating point with block floating point).
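As an illustration of using the block exponent status, the sketch below rescales one output sample on the host. It rests on an assumption not stated in this section: that blk_exp counts the right shifts applied to the data inside the core, so multiplying by 2^blk_exp recovers the unscaled magnitude. undo_block_scaling is a hypothetical helper, not part of hls_fft.h.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Assumption: blk_exp is the number of bits the core shifted the data right,
// so scaling by 2^blk_exp undoes the block floating-point scaling.
inline std::complex<double> undo_block_scaling(std::complex<double> xk,
                                               unsigned blk_exp) {
    const int shift = static_cast<int>(blk_exp);
    // std::ldexp(x, n) computes x * 2^n.
    return std::complex<double>(std::ldexp(xk.real(), shift),
                                std::ldexp(xk.imag(), shift));
}
```

With blk_exp = 3, an output sample of (0.25, -0.5) is rescaled to (2.0, -4.0).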
The parameters of the FFT struct hls::ip_fft::params_t are listed below with their default values.
| C Variable Name | C Type | Default Value | Possible Values |
|---|---|---|---|
| input_width | unsigned | 16 | 8-34 |
| output_width | unsigned | 16 | input_width to (input_width + log2_transform_length + 1) |
| status_width | unsigned | 8 | Depends on FFT configuration |
| config_width | unsigned | 16 | Depends on FFT configuration |
| log2_transform_length | unsigned | 10 | 3-16 |
| run_time_configurable_transform_length | bool | false | true, false |
| channels | unsigned | 1 | 1-12 |
| implementation_options | unsigned | pipelined_streaming_io | automatically_select, pipelined_streaming_io, radix_4_burst_io, radix_2_burst_io, radix_2_lite_burst_io |
| phase_factor_width | unsigned | SSR==1 ? 16 : 32 | 8-34 |
| output_ordering | unsigned | SSR==1 ? bit_reversed_order : natural_order | bit_reversed_order, natural_order |
| ovflo | bool | SSR==1 ? true : false | true, false |
| scaling_options | unsigned | scaled | scaled, unscaled, block_floating_point |
| rounding_modes | unsigned | truncation | truncation, convergent_rounding |
| memory_options_data | unsigned | block_ram | block_ram, distributed_ram |
| memory_options_phase_factors | unsigned | block_ram | block_ram, distributed_ram |
| memory_options_reorder | unsigned | block_ram | block_ram, distributed_ram |
| number_of_stages_using_block_ram_for_data_and_phase_factors | unsigned | ((log2_transform_length < 10) ? 1 : (log2_transform_length - 9)) | 0-11 |
| memory_options_hybrid | bool | false | true, false |
| complex_mult_type | unsigned | use_mults_resources | use_luts, use_mults_resources, use_mults_performance |
| butterfly_type | unsigned | SSR==1 ? use_luts : use_xtremedsp_slices | use_luts, use_xtremedsp_slices |
| super_sample_rates | unsigned | ssr_1 | ssr_1, ssr_2, ssr_4, ssr_8, ssr_16, ssr_32, ssr_64 |
| use_native_float | bool | false | true, false |
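Two entries in the table lend themselves to a quick host-side check. The sketch below is illustrative only, and both helper names are hypothetical. max_output_width evaluates the upper bound listed for output_width. bit_reverse maps a position in a bit-reversed output frame to its natural-order index, assuming plain radix-2 bit reversal over log2_transform_length bits (an assumption; other architectures may order differently).

```cpp
#include <cassert>

// Upper bound of output_width from the table:
// input_width + log2_transform_length + 1.
constexpr unsigned max_output_width(unsigned input_width,
                                    unsigned log2_transform_length) {
    return input_width + log2_transform_length + 1;
}

// Reverse the low `bits` bits of `index`.
// Assumption: with bit_reversed_order, the sample at position p of the
// output frame belongs to natural index bit_reverse(p, log2_transform_length).
inline unsigned bit_reverse(unsigned index, unsigned bits) {
    assert(bits <= 32);
    unsigned reversed = 0;
    for (unsigned b = 0; b < bits; ++b) {
        // Shift the result left and append bit b of the input.
        reversed = (reversed << 1) | ((index >> b) & 1u);
    }
    return reversed;
}
```

With the default input_width of 16 and log2_transform_length of 10, the bound evaluates to 27. For an 8-point frame (3 bits), position 1 maps to index 4 and position 6 maps to index 3.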
hls::stream for arguments. Refer to Using FFT Function with Streaming Interface for more information.
Restrictions and Known Issues
- The numerical precision of the C model of the FFT IP core is affected by the C model of HLS math on Linux. To preserve the accuracy of the C model of the FFT, add the -stdmath option for both csim_design and cosim_design to use std::math for the C model of the HLS math library.
  - With this option, C simulation of floating-point math library functions (for example, exp, pow, and log) called outside the FFT model is:
    - Accurate with respect to the std::math function definitions.
    - A functional representation of the hardware, but not bit accurate with respect to RTL simulation: it captures the intended behavior but might not match the exact bit-level operations and timing of the RTL design.
  - By default, HLS performs C simulation of these standard math functions with a C model that is bit-accurate with respect to the RTL that it generates.
  - The FFT C simulation model delivered as part of the LogiCORE IP expects to use the standard math functions, not the HLS C math simulation model.
- FFTs parameterized with a super sample rate larger than one (SSR>1), for both fixed point and native floating point, share the configuration restrictions of the underlying LogiCORE IP, namely:
  - No support for dynamic configuration (for example, direction, transform length, and scaling schedule). For compatibility, the API provides a dynamic config input, but the values are ignored, and the API checks that the values, if specified, match the underlying IP defaults.
  - Because there is no dynamic configuration support, the direction of the transform (otherwise specified using the dynamic config fwd_inv) is specified using a new static config parameter called systolicfft_inv.
  - No output status (for example, no overflow output). For compatibility, the APIs provide a status output argument, which is always set to 0.
  - Phase_factor_width is fixed at 32 bits for floating point and 19 bits for fixed point.
  - XK_INDEX is not supported in the current version of the software.
  - Output_ordering is fixed at natural order, meaning that the results are presented in the sequence in which they naturally occur, without any additional sorting or rearrangement.
  - Cyclic_prefix_insertion is not supported in the current version of the software.
  - Complex_mult_type is fixed at use_mults_performance.
  - Butterfly_type is fixed at use_xtremedsp_slices.
  - The SSR FFTs have a 6-cycle gap in the input TREADY and output TVALID whenever the FFT starts a new data frame.