Fast Fourier Transform IP Library - 2025.2 English - UG1399

Vitis High-Level Synthesis User Guide (UG1399)

Document ID: UG1399
Release Date: 2025-11-20
Version: 2025.2 English

The AMD FFT LogiCORE IP can be called within a C++ design using the library hls_fft.h. This section explains how the FFT can be configured in your C++ code. An example FFT application can be found in Vitis-HLS-Introductory-Examples on GitHub. Starting with v2025.2, a new class-based parameterization replaces the previous template parameterization method.

The FFT library supports floating-point data types when targeting Versal devices. The hls::fft function can be called with different parameter lists through C++ function overloading and template specialization; each overloaded version of the fft function has its own function signature.

Super Sample Rate (SSR) parameters for both fixed-point and floating-point are supported.

Important: AMD highly recommends that you review the Fast Fourier Transform LogiCORE IP Product Guide (PG109) for information on how to implement and use the features of the IP.

To use the FFT in your C++ code:

  1. Include the hls_fft.h library
  2. Set the default parameters using the library predefined struct hls::ip_fft::params_t, or copy it as your own configuration struct (see the code snippet below).
    Tip: The Vivado LogiCORE IP dialog wizard detects and prevents incompatible settings. Enter your own configuration struct parameters into these Vivado IP dialogs to check for unsupported or faulty configurations.
  3. Define the runtime configuration
  4. Call the FFT function
    Tip: Because it uses the DATAFLOW pragma or directive, the pipelined execution of an FFT must be in a dataflow loop and not a pipelined loop.
  5. Optionally, check the runtime status

Each of these steps is discussed in more detail below with code snippets:

First, include the FFT library in the source code. This header file resides in the include directory of the Vitis HLS installation area, which is automatically searched when Vitis HLS executes.

#include "hls_fft.h"

Define the static parameters of the FFT, such as the input width, number of channels, and type of architecture, which do not change dynamically. The FFT library includes a parameterization struct hls::ip_fft::params_t to initialize all static parameters with default values.

In the C++ code below, a new user-defined struct config1 derives from the library struct params_t to inherit all its default values. Specific struct parameter values, such as the output ordering and the width of the configuration, are then overridden. The OVERRIDE macro ensures that the name of the parameter is not misspelled: a compile error flags any mismatch. For example:

struct config1 : hls::ip_fft::params_t {
    OVERRIDE(config_width) = 24; // works as expected
    OVERRIDE(config_witdh) = 24; // errors out and suggests correct spelling: "no member named 'config_witdh' in 'hls::ip_fft::params_t'; did you mean 'config_width'?"
    config_with = 24;            // errors out with confusing message: "a type specifier is required for all declarations"
    static unsigned config_widt = 24;   // does not error out and simply ignores the mis-spelled parameter. Not recommended!
};
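
The last case above is silent because a derived struct member only hides a base member when the names match exactly. The following minimal sketch illustrates this with a hypothetical stand-in struct, mock_params_t, which models only two fields and is not the real hls::ip_fft::params_t; it does not use the OVERRIDE macro.

```cpp
// Hypothetical stand-in for hls::ip_fft::params_t; only two fields are modeled.
struct mock_params_t {
    static const unsigned config_width = 16;
    static const unsigned log2_transform_length = 10;
};

// Correctly spelled: the derived member hides the base default.
struct good_config : mock_params_t {
    static const unsigned config_width = 24;
};

// Misspelled: this declares an unrelated member, so the base default stays in effect.
struct typo_config : mock_params_t {
    static const unsigned config_widt = 24;
};

// A template consumer reads parameters by name, in the same way the fft()
// signatures read _CONFIG_T::log2_transform_length.
template <typename CONFIG_T>
constexpr unsigned config_width_of() { return CONFIG_T::config_width; }

static_assert(config_width_of<good_config>() == 24, "override is visible");
static_assert(config_width_of<typo_config>() == 16, "typo falls back to the default");
```

Both static assertions hold, which shows why the plain misspelled declaration compiles but has no effect on the configuration.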

Define types and variables for both the runtime configuration and runtime status. These values can be dynamic: they are defined as variables in the C++ code, can change at run time, and are accessed through APIs.

typedef hls::ip_fft::config_t<config1> config_t;
typedef hls::ip_fft::status_t<config1> status_t;

Next, set the runtime configuration to the following:

  • The direction of the FFT (not applicable for native floating point; the static configuration parameter systolicfft_inv must be used instead).
  • The scaling schedule (not applicable for native floating point).
  • The cyclic prefix length (required for cyclic_prefix_insertion==1).
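
The scaling schedule is passed to the FFT as a single integer. As an illustration only, the sketch below assumes the encoding that PG109 describes for the pipelined streaming and radix-4 burst architectures: two bits per stage (a right shift of 0 to 3 bits), with the first stage in the two least significant bits. pack_scale_sch is a hypothetical helper and is not part of hls_fft.h; confirm the encoding for your architecture in PG109 before relying on it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of hls_fft.h).
// Packs per-stage right-shift amounts into one scaling-schedule word.
// Assumed encoding: two bits per stage, first stage in the two LSBs.
template <std::size_t N>
int pack_scale_sch(const unsigned (&shifts)[N]) {
    int word = 0;
    for (std::size_t i = 0; i < N; ++i) {
        // Keep only the low two bits of each entry, then place it at bit 2*i.
        word |= static_cast<int>(shifts[i] & 0x3u) << (2 * i);
    }
    return word;
}

// Example: a schedule for five stages, listed first stage to last.
inline void pack_scale_sch_example() {
    const unsigned shifts[5] = {3, 2, 2, 2, 2};
    assert(pack_scale_sch(shifts) == 0x2AB);
}
```

Under the stated assumption, the five shifts {3, 2, 2, 2, 2} pack to 0x2AB, and the result could be passed as the scale_sch argument.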

Call the FFT function using the hls namespace.

The function parameters are as follows for various streaming configurations, single channel, with or without runtime configurable transform length:

// streaming 1 channel no run_time_configurable_transform_length 
template<typename _CONFIG_T =  hls::ip_fft::params_t,
         typename _DATA_IN_T,
         typename _DATA_OUT_T>
void fft(hls::stream<std::complex<_DATA_IN_T>>& in,
         hls::stream<std::complex<_DATA_OUT_T>>& out,
         bool fwd_inv = false, // direction
         int scale_sch = -1,
         int cp_len = -1,
         bool *ovflo = 0,
         unsigned *blk_exp = 0);
// streaming 1 channel run_time_configurable_transform_length 
template<typename _CONFIG_T =  hls::ip_fft::params_t,
         typename _DATA_IN_T,
         typename _DATA_OUT_T>
void fft(unsigned log2_transform_length,
         hls::stream<std::complex<_DATA_IN_T>>& in,
         hls::stream<std::complex<_DATA_OUT_T>>& out,
         bool fwd_inv = false, // direction
         int scale_sch = -1,
         int cp_len = -1,
         bool *ovflo = 0,
         unsigned *blk_exp = 0);
// streaming SSR>1 1 channel no run_time_configurable_transform_length 
template<typename _CONFIG_T =  hls::ip_fft::params_t,
         typename _DATA_IN_T,
         typename _DATA_OUT_T>
void fft(hls::stream<hls::vector<std::complex<_DATA_IN_T>, _CONFIG_T::super_sample_rate>>&  in,
         hls::stream<hls::vector<std::complex<_DATA_OUT_T>, _CONFIG_T::super_sample_rate>>&  out,
         bool fwd_inv = false, // direction
         int scale_sch = -1,
         int cp_len = -1,
         bool *ovflo = 0,
         unsigned *blk_exp = 0);
// streaming SSR>1 1 channel run_time_configurable_transform_length 
template<typename _CONFIG_T =  hls::ip_fft::params_t,
         typename _DATA_IN_T,
         typename _DATA_OUT_T>
void fft(unsigned log2_transform_length,
         hls::stream<hls::vector<std::complex<_DATA_IN_T>, _CONFIG_T::super_sample_rate>>&  in,
         hls::stream<hls::vector<std::complex<_DATA_OUT_T>, _CONFIG_T::super_sample_rate>>&  out,
         bool fwd_inv = false, // direction
         int scale_sch = -1,
         int cp_len = -1,
         bool *ovflo = 0,
         unsigned *blk_exp = 0);
         

The function parameters are as follows for various array configurations, single or multi-channel, with or without runtime configurable transform length:

// array 1 channel no run_time_configurable_transform_length 
template<typename _CONFIG_T =  hls::ip_fft::params_t,
         typename _DATA_IN_T,
         typename _DATA_OUT_T>
void fft(std::complex<_DATA_IN_T>  in[1 << _CONFIG_T::log2_transform_length],
         std::complex<_DATA_OUT_T>  out[1 << _CONFIG_T::log2_transform_length],
         bool fwd_inv = false, // direction
         int scale_sch = -1,
         int cp_len = -1,
         bool *ovflo = 0,
         unsigned *blk_exp = 0);
// array 1 channel run_time_configurable_transform_length 
template<typename _CONFIG_T =  hls::ip_fft::params_t,
         typename _DATA_IN_T,
         typename _DATA_OUT_T>
void fft(unsigned log2_transform_length,
         std::complex<_DATA_IN_T>  in[1 << _CONFIG_T::log2_transform_length],
         std::complex<_DATA_OUT_T>  out[1 << _CONFIG_T::log2_transform_length],
         bool fwd_inv = false, // direction
         int scale_sch = -1,
         int cp_len = -1,
         bool *ovflo = 0,
         unsigned *blk_exp = 0);
// array N channel no run_time_configurable_transform_length 
template<typename _CONFIG_T =  hls::ip_fft::params_t,
         typename _DATA_IN_T,
         typename _DATA_OUT_T>
void fft(std::complex<_DATA_IN_T>  in[1 << _CONFIG_T::log2_transform_length][_CONFIG_T::channels],
         std::complex<_DATA_OUT_T>  out[1 << _CONFIG_T::log2_transform_length][_CONFIG_T::channels],
         hls::vector<bool, _CONFIG_T::channels> &fwd_inv,
         hls::vector<int, _CONFIG_T::channels> &scale_sch, // used only if enabled in config
         int cp_len = -1,
         hls::vector<bool, _CONFIG_T::channels> *ovflo = 0,
         hls::vector<unsigned, _CONFIG_T::channels> *blk_exp = 0);
// array N channel run_time_configurable_transform_length 
template<typename _CONFIG_T =  hls::ip_fft::params_t,
         typename _DATA_IN_T,
         typename _DATA_OUT_T>
void fft(unsigned log2_transform_length,
         std::complex<_DATA_IN_T>  in[1 << _CONFIG_T::log2_transform_length][_CONFIG_T::channels],
         std::complex<_DATA_OUT_T>  out[1 << _CONFIG_T::log2_transform_length][_CONFIG_T::channels],
         hls::vector<bool, _CONFIG_T::channels> &fwd_inv,
         hls::vector<int, _CONFIG_T::channels> &scale_sch, // used only if enabled in config
         int cp_len = -1,
         hls::vector<bool, _CONFIG_T::channels> *ovflo = 0,
         hls::vector<unsigned, _CONFIG_T::channels> *blk_exp = 0);
Note: In this release, the data mover functions (typically named dummy_fe(), dummy_be()) shown in old FFT IP usage examples (with the API taking as input hls::ip_fft::config_t and producing as output hls::ip_fft::status_t) are no longer required.
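
The array and stream shapes in the signatures above follow from the static configuration. The sketch below computes them at compile time for two hypothetical stand-in configurations (mock_multichannel and mock_ssr are not library types). It assumes that super_sample_rate is usable as an integer, as in the hls::vector declarations above, and that each SSR stream element therefore carries super_sample_rate samples.

```cpp
#include <cstddef>

// Hypothetical stand-ins for user configuration structs.
struct mock_multichannel {
    static const unsigned log2_transform_length = 10;
    static const unsigned channels = 4;
};
struct mock_ssr {
    static const unsigned log2_transform_length = 10;
    static const unsigned super_sample_rate = 4;
};

// Samples per frame: the first array dimension, 1 << log2_transform_length.
template <typename CONFIG_T>
constexpr std::size_t frame_length() {
    return std::size_t(1) << CONFIG_T::log2_transform_length;
}

// Total elements of the [frame_length][channels] array in the N-channel API.
template <typename CONFIG_T>
constexpr std::size_t multichannel_elements() {
    return frame_length<CONFIG_T>() * CONFIG_T::channels;
}

// Stream elements per frame when each element is a vector of SSR samples
// (an assumption drawn from the hls::vector<..., super_sample_rate> types).
template <typename CONFIG_T>
constexpr std::size_t ssr_beats_per_frame() {
    return frame_length<CONFIG_T>() / CONFIG_T::super_sample_rate;
}

static_assert(frame_length<mock_multichannel>() == 1024, "1 << 10");
static_assert(multichannel_elements<mock_multichannel>() == 4096, "1024 x 4");
static_assert(ssr_beats_per_frame<mock_ssr>() == 256, "1024 / 4");
```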

For example, the following call runs an FFT with a fixed transform length (run_time_configurable_transform_length==0), no cyclic prefix length, and no output block exponent:

hls::fft<config1>(xn, xk, dir, scaling, -1, ovflo);

Finally and optionally, check the output statuses, namely the overflow (applicable only for non-native floating point) and the output block exponent (applicable for non-native floating point with block floating point).

Note: The native floating point FFT does not have an output status. If a status argument like the overflow above is passed to the API, its value is always 0.
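
When block floating point is used, the block exponent is needed to interpret the output magnitude. The sketch below assumes, following the description in PG109, that the block exponent is the total number of bits by which the data was shifted right during the transform. undo_block_scaling is a hypothetical helper operating on values already converted to double; it is not part of hls_fft.h.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Hypothetical helper (not part of hls_fft.h).
// Restores the unscaled magnitude of one output sample by multiplying both
// components by 2^blk_exp, assuming blk_exp counts right shifts applied by the FFT.
inline std::complex<double> undo_block_scaling(const std::complex<double>& xk,
                                               unsigned blk_exp) {
    const int e = static_cast<int>(blk_exp);
    return std::complex<double>(std::ldexp(xk.real(), e), std::ldexp(xk.imag(), e));
}

// Example: a sample reported with blk_exp == 3 was divided by 8.
inline void undo_block_scaling_example() {
    const std::complex<double> restored =
        undo_block_scaling(std::complex<double>(0.5, -0.25), 3);
    assert(restored == std::complex<double>(4.0, -2.0));
}
```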

The parameters of the FFT struct hls::ip_fft::params_t are listed below with their default values.

Table 1. FFT Struct Parameters
C Variable Name | C Type | Default Value | Possible Values
input_width | unsigned | 16 | 8-34
output_width | unsigned | 16 | input_width to (input_width + log2_transform_length + 1)
status_width | unsigned | 8 | Depends on FFT configuration
config_width | unsigned | 16 | Depends on FFT configuration
log2_transform_length | unsigned | 10 | 3-16
run_time_configurable_transform_length | bool | false | true, false
channels | unsigned | 1 | 1-12
implementation_options | unsigned | pipelined_streaming_io | automatically_select, pipelined_streaming_io, radix_4_burst_io, radix_2_burst_io, radix_2_lite_burst_io
phase_factor_width | unsigned | SSR==1 ? 16 : 32 | 8-34
output_ordering | unsigned | SSR==1 ? bit_reversed_order : natural_order | bit_reversed_order, natural_order
ovflo | bool | SSR==1 ? true : false | true, false
scaling_options | unsigned | scaled | scaled, unscaled, block_floating_point
rounding_modes | unsigned | truncation | truncation, convergent_rounding
memory_options_data | unsigned | block_ram | block_ram, distributed_ram
memory_options_phase_factors | unsigned | block_ram | block_ram, distributed_ram
memory_options_reorder | unsigned | block_ram | block_ram, distributed_ram
number_of_stages_using_block_ram_for_data_and_phase_factors | unsigned | ((log2_transform_length < 10) ? 1 : (log2_transform_length - 9)) | 0-11
memory_options_hybrid | bool | false | true, false
complex_mult_type | unsigned | use_mults_resources | use_luts, use_mults_resources, use_mults_performance
butterfly_type | unsigned | SSR==1 ? use_luts : use_xtremedsp_slices | use_luts, use_xtremedsp_slices
super_sample_rates | unsigned | ssr_1 | ssr_1, ssr_2, ssr_4, ssr_8, ssr_16, ssr_32, ssr_64
use_native_float | bool | false | true, false
Important: When specifying parameter values which are not integer or boolean, the Vitis HLS FFT namespace should be used. For example, the possible values for parameter butterfly_type in the preceding table are use_luts and use_xtremedsp_slices. The values used in the C++ program should be butterfly_type = hls::ip_fft::use_luts and butterfly_type = hls::ip_fft::use_xtremedsp_slices.
Tip: The example above shows the use of scalar values and arrays, but the FFT function also supports the use of hls::stream for arguments. Refer to Using FFT Function with Streaming Interface for more information.
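
Two entries in Table 1 are expressed as formulas rather than fixed values: the allowed range of output_width and the default of number_of_stages_using_block_ram_for_data_and_phase_factors. The sketch below restates both as compile-time checks. The function names are hypothetical and not part of hls_fft.h; the expressions are copied from the table.

```cpp
// Hypothetical helpers (not part of hls_fft.h); formulas taken from Table 1.

// Default for number_of_stages_using_block_ram_for_data_and_phase_factors.
constexpr unsigned default_bram_stages(unsigned log2_transform_length) {
    return (log2_transform_length < 10) ? 1 : (log2_transform_length - 9);
}

// True when output_width lies in
// [input_width, input_width + log2_transform_length + 1].
constexpr bool output_width_in_range(unsigned input_width,
                                     unsigned output_width,
                                     unsigned log2_transform_length) {
    return output_width >= input_width &&
           output_width <= input_width + log2_transform_length + 1;
}

// Library defaults: input_width 16, output_width 16, log2_transform_length 10.
static_assert(default_bram_stages(10) == 1, "default for a 1024-point FFT");
static_assert(output_width_in_range(16, 16, 10), "default widths are valid");
static_assert(output_width_in_range(16, 27, 10), "upper bound is inclusive");
static_assert(!output_width_in_range(16, 28, 10), "one past the upper bound");
```

A user configuration struct could carry a similar static_assert so that an out-of-range width is caught at compile time, before the Vivado IP dialogs are consulted.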

Restrictions and Known Issues

  • The numerical precision of the C model of the FFT IP core is affected by the C model of HLS math on Linux. To preserve the accuracy of the FFT C model, add the -stdmath option for both csim_design and cosim_design to use std::math for the C model of the HLS math library.
    • With this option, C simulation of floating-point math library functions (for example, exp, pow, and log) called outside the FFT model is:
      • Accurate with respect to the std::math function definitions.
      • A functional representation of the hardware, but not bit-accurate with respect to RTL simulation: it captures the intended behavior but might not match the exact bit-level results of the RTL design.
    • By default, HLS performs C simulation of these standard math functions with a C model that is bit-accurate with respect to the RTL that it generates.
    • The FFT C simulation model delivered as part of the LogiCORE IP expects to use the standard math functions, not the HLS C math simulation model.

The FFTs parameterized with a super sample rate larger than one (SSR>1), for both fixed-point and native floating-point, share the same configuration restrictions as the underlying LogiCORE IP, namely:

  • No support for dynamic configuration (for example, direction, transform length, and scaling schedule). For compatibility, the APIs provide a dynamic config input, but the values are ignored, and the API checks that any specified values match the underlying IP defaults.
  • Because there is no dynamic configuration support, the direction of the transform (otherwise specified using the dynamic config fwd_inv) is specified using a new static config parameter called systolicfft_inv.
  • No output status (for example, no overflow output). For compatibility the APIs provide a status output argument, which is always set to 0.
  • Phase_factor_width is fixed at 32 bits for floating point and 19 bits for fixed point.
  • XK_INDEX is not supported in the current version of the software.
  • Output_ordering is fixed at natural order: the results are presented in the sequence in which they naturally occur, with no additional sorting or rearrangement applied.
  • Cyclic_prefix_insertion is not supported in the current version of the software.
  • Complex_mult_type is fixed at use_mults_performance.
  • Butterfly_type is fixed at use_xtremedsp_slices.
  • The SSR FFTs have a 6-cycle gap in the input TREADY and output TVALID whenever the FFT starts a new data frame.