1-Dimensional(Line) SSR FFT L1 FPGA Module - 1-Dimensional(Line) SSR FFT L1 FPGA Module - 2025.2 English

Vitis Libraries

Release Date
2026-02-09
Version
2025.2 English

Overview

The AMD Vitis™ DSP Library offers a fully synthesizable Super Sample Rate (SSR) FFT with a systolic architecture to process multiple input samples every clock cycle. The number of samples processed in parallel per cycle is denoted by the SSR factor. The SSR FFT is implemented as a C++ template function that synthesizes into a streaming architecture. The SSR FFT architecture used for implementation can be parameterized through template parameters, which are grouped in a C++ struct of type ssr_fft_default_params. A new structure can be defined by extending the default structure and redefining required member constants as follows:

struct ssr_fft_fix_params : ssr_fft_default_params
{
        static const int N = 1024;
        static const int R = 4;
        static const scaling_mode_enum scaling_mode = SSR_FFT_NO_SCALING;
        static const fft_output_order_enum output_data_order = SSR_FFT_NATURAL;
        static const int twiddle_table_word_length = 18;
        static const int twiddle_table_integer_part_length = 2;
        static const transform_direction_enum transform_direction = FORWARD_TRANSFORM;
        static const butterfly_rnd_mode_enum butterfly_rnd_mode = TRN;
};

The structure above defines:

  • N: Size or length of transform.
  • R: The number of samples to be processed in parallel SSR Factor and radix of SSR FFT algorithm used.
  • scaling_mode: The scaling mode as enumeration type (SSR FFT has three different scaling modes).
  • output_data_order: Decides if data will be in the natural order or digit reversed transposed order.
  • twiddle_table_word_length: Defines the total number of bits to be used for storing twiddle table factors.
  • twiddle_table_integer_part_length: The number of integer bits used for storing integer part of twiddles.
  • transform_direction: Defines the direction of transform, inverse transform (SSR IFFT) or forward transform (SSR FFT).
  • butterfly_rnd_mode: Defines the rounding mode used by butterflies in SSR FFT stages.

Multi-Instance Support

Single-instance example:

xf::dsp::fft::fft<ssr_fft_fix_params>(...);

The current release of Vitis SSR FFT supports the use of multiple instances of 1-D SSR FFT in a single design. To enable the use of multiple instances, the fft function takes as an input a new template parameter besides the parameter structure. This parameter gets a default value if no value is provided for it. But if multiple instance support is required, all the instances used should be provided with the unique integer template parameter, like:

xf::dsp::fft::fft<ssr_fft_fix_params, 1>(...);
xf::dsp::fft::fft<ssr_fft_fix_params, 2>(...);

Data Type Support for Synthesis

Currently, 1-D SSR FFT supports fixed-point and floating-point complex inputs for synthesis.

Fixed Point

The fixed-point SSR FFT implementation is based on fixed-point data types std::complex<ap_fixed<>> which are used for synthesis and implementation. It is possible to use floating-point types std::complex<float> and std::complex<double> for simulation, but these floating-point complex models will consume massive resources if synthesized to hardware. For the best results with the fixed-point type, limit the data bit width to 27 bits (integer + fraction) as it helps to map multiplication and addition within SSR FFT butterflies directly onto a single DSP block. Larger inputs can be used but can lead to slower Fmax and more resource utilization. Finally, the complex exponential/twiddle factor storage is 18 bits (16F+2I bits). The selection of 18-bit is made keeping in view the 18x27 multipliers available within DSP blocks on FPGAs.

Floating Point

1-D SSR FFT also supports synthesis for single- or double-precision floating-point types. For synthesizing a complex floating-point type, it is required that the std::complex type is not used as a complex wrapper because this wrapper has some issues; instead, use the AMD Vitis™ DSP library wrapper complex_wrapper<…> for wrapping complex floating-point numbers. Also, while synthesizing the floating-point 1-D SSR FFT, the parameters in the structure which carry information such as scaling mode, twiddle factor storage bits, butterfly rounding mode, etc., which are only related to the fixed-point data path, carry no meaning. Instead, the SSR FFT parameter structure can simply define relevant parameters as follows:

struct ssr_fft_fix_params:ssr_fft_default_params
{
        static const int N = 1024;
        static const int R = 4;
        static const fft_output_order_enum output_data_order = SSR_FFT_NATURAL;
        static const transform_direction_enum transform_direction = FORWARD_TRANSFORM;
};

Managing Bit Growth in SSR FFT Stages

The bit growth management is required for the fixed point implementation only. The SSR FFT supports three different modes to manage bit growth between SSR FFT stages. These three modes can be used to allow bit growth in every stage, or use scaling in every stage without any bit growth or allow bit growth until 27 bits and then start using scaling. The detailed description for the different modes are as follows:

SSR_FFT_GROW_TO_MAX_WIDTH

When the scaling_mode constant in the parameter structure is set to SSR_FFT_GROW_TO_MAX_WIDTH, it specifies growth from stage to stage, starting from the first stage to a specified max bit width. The output bit width grows until 27 bits and then saturates. The output bit width grows by log2(R) bits in every stage, and then maxes outs at 27 bits to keep the butterfly operation mapping to DSPs. This option is useful when the initial input bit width is less than 27 bits.

SSR_FFT_SCALE

When the scaling_mode constant in the parameter structure is set to SSR_FFT_SCALE, it enables scaling of outputs in every stage. The output is scaled in every stage and loses precision. An FFT with size L and Radix=SSR=R has logR(L) stages. This option is useful when the input bit width is already close to 27 bits and it is required that the output does not grow beyond 27 bits so that multiplication can be mapped to DSPs.

SSR_FFT_NO_SCALE

When the scaling_mode constant in the parameter structure is set to SSR_FFT_NO_SCALE, the bit growth is allowed in every stage and the output grows unbounded by log2(R) in every stage. This setting can be useful when high precision is required. However, if the output bit width grows beyond 27 bits, the multiplication might not map to only DSPs but also start using FPGA fabric logic in combination. This can reduce the clock speed and increase the resource utilization.

1-D SSR FFT supports multiple scaling modes and provides options to define input bit-widths and bit-width required to store exponential values (sin/cos in look-up tables). The signal to noise ratio that defines the quality of the output signal depends on the choice of these different parameters and also on the quantification scheme used for converting real valued continuous signals or a flloating point signal to a fixed point. The range and the resolution of the signal, essentially the integer bits and the fraction bits, should be selected carefully to have good signal-to-noise ratio (SNR) at the output of the SSR FFT. The following is the recommended fllow for working with 1-D SSR FFT HLS IP for a fixed point implementation.

Start With Floating Point Model

Currently, 1-D SSR FFT can be used with ap_fixed<>, float, and double types. The following table lists the support for synthesis and simulation.

Type Synthesis Simulation
std::complex <ap_fixed <>> YES YES
std::complex<float> NO YES
std::complex<double> NO YES
complex_wrapper<double> YES YES
complex_wrapper<float> YES YES

The recommended starting point is to start with flloat/double inner type in std::complex<>, and verify the SNR against a reference model, such as the Matlab/Python/Octave/Simulink whichever modeling language or tool is used by generating the golden test vectors. The synthesizable version of the 1-D SSR FFT currently supports ap_fixed<> and float as the inner type, so the next step in case of a fixed point implementation is to start experimenting with a fixed point model.

Fixed Point Modeling and Implementation

Fixed Point Model

When working with the fixed point model, the recommended scaling mode to start with is SSR_FFT_NO_SCALING. The input bit-widths should be selected as follows. Create an initial fixed point model with type ap_fixed<WL, IL>. The overall input type is std::complex <ap_fixed<WL, IL>, essentially storing the real and imaginary parts of the input. These parts are:

  • IL: Integer bits, selected based on the input range
  • WL: Word Length = IL + FL, where FL is the Fraction Bit Width, selected based on the input resolution

In this case, 1-D SSR FFT internally does not use any scaling because of the scaling mode selection. Therefore, no potential scaling errors will be seen at the output. With the scaling mode set to no scaling, you can experiment with other fixed point parameters, such as integer bits and fraction bits, used to represent the input samples. The simplistic approach would be to select bits required to represent the input based on the input range and resolution, but depending on the other input characteristic, you can optimize these bit widths.

Selecting Bit Widths for Inputs

The selection of the input bit width depends on the input data characteristics and the required resolution and is a data-dependent choice, essentially depending on the range and resolution of the test data. For simulation purposes, you can select an arbitrarily large number of bits for representing integer and fraction bits. For implementation, you must make an optimal choice keeping in mind the required SNR. The recommended strategy is to do the following:

  • Keep the scaling mode fixed to SSR_FFT_NO_SCALING.
  • Change the input bits for the integer and fraction representation by observing the signal to noise ratio at the output of 1-D SSR FFT.
  • Reduce the bit widths such that the output SNR requirement is met by the minimum required bits.

Once the SNR requirements are met, you can proceed to other fixed point optimizations, such as bits required to store complex exponential tables and 1-D SSR FFT output scaling options.

Twiddle Factor or Sine/Cosine Lookup Table Quantization

You can change the number of bits used to quantize the sin/cos table (twiddle factors/complex exponential). The recommended setting is total 18 bits and 2 bits for the fraction. This setting ensures that during multiplication, the twiddle/sin/cos input can map to the 18-bit input of the DSP block in AMD Adaptive Engineering FPGAs. The model can synthesize and work for other large bit widths, but performance might be worse due to multiplication operations not mapping to a single DSP block and being implemented using multiple DSP blocks and/or FPGA fabric. The twiddle factor width reduction can be useful when the initial setting for the twiddle factor storage is larger than 18 bits. By default, it is set to use 18 bits with 2 bits reserved for the signed integer part. The 2 bits are essentially needed to accurately represent a range of numbers from +1 to -1 (for sin/cos) in the table.

Choosing the Best Scaling Mode

After the choice for input bit width and twiddle factors is made with no scaling, which gives an acceptable SNR or root mean square (RMS) error at the output of fixed point 1-D SSR FFT, you can start to experiment with the choice of scaling modes. Three different scaling modes are available with 1-D SSR FFT. The recommended strategy is to start with SSR_FFT_NO_SCALING. If there is an acceptable SNR/RMS error at the output, switch to SSR_FFT_GROW_TO_MAX_WIDTH. If there is still an acceptable SNR/RMS error, switch to SSR_FFT_SCALE, and observe the SNR/RMS again if it is acceptable, keep using SSR_FFT_SCALE, otherwise revert back to another mode which gives an acceptable SNR/RMS error at the output.

SSR_FFT_NO_SCALING

This is the recommended mode to start with. It performs no scaling, but the output bit width grows in every stage by log2(R=SSR). For example, if the size of FFT is N=64 and SSR=R=4 is selected, then 1-D SSR FFT has log4 (64) = 3 stages. If the input bit width is W, the output bit width is W+3*2=W+6. Therefore, the output would have grown by logR(N)*log2(R) bits. SSR_FFT_NO_SCALING preserves the accuracy of the computation but at maximum hardware cost. The 1-D SSR FFT computation is done in stages with one stage feeding the next stage, so essentially it is a pipeline of stages. One of the downfalls of uncontrolled bit growth is that at some point, in a certain stage when output widths of one stage increase beyond a limit where multiplication operation cannot map to DSP blocks on the FPGA, the design performance in terms of speed might fall considerably. For example, for a given design with logR(N) * log2(R) + Input Bit Width(IL+FL) > max(DSP Block Multiplier Inputs), you might consider using one of the other two available scaling schemes. For DSP48 blocks with 18x27 multipliers, the condition will be logR(N) * log2(R) + Input Bit Width > 27.

SSR_FFT_GROW_TO_MAX_WIDTH

In this mode, a hybrid approach is used. Initially, the bit growth is allowed if there is any room for growth. If in the starting FFT stages, the output bit-widths are smaller than what can be mapped to the DSP blocks, it allows the bit growth. When the bit width grows beyond what can be mapped to the DSP blocks, it will start scaling the output.

SSR_FFT_SCALE

When you know that for a given 1-D FFT size N and SSR factor, the output will grow beyond a limit which the DSP multiplier blocks cannot handle on a given FPGA device, you have the option to set the scaling on for every stage by selecting the SSR_FFT_SCALE option. This option scales the output in every stage by right shifting the output by log2 (SSR=R) in every stage. The recommended flow only provides a guideline for creating a fixed point model and discusses options available for it in 1-D SSR FFT. Depending on the design SNR/RMS requirements, you are required to carefully select all these parameters keeping in view different performance and SNR/RMS requirements for the given application.

1-D SSR FFT Library Usage

The following sections describe how to use SSR FFT from the Vitis DSP Library.

Fixed Point 1-D SSR FFT Usage

The Vitis 1-D SSR FFT L1 module can be used in a C++ HLS design by: 1. Cloning the Vitis DSP Library git repository and adding the following path to the compiler include path:

REPO_PATH/dsp/L1/include/hw/vitis_fft/fixed/
  1. Include vt_fft.hpp.
  2. Use namespace xf::dsp::fft.
  3. Define the fft parameter structure. For example, call it params_fix by extending ssr_fft_default_params like Defining 1-D SSR FFT Parameter Structure.
  4. Call fft<params_fix>(input_stream,output_stream).

The following section gives usage examples and explains some other interface level details for use in a C++ based HLS design. To use the 1-D SSR FFT L1 module:

  1. Include the vt_fft.hpp header:
#include "vt_fft.hpp"
  1. Use namespace xf::dsp::fft:
using namespace xf::dsp::fft;
  1. Define a C++ structure that extends ssr_fft_default_params:
struct params_fix:ssr_fft_default_params
{
        static const int N-SSR_FFT_L;
        static const int R=SSR_FFT_R;
        static const scaling_mode_enum
        scaling_mode=SSR_FFT_GROW_TO_MAX_WIDTH;
        static const fft_output_order_enum
        output_data_order=SSR_FFT_NATURAL;
        static const int twiddle_table_word_length=18;
        static const int twiddle_table_intger_part_length=2;
};
  1. Call 1-D SSR FFT as follows:
fft<params_fix>(inD,outD);
//OR
fft<params_fix,IID>(inD,outD);
// IID: is a constant giving instance ID

where inD and outD are 1-dimensional hls::stream of a complex of ap_fixed, float, or double type; synthesis and simulation use is already explained in the previous table. The I/O arrays can be declared as follows:

Fixed Point Type

First define the input type, then using type traits calculate the output type based on the ssr_fft_params struct (output type calculation takes in consideration the scaling mode based bit-growth and input bit-widths).

Floating Point 1-D SSR FFT Usage

The Vitis 1-D SSR FFT L1 module can be used in a C++ HLS design by:

  1. Cloning the Vitis DSP Library git repository and adding the following path to the compiler include path:

    REPO_PATH/dsp/L1/include/hw/vitis_fft/float/

  2. Include vt_fft.hpp.

  3. Use namespace xf::dsp::fft.

  4. Define the fft parameter structure. For example, call it params_float by extending ssr_fft_default_params like Defining 1-D SSR FFT Parameter Structure.

  5. Call fft<params_float>(input_stream,output_stream).

The following section gives usage examples and explains some other interface level details for use in a C++ based HLS design. To use the 1-D SSR FFT L1 module:

  1. Include the vt_fft.hpp header:
#include "vt_fft.hpp"
  1. Use namespace xf::dsp::fft:
using namespace xf::dsp::fft;
  1. Define a C++ structure that extends ssr_fft_default_params:
struct params_float:ssr_fft_default_params
{
   static const int N = 1024;
   static const int R = 4;
   static const fft_output_order_enum output_data_order = SSR_FFT_NATURAL;
   static const transform_direction_enum transform_direction = FORWARD_TRANSFORM;
};
  1. Call 1-D SSR FFT as follows:
fft<params_float>(inD,outD);
//OR
fft<ssr_fft_params,IID>(inD,outD);
// IID: is a constant giving instance ID

where inD and outD are 1-dimensional hls::stream complex of an ap_fixed, float, or double type; synthesis and simulation use is already explained in the previous table. The I/O arrays can be declared as follows:

Fixed Point Type: First define the input type, then using the type traits, calculate the output type based on ssr_fft_params struct (output type calculation takes into consideration scaling mode based bit-growth and input bit-widths not relevant for type float).

typedef std::complex< float > I_TYPE;
typedef xf::dsp::fft::ssr_fft_output_type<ssr_fft_params,I_TYPE>::t_ssr_fft_out O_TYPE;
I_TYPE inD[SSR_FFT_R][SSR_FFT_L/SSR_FFT_R];
O_TYPE outD [R][L/R];

Here SSR_FFT_R defines the SSR factor and SSR_FFT_L defines the size of the FFT transform.

Float/Double Type: First define the double/float input type, then using type traits, calculate the output type based on ssr_fft_params struct. For float types, the output type calculation will return the same type as the input.

typedef std::complex< float/double > I_TYPE;
typedef hls::ssr_fft::ssr_fft_output_type<ssr_fft_params,I_TYPE>::t_ssr_fft_out O_TYPE;
I_TYPE inD[SSR_FFT_R][SSR_FFT_L/SSR_FFT_R];
O_TYPE outD[SSR_FFT_R][SSR_FFT_L/SSR_FFT_R];

1-D SSR FFT Input Stream Reading and Writing Considerations

After synthesis, 1-D SSR FFT HLS IP maps to a processing block with a stream interface at both the input and output.

If you require a streaming 1-D SSR FFT block with a first in first out (FIFO) interface at both the input and output, as shown in the following figure, do the following:

doc tool flow
  1. Declare the input and output streams explicitly with the corresponding stream depth:
// input stream of innerFFT
hls::stream<I_TYPE, FIFO_DEPTH> inStrm[R];
// output stream of innerFFT
hls::stream<O_TYPE, FIFO_DEPTH> outStrm[R];
  1. Feed data to the input FIFO and read from the output FIFO:
for (int t = 0; t < L / R; t++)
{
        for (int r = 0; r < R; r++)
        {
    inStrm[r].write(inD[r][t]);
        }
}

for (int t = 0; t < L / R; t++)
{
        for (int r = 0; r < R; r++)
        {
    outD[r][t] = outStrm[r].read();
        }
}

3. If the 1-D SSR FFT IP is facing another HLS IP in the input chain or output chain, the inner loop doing the reading and writing should be unrolled.

1-D SSR FFT Usage in Dataflow Region Streaming

1-D SSR FFT internally relies heavily on the HLS dataflow optimization. The potential use case for 1-D SSR FFT could interconnect with an SSR FFT input or output in two ways:

Streaming Connection

The current version of the 1-D SSR FFT naturally supports a streaming connection at the output and input.

// just call it
fft<params_float>(inD, outD);
//OR
fft<ssr_fft_params, IID>(inD, outD);
// IID: is a constant giving instance ID

1-D SSR FFT Tests

Different tests are provided for fixed-point and floating-point 1-D SSR FFT. These tests can be run individually using the Makefile, or they can all be launched at the same time by using a provided script. All the 1-D SSR FFT tests are in the folder, REPO_PATH/dsp/L1/tests/hw/1dfft

Launching an Individual Test

To launch an individual test, first it is required to set up the environment for launching the Vitis HLS Compiler which can be done as follows:

Set up the environment variable TA_PATH to point to the installation path of your Vitis installation, and run the following commands to set up the environment:

export XILINX_VITIS=${TA_PATH}/Vitis/2021.1
export XILINX_VIVADO=${TA_PATH}/Vivado/2021.1
source ${XILINX_VIVADO}/settings64.sh

Once the environment settings are done, an individual test can be launched by going to the test folder (any folder inside a sub-directory at any level of REPO_PATH/dsp/L1/test/hw/1dfft that has a Makefile is a test) and running the make command: make run XPART='xcu200-fsgd2104-2-e' CSIM=1 CSYNTH=1 COSIM=1. You can choose the part number as required, and by setting CSIM/CSYNTH/COSIM=0, choose what to build and run with the make target.