The FFT core optionally accepts data in IEEE-754 single-precision format with 32-bit words consisting of a 1-bit sign, 8-bit exponent, and 23-bit fraction. The construction of the word matches that of the AMD Floating-Point Operator core.
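For reference, the decomposition of a 32-bit word into these fields can be illustrated with a short, generic Python sketch (independent of the core and of the Floating-Point Operator core):

import struct

def decompose_float32(x):
    """Split an IEEE-754 single-precision value into its sign, exponent, and fraction fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # raw 32-bit pattern
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF      # 23 bits
    return sign, exponent, fraction

# -1.5 is -1.1b x 2^0: sign 1, biased exponent 127, fraction 0x400000
print(decompose_float32(-1.5))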
Implementing full floating-point on an FPGA can be expensive in terms of the resources required. The pseudo floating-point option in the FFT core uses a higher-precision fixed-point FFT internally to achieve noise performance similar to a full floating-point FFT with significantly fewer resources. The following figure illustrates the two levels of noise performance available by selecting either 24 bits or 25 bits for the phase factor width. Increasing the phase factor width to 25 bits might require more resources, depending on the target device.
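As a rough behavioral sketch only (not the core's actual implementation), the pseudo floating-point idea can be modeled as normalizing each input block, quantizing to a fixed-point format, transforming, and then restoring the scale. The frac_bits parameter below is purely an illustrative stand-in for the internal precision:

import numpy as np

def pseudo_float_fft(x, frac_bits=24):
    # Illustrative model only: block-normalize the input so it fits a
    # fixed-point range, quantize, transform, then undo the normalization.
    scale = np.max(np.abs(x))
    if scale == 0:
        return np.zeros_like(x, dtype=complex)
    step = 2.0 ** -(frac_bits - 1)          # quantization step of the assumed fixed-point format
    xq = np.round(x / scale / step) * step  # normalized, quantized samples
    return np.fft.fft(xq) * scale           # restore the original scale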
The native floating-point option in the FFT core uses the DSPFP32 primitives to perform floating-point operations. The phase factor width is 32 bits by default.
The following figure shows, for several models, the ratio of the RMS difference from the double-precision MATLAB® FFT to the peak amplitude of the data set. The models shown are the single-precision MATLAB® FFT function (calculated by casting the input data to single-precision floating-point type), the FFT core using a 24-bit phase factor width, and the FFT core using a 25-bit phase factor width. To calculate the error signal, an impulse with randomized magnitude and position was used as the input signal, and the RMS error was averaged over five simulation runs.
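The measurement can be reproduced in outline with a sketch like the one below, which computes the ratio of the RMS difference from a double-precision reference FFT to the peak amplitude, averaged over randomized-impulse inputs. The single-precision stand-in shown is only an approximation of the comparison described above and is not the core itself:

import numpy as np

def rms_error_ratio(model_fft, n=1024, runs=5, seed=0):
    # Ratio of the RMS difference between a model and the double-precision
    # reference FFT to the peak amplitude of the reference, averaged over
    # several randomized-impulse inputs.
    rng = np.random.default_rng(seed)
    ratios = []
    for _ in range(runs):
        x = np.zeros(n)
        x[rng.integers(n)] = rng.uniform(0.5, 1.0)  # impulse with random position and magnitude
        ref = np.fft.fft(x)                         # double-precision reference
        err = model_fft(x) - ref
        ratios.append(np.sqrt(np.mean(np.abs(err) ** 2)) / np.max(np.abs(ref)))
    return float(np.mean(ratios))

# Rough single-precision stand-in: cast the input and output to 32-bit types.
single_model = lambda x: np.fft.fft(x.astype(np.float32)).astype(np.complex64)
print(rms_error_ratio(single_model))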
When comparing results against third-party models such as MATLAB, note that a scaling factor is usually required for the results to match. The scaling factor is data-dependent because the input data dictates the level of normalization required before the internal fixed-point core. Because the core does not provide this scaling factor in the floating-point modes, you can apply scaling after the output of the core, if necessary.
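If such scaling is needed, one way to align a captured core output with a reference is a least-squares fit of a single scale factor, as in the sketch below; core_capture and stimulus are hypothetical names for data you would supply:

import numpy as np

def align_to_reference(core_out, reference):
    # Least-squares scale factor s minimizing |reference - s * core_out|^2,
    # applied after the core's output. Illustrative only; the required factor
    # is data-dependent because it reflects the core's internal normalization.
    s = np.vdot(core_out, reference).real / np.vdot(core_out, core_out).real
    return s * core_out, s

# Hypothetical usage against a double-precision reference of the same stimulus:
# aligned, factor = align_to_reference(core_capture, np.fft.fft(stimulus))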
All optimization options (memory types and DSP slice optimization) remain available when the pseudo floating-point option is selected, allowing you to trade off resources against transform time.
For Burst I/O architectures, the transform time increases by approximately N (the number of points in the transform) because of the input normalization requirements. For the Pipelined Streaming I/O architecture, the initial latency to fill the pipeline increases, but data still streams through the core with no gaps.