IFFT-2D Library Characterization - 2025.1 English - XD100

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2025-08-25
Version: 2025.1 English

We need to characterize a single instance of the IP and measure its throughput to determine how many instances are needed to meet the performance target.

The first step is to characterize the 2D IFFT AI Engine IP, that is, vss_fft_ifft_1d_graph, to understand the optimal configuration to meet our requirements.

We can instantiate vss_fft_ifft_1d_graph with the configuration below. The main choice in this exercise is which value of TP_SSR is sufficient to meet our throughput requirement. We begin by assuming TP_SSR=1 and adjust as needed. For the definition of these parameters, refer to the Vitis Libraries documentation.

  typedef cint32            TT_DATA;
  typedef cint16            TT_TWIDDLE;
  static constexpr unsigned TP_POINT_SIZE = 4096;
  static constexpr unsigned TP_FFT_NIFFT = 0;
  static constexpr unsigned TP_SHIFT = 0;
  static constexpr unsigned TP_CASC_LEN = 1;
  static constexpr unsigned TP_API = 0;
  static constexpr unsigned TP_SSR = 1;
  static constexpr unsigned TP_USE_WIDGETS = 0;
  static constexpr unsigned TP_RND = 12;
  static constexpr unsigned TP_SAT = 1;
  static constexpr unsigned TP_TWIDDLE_MODE = 0;

Note that vss_fft_ifft_1d_graph is made up of three AI Engine kernels:

  • Front FFT/IFFT

  • Point-wise twiddle multiplication

  • Back FFT/IFFT
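The three stages listed above follow the familiar two-dimensional decomposition of a long transform into two sets of shorter ones. The following is a minimal plain C++ sketch of that textbook decomposition, not the library's implementation: the function names (idft_direct, idft_2d, max_error), the index mapping, and the unscaled inverse convention are all assumptions chosen for illustration. It uses naive O(N^2) transforms so that the 2D result can be compared against a direct inverse DFT.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Inverse-transform twiddle exp(+j*2*pi*num/den); no 1/N scaling is applied anywhere.
static cplx w(long num, long den) {
    const double pi = std::acos(-1.0);
    return std::polar(1.0, 2.0 * pi * static_cast<double>(num) / static_cast<double>(den));
}

// Reference: direct, unscaled inverse DFT.
std::vector<cplx> idft_direct(const std::vector<cplx>& x) {
    const long n = static_cast<long>(x.size());
    std::vector<cplx> y(x.size());
    for (long k = 0; k < n; ++k)
        for (long i = 0; i < n; ++i)
            y[k] += x[i] * w(i * k, n);
    return y;
}

// Hypothetical 2D decomposition of an N = n1*n2 point inverse DFT.
// Input index: n2*r + c. Output index: k1 + n1*k2.
std::vector<cplx> idft_2d(const std::vector<cplx>& x, long n1, long n2) {
    const long n = n1 * n2;
    std::vector<cplx> t(x.size()), y(x.size());
    // Stage 1 (front): n2 transforms of length n1, one per column c.
    // Stage 2 (twiddle): point-wise multiplication by exp(+j*2*pi*c*k1/N).
    for (long c = 0; c < n2; ++c)
        for (long k1 = 0; k1 < n1; ++k1) {
            cplx acc(0.0, 0.0);
            for (long r = 0; r < n1; ++r)
                acc += x[n2 * r + c] * w(r * k1, n1);
            t[k1 * n2 + c] = acc * w(c * k1, n);
        }
    // Stage 3 (back): n1 transforms of length n2, one per row k1.
    for (long k1 = 0; k1 < n1; ++k1)
        for (long k2 = 0; k2 < n2; ++k2) {
            cplx acc(0.0, 0.0);
            for (long c = 0; c < n2; ++c)
                acc += t[k1 * n2 + c] * w(c * k2, n2);
            y[k1 + n1 * k2] = acc;
        }
    return y;
}

// Largest absolute difference between two equally sized vectors.
double max_error(const std::vector<cplx>& a, const std::vector<cplx>& b) {
    double e = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        e = std::fmax(e, std::abs(a[i] - b[i]));
    return e;
}
```

For example, calling idft_2d(x, 8, 8) on a 64-sample vector should agree with idft_direct(x) to within floating-point rounding. Under the same assumed mapping, a 4096-point transform would split into 64 x 64.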

In <path-to-design>/aie/ifft4096_2d_characterize/ifft4096_2d_app.cpp, we have added a location constraint to place the first two kernels in the same tile.

location<kernel>(dut.ifft4096_2d.m_fftTwRotKernels[ff]) = location<kernel>(dut.ifft4096_2d.frontFFTGraph[ff].FFTwinproc.m_fftKernels[0]);

The next step is to characterize its performance.

[shell]% cd <path-to-design>/aie/ifft4096_2d_characterize
[shell]% make clean all
[shell]% vitis_analyzer aiesimulator_output/default.aierun_summary

In vitis_analyzer, we can read two throughput numbers. First, 4096/9.232 us ≈ 444 MSPS, corresponding to the tile that performs the front 64-point IFFT and the point-wise twiddle multiplication. Second, 4096/7.604 us ≈ 539 MSPS, corresponding to the tile that performs the back 64-point IFFT.

[Figure: figure11]

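The slower of the two tiles limits a single instance to about 444 MSPS. Assuming throughput scales linearly with TP_SSR, we need ceil(2000/444) = 5, that is, TP_SSR=5, to meet our target throughput of 2 GSPS.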
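The sizing arithmetic can be reproduced with a few lines of C++. This is only a sketch: the helper names (throughput_msps, bottleneck_msps, required_ssr) are hypothetical, and it assumes that throughput scales linearly with the number of parallel paths.

```cpp
#include <algorithm>
#include <cmath>

// Samples per microsecond is numerically equal to mega-samples per second (MSPS).
inline double throughput_msps(double samples, double time_us) {
    return samples / time_us;
}

// The slowest tile in the chain sets the throughput of one instance.
inline double bottleneck_msps(double front_msps, double back_msps) {
    return std::min(front_msps, back_msps);
}

// Smallest integer number of parallel paths that reaches the target,
// assuming linear scaling (an assumption, not a library guarantee).
inline unsigned required_ssr(double target_msps, double per_path_msps) {
    return static_cast<unsigned>(std::ceil(target_msps / per_path_msps));
}
```

With the measured times of 9.232 us and 7.604 us for 4096 samples and a 2000 MSPS target, required_ssr(2000.0, bottleneck_msps(throughput_msps(4096, 9.232), throughput_msps(4096, 7.604))) evaluates to 5.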