The first step is to characterize the 2D IFFT AI Engine IP, vss_fft_ifft_1d_graph, to determine which configuration meets our requirements.
We can instantiate vss_fft_ifft_1d_graph with the configuration below. The main choice in this exercise is which TP_SSR value is sufficient to meet our throughput requirement. We begin by assuming TP_SSR=1 and adjust as needed. For the definition of these parameters, refer to the Vitis Libraries documentation.
typedef cint32 TT_DATA;
typedef cint16 TT_TWIDDLE;
static constexpr unsigned TP_POINT_SIZE = 4096;
static constexpr unsigned TP_FFT_NIFFT = 0;
static constexpr unsigned TP_SHIFT = 0;
static constexpr unsigned TP_CASC_LEN = 1;
static constexpr unsigned TP_API = 0;
static constexpr unsigned TP_SSR = 1;
static constexpr unsigned TP_USE_WIDGETS = 0;
static constexpr unsigned TP_RND = 12;
static constexpr unsigned TP_SAT = 1;
static constexpr unsigned TP_TWIDDLE_MODE = 0;
Note that vss_fft_ifft_1d_graph is made up of three AI Engine kernels:
Front FFT/IFFT
Point-wise twiddle multiplication
Back FFT/IFFT
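The three stages listed above match the standard decomposition of a length N = D*D transform into D-point transforms separated by a twiddle multiplication. The following is a minimal floating-point sketch of that decomposition, checked against a direct inverse DFT. It is an illustration under stated assumptions, not the library's implementation: the function names (twiddle, idft_direct, idft_three_stage, make_test_signal, max_abs_diff) are hypothetical, the data layout is chosen for readability, and scaling, rounding, and fixed-point effects are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Inverse-transform factor exp(+j*2*pi*num/den). Hypothetical helper.
inline cplx twiddle(std::size_t num, std::size_t den) {
    const double pi = std::acos(-1.0);
    return std::polar(1.0, 2.0 * pi * static_cast<double>(num % den) /
                               static_cast<double>(den));
}

// Reference: unscaled inverse DFT computed directly from the definition, O(N^2).
std::vector<cplx> idft_direct(const std::vector<cplx>& x) {
    const std::size_t n = x.size();
    std::vector<cplx> y(n);  // value-initialized to zero
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t i = 0; i < n; ++i)
            y[k] += x[i] * twiddle(i * k, n);
    return y;
}

// Same unscaled inverse DFT for N = d*d, computed as front transform,
// point-wise twiddle multiplication, and back transform.
std::vector<cplx> idft_three_stage(const std::vector<cplx>& x, std::size_t d) {
    assert(x.size() == d * d);
    const std::size_t n = d * d;
    std::vector<cplx> mid(n), out(n), tmp(d);
    for (std::size_t n2 = 0; n2 < d; ++n2) {
        // Stage 1 (front): d-point transform over the samples x[d*n1 + n2].
        for (std::size_t n1 = 0; n1 < d; ++n1) tmp[n1] = x[d * n1 + n2];
        const std::vector<cplx> front = idft_direct(tmp);
        // Stage 2: point-wise twiddle multiplication by exp(+j*2*pi*n2*k1/N).
        for (std::size_t k1 = 0; k1 < d; ++k1)
            mid[d * k1 + n2] = front[k1] * twiddle(n2 * k1, n);
    }
    for (std::size_t k1 = 0; k1 < d; ++k1) {
        // Stage 3 (back): d-point transform over n2, producing out[k1 + d*k2].
        for (std::size_t n2 = 0; n2 < d; ++n2) tmp[n2] = mid[d * k1 + n2];
        const std::vector<cplx> back = idft_direct(tmp);
        for (std::size_t k2 = 0; k2 < d; ++k2) out[k1 + d * k2] = back[k2];
    }
    return out;
}

// Deterministic complex test input of length n. Hypothetical helper.
std::vector<cplx> make_test_signal(std::size_t n) {
    std::vector<cplx> x(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double t = static_cast<double>(i);
        x[i] = cplx(std::sin(0.3 * t), 0.1 * t - 1.0);
    }
    return x;
}

// Largest element-wise magnitude difference between two equal-length vectors.
double max_abs_diff(const std::vector<cplx>& a, const std::vector<cplx>& b) {
    assert(a.size() == b.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        worst = std::max(worst, std::abs(a[i] - b[i]));
    return worst;
}
```

Calling idft_three_stage(x, 8) on a 64-sample input should agree with idft_direct(x) to within floating-point error. The 4096-point case in this design corresponds to d = 64.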
In <path-to-design>/aie/ifft4096_2d_characterize/ifft4096_2d_app.cpp, we have added a location constraint to place the first two kernels in the same tile.
location<kernel>(dut.ifft4096_2d.m_fftTwRotKernels[ff]) = location<kernel>(dut.ifft4096_2d.frontFFTGraph[ff].FFTwinproc.m_fftKernels[0]);
The next step is to characterize its performance.
[shell]% cd <path-to-design>/aie/ifft4096_2d_characterize
[shell]% make clean all
[shell]% vitis_analyzer aiesimulator_output/default.aierun_summary
In vitis_analyzer, we can read two throughput numbers. The first, 4096/8.914 us ≈ 460 MSPS, corresponds to the tile performing the front 64-point IFFT plus the point-wise twiddle multiplication. The second, 4096/7.604 us ≈ 539 MSPS, corresponds to the tile performing the back 64-point IFFT.
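The slower of the two tiles limits a single instance to about 460 MSPS. Assuming throughput scales linearly with SSR, the 2 GSPS target requires at least 2000/460 ≈ 4.35 parallel paths, so we need SSR=5.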
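The sizing arithmetic above can be sketched as a few small helpers. This is only an illustration: the function names (throughput_msps, bottleneck_msps, min_ssr) are hypothetical, and the calculation assumes that aggregate throughput scales linearly with SSR, which should be confirmed by re-running the characterization with the chosen value.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Throughput in MSPS for `samples` samples processed in `time_us` microseconds.
// Samples per microsecond is numerically equal to mega-samples per second.
inline double throughput_msps(double samples, double time_us) {
    assert(time_us > 0.0);
    return samples / time_us;
}

// The slowest tile in the chain bounds the throughput of one instance.
inline double bottleneck_msps(double front_msps, double back_msps) {
    return std::min(front_msps, back_msps);
}

// Smallest integer SSR whose aggregate throughput reaches the target,
// assuming throughput scales linearly with SSR.
inline unsigned min_ssr(double target_msps, double per_instance_msps) {
    assert(per_instance_msps > 0.0);
    return static_cast<unsigned>(std::ceil(target_msps / per_instance_msps));
}
```

With the measured figures, min_ssr(2000.0, bottleneck_msps(throughput_msps(4096, 8.914), throughput_msps(4096, 7.604))) evaluates to 5.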