We need to characterize a single instance of the IP and measure its throughput to determine how many instances are needed to meet the performance target.
The first step is to characterize the 2D IFFT AI Engine IP, that is, vss_fft_ifft_1d_graph, to understand the optimal configuration to meet our requirements.
We can instantiate vss_fft_ifft_1d_graph with the configuration below. The main choice in this exercise is which value of TP_SSR is sufficient to meet our throughput requirement. We begin by assuming TP_SSR=1 and adjust as needed. For the definition of these parameters, refer to the Vitis Libraries documentation.
typedef cint32 TT_DATA;
typedef cint16 TT_TWIDDLE;
static constexpr unsigned TP_POINT_SIZE = 4096;
static constexpr unsigned TP_FFT_NIFFT = 0;
static constexpr unsigned TP_SHIFT = 0;
static constexpr unsigned TP_CASC_LEN = 1;
static constexpr unsigned TP_API = 0;
static constexpr unsigned TP_SSR = 1;
static constexpr unsigned TP_USE_WIDGETS = 0;
static constexpr unsigned TP_RND = 12;
static constexpr unsigned TP_SAT = 1;
static constexpr unsigned TP_TWIDDLE_MODE = 0;
Note that vss_fft_ifft_1d_graph is made up of three AI Engine kernels:
Front FFT/IFFT
Point-wise twiddle multiplication
Back FFT/IFFT
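The front and back stages each operate on a sub-transform of the full point size. As a minimal sketch, assuming the point size is split into two equal factors (consistent with the 64-point front and back IFFTs reported for the 4096-point case in this tutorial), the sub-transform size can be checked at compile time. The names POINT_SIZE, SUB_POINT_SIZE, isqrt, and is_perfect_square are hypothetical helpers for illustration and are not part of the Vitis Libraries API.

```cpp
// Hypothetical compile-time helpers; not part of the Vitis Libraries API.
// Requires C++14 or later (loops inside constexpr functions).

// Integer square root: largest r such that r * r <= n.
constexpr unsigned isqrt(unsigned n) {
    unsigned r = 0;
    while ((r + 1) * (r + 1) <= n) {
        ++r;
    }
    return r;
}

// True when n can be split into two equal integer factors.
constexpr bool is_perfect_square(unsigned n) {
    return isqrt(n) * isqrt(n) == n;
}

// Assumption: mirrors TP_POINT_SIZE = 4096 from the configuration above.
constexpr unsigned POINT_SIZE = 4096;
constexpr unsigned SUB_POINT_SIZE = isqrt(POINT_SIZE);

static_assert(is_perfect_square(POINT_SIZE),
              "this sketch assumes an equal front/back split");
static_assert(SUB_POINT_SIZE == 64,
              "4096 points decompose into 64 x 64");
```

A point size that is not a perfect square would fail the first check, since this sketch only covers the equal-split case.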
In <path-to-design>/aie/ifft4096_2d_characterize/ifft4096_2d_app.cpp, we have added a location constraint to place the first two kernels in the same tile.
location<kernel>(dut.ifft4096_2d.m_fftTwRotKernels[ff]) = location<kernel>(dut.ifft4096_2d.frontFFTGraph[ff].FFTwinproc.m_fftKernels[0]);
The next step is to characterize the performance of this configuration by building the design and opening the simulation results in vitis_analyzer.
[shell]% cd <path-to-design>/aie/ifft4096_2d_characterize
[shell]% make clean all
[shell]% vitis_analyzer aiesimulator_output/default.aierun_summary
In vitis_analyzer, we can read two throughput numbers. First, 4096/9.232 µs ≈ 444 MSPS, corresponding to the tile performing the front 64-point IFFT and the point-wise twiddle multiplication. Second, 4096/7.604 µs ≈ 539 MSPS, corresponding to the tile performing the back 64-point IFFT.
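The slower tile limits a single instance to about 444 MSPS. Since 2000/444 ≈ 4.5, we need SSR=5 to meet our target throughput of 2 GSPS, assuming throughput scales approximately linearly with SSR.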
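The SSR estimate can be reproduced with a short compile-time calculation. This is a sketch under two assumptions: the slower of the two tiles sets the throughput of a single instance, and throughput scales linearly with SSR. The helper names throughput_msps and required_ssr, and the constants below, are hypothetical and only illustrate the arithmetic; the timing values are the ones read from vitis_analyzer above.

```cpp
// Hypothetical helpers illustrating the SSR arithmetic (C++14 or later).

// Samples per microsecond is numerically equal to mega-samples per second.
constexpr double throughput_msps(double samples, double time_us) {
    return samples / time_us;
}

// Smallest integer SSR with ssr * per_instance_msps >= target_msps,
// assuming throughput scales linearly with SSR.
constexpr unsigned required_ssr(double target_msps, double per_instance_msps) {
    const double ratio = target_msps / per_instance_msps;
    const unsigned whole = static_cast<unsigned>(ratio);
    return (static_cast<double>(whole) < ratio) ? whole + 1 : whole;
}

// Measurements from vitis_analyzer for one 4096-point transform.
constexpr double FRONT_MSPS = throughput_msps(4096.0, 9.232);  // front IFFT + twiddle tile
constexpr double BACK_MSPS  = throughput_msps(4096.0, 7.604);  // back IFFT tile

// Assumption: the slower tile is the bottleneck of a single instance.
constexpr double BOTTLENECK_MSPS = (FRONT_MSPS < BACK_MSPS) ? FRONT_MSPS : BACK_MSPS;

// Target of 2 GSPS expressed in MSPS.
constexpr unsigned REQUIRED_SSR = required_ssr(2000.0, BOTTLENECK_MSPS);

static_assert(REQUIRED_SSR == 5, "2000 / ~444 MSPS rounds up to 5");
```

Rounding up rather than to the nearest integer matters here: SSR=4 would deliver only about 4 x 444 = 1776 MSPS under the linear-scaling assumption, which falls short of the 2 GSPS target.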