Scenario 1: A 512-point forward FFT with cint16 data requires more than 500 Msa/s with a window interface and minimal latency. With TP_CASC_LEN=1 and TP_PARALLEL_POWER=0 this configuration achieves approximately 419 Msa/s, which falls short. With TP_CASC_LEN=2 the throughput increases to 590 Msa/s, which meets the requirement. The configuration is therefore:
xf::dsp::aie::fft::dit_1ch::fft_ifft_dit_1ch_graph<cint16, cint16, 512, 1, 9, 2, 0, 512, 0, 0> myFFT;
Notes: TP_SHIFT is set to 9 (log2(512)) for nominal 1/N scaling. TP_WINDOW_VSIZE is set to TP_POINT_SIZE to minimize latency.
Scenario 2: A 4096-point inverse FFT with cint32 data is required at 100 Msa/s. This cannot be accommodated in a single kernel due to memory limits. The same memory limits apply to cascaded implementations, so TP_CASC_LEN is left at 1 and the transform is instead split across kernels with TP_PARALLEL_POWER=1, using the stream interface (TP_API=1). The recommended configuration is:
xf::dsp::aie::fft::dit_1ch::fft_ifft_dit_1ch_graph<cint32, cint16, 4096, 0, 12, 1, 0, 4096, 1, 1> myFFT;
Notes: TP_SHIFT is set to 12 (log2(4096)) for nominal 1/N scaling. TP_WINDOW_VSIZE is set to TP_POINT_SIZE because any larger multiple of TP_POINT_SIZE would exceed memory limits.