Scenario 1:
A 512-point forward FFT with cint16 data requires more than 500 MSa/s with a window interface and minimal latency. With TP_CASC_LEN=1 and TP_PARALLEL_POWER=0, this configuration achieves approximately 419 MSa/s. With TP_CASC_LEN=2, throughput increases to 590 MSa/s. The configuration is therefore as follows:
xf::dsp::aie::fft::dit_1ch::fft_ifft_dit_1ch_graph<cint16, cint16, 512, 1, 9, 2, 0, 512, 0, 0> myFFT;
Note
TP_SHIFT is set to 9 for nominal 1/N scaling (2^9 = 512). TP_WINDOW_VSIZE is set to TP_POINT_SIZE to minimize latency.
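The positional template arguments are easy to misread, so the following is a minimal, library-independent sketch that checks the Scenario 1 values at compile time. It assumes the argument order TP_POINT_SIZE, TP_FFT_NIFFT, TP_SHIFT, TP_CASC_LEN, TP_DYN_PT_SIZE, TP_WINDOW_VSIZE, TP_API, TP_PARALLEL_POWER after the two type arguments, which is consistent with the values described above. `FftParams`, `nominalShift` and `chooseCascLen` are hypothetical names for illustration, not part of the library. The cascade selection uses only the two throughput figures quoted above.

```cpp
// Hypothetical, library-independent mirror of the positional template
// arguments of fft_ifft_dit_1ch_graph (TT_DATA and TT_TWIDDLE omitted).
// Argument order is an assumption; these names are illustrative only.
struct FftParams {
    unsigned pointSize;     // TP_POINT_SIZE
    unsigned fftNifft;      // TP_FFT_NIFFT: 1 = forward, 0 = inverse
    unsigned shift;         // TP_SHIFT
    unsigned cascLen;       // TP_CASC_LEN
    unsigned dynPtSize;     // TP_DYN_PT_SIZE
    unsigned windowVsize;   // TP_WINDOW_VSIZE
    unsigned api;           // TP_API: 0 = window, 1 = stream (assumed encoding)
    unsigned parallelPower; // TP_PARALLEL_POWER
};

// Shift giving nominal 1/N scaling for a power-of-two N, i.e. log2(N).
constexpr unsigned nominalShift(unsigned n) {
    unsigned s = 0;
    while (n > 1) {
        n >>= 1;
        ++s;
    }
    return s;
}

// Smallest cascade length whose measured throughput exceeds the target.
// measured[i] holds the throughput for TP_CASC_LEN = i + 1; returns 0 if none qualifies.
constexpr unsigned chooseCascLen(const double* measured, unsigned count, double target) {
    for (unsigned i = 0; i < count; ++i) {
        if (measured[i] > target) {
            return i + 1;
        }
    }
    return 0;
}

// Scenario 1: <cint16, cint16, 512, 1, 9, 2, 0, 512, 0, 0>
constexpr FftParams scenario1{512, 1, 9, 2, 0, 512, 0, 0};

// Throughputs quoted in the text, in MSa/s, for TP_CASC_LEN = 1 and 2.
constexpr double scenario1Measured[] = {419.0, 590.0};

static_assert(scenario1.shift == nominalShift(scenario1.pointSize),
              "TP_SHIFT gives nominal 1/N scaling");
static_assert(scenario1.windowVsize == scenario1.pointSize,
              "one frame per window for minimal latency");
static_assert(scenario1.cascLen == chooseCascLen(scenario1Measured, 2, 500.0),
              "first cascade length above 500 MSa/s");
```

All checks run at compile time (C++14 or later), so a wrong value in the declaration fails the build rather than producing a silently mis-scaled output.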
Scenario 2:
A 4096-point inverse FFT with cint32 data is required at 100 MSa/s. This cannot be accommodated in a single kernel due to memory limits. These memory limits apply to cascaded implementations too, since each kernel in the cascade still operates on the full frame, so the recommended configuration instead uses TP_PARALLEL_POWER=1 with the stream interface (TP_API=1):
xf::dsp::aie::fft::dit_1ch::fft_ifft_dit_1ch_graph<cint32, cint16, 4096, 0, 12, 1, 0, 4096, 1, 1> myFFT;
Note
TP_SHIFT is set to 12 for nominal 1/N scaling (2^12 = 4096). TP_WINDOW_VSIZE is set to TP_POINT_SIZE because any larger multiple of TP_POINT_SIZE would exceed memory limits.
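The memory argument can be made concrete with a rough sizing sketch. It assumes cint32 occupies 8 bytes (two 32-bit components), assumes a 32 KiB data memory per AI Engine tile (the first-generation AI Engine figure; other device families differ), and assumes that TP_PARALLEL_POWER=P splits each frame into 2^P equal subframes handled by separate kernels. It ignores twiddle tables, additional buffers and stack, so real usage is higher. All constants and function names here are hypothetical.

```cpp
#include <cstddef>

// Rough sizing sketch; constants and names are illustrative assumptions.
constexpr std::size_t kBytesPerCint32 = 8;            // two 32-bit components (assumed)
constexpr std::size_t kTileDataMemBytes = 32 * 1024;  // assumed per-tile data memory

// Bytes needed for one frame of `points` cint32 samples.
constexpr std::size_t frameBytes(std::size_t points) {
    return points * kBytesPerCint32;
}

// Points per subframe kernel when TP_PARALLEL_POWER = p,
// assuming the frame is split into 2^p equal subframes.
constexpr std::size_t subframePoints(std::size_t points, unsigned p) {
    return points >> p;
}

// A single 4096-point cint32 frame already fills the assumed tile memory,
// before any additional buffers, twiddle tables or stack are counted.
static_assert(frameBytes(4096) == kTileDataMemBytes, "one frame = 32 KiB");

// With TP_PARALLEL_POWER = 1, each subframe kernel holds 2048 points (16 KiB).
static_assert(frameBytes(subframePoints(4096, 1)) == 16 * 1024, "half-size subframes");
```

Under these assumptions, halving the per-kernel frame is what leaves room for the extra buffering that a single or cascaded kernel cannot fit.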