This section showcases several FFT designs for the AMD Versal™ AI Engine, starting with a single-core design implemented using the AI Engine API. After establishing a single-core throughput and latency baseline with the AMD Vitis™ AI Engine software simulation tools, the section presents several FFT design optimization techniques that improve on these benchmarks. The Vitis DSP library uses these techniques to deliver high-performance, scalable FFT IP that spans single-core to multicore designs.