Stockham FFT Algorithm - Stockham FFT Algorithm - 2025.2 English - XD100

Vitis Tutorials: AI Engine Development (XD100)

Document ID
XD100
Release Date
2026-03-27
Version
2025.2 English

Adapting the Cooley-Tuckey algorithm to vector computers with local memory (such as the AI Engine) requires three key concepts: in-place computation, constant geometry, and self-sorting.

  • In-place computation: The memory space occupied by the data remains constant throughout all algorithm steps. There is no buffer memory overhead for data computation.

  • Constant geometry: The data indexes remain unchanged from stage to stage. This increases parallelization for SIMD machines by avoiding index book-keeping.

  • Self-sorting: The output data address is not bit-reversed, meaning the data appears in order. This can spare computation because you do not need a bit-reverse stride permutation matrix to order the data.

Research has not yet found an FFT algorithm that possesses all three properties.

The AI Engine APIs employ the Stockham FFT variant as a solution. This variant sacrifices in-place computation to gain self-sorting.

Butterfly diagram of an 8-point, radix-2 Stockham DIT FFT

Fig. 2: Butterfly diagram of an 8-point, radix-2 Stockham DIT FFT

The preceding figure shows how re-indexing the stage’s computational nodes creates the Stockham variant of the FFT by adding a given increasing offset to the indexes. The following example figure shows this more explicitly.

Butterfly diagram of an 8-point, radix-2 Stockham DIF FFT with explicit re-indexing

Fig. 3: Butterfly diagram of an 8-point, radix-2 Stockham DIF FFT with explicit re-indexing

The Stockham variant sacrifices in-place computation to gain self-sorting and efficient vectorization.

These features make this algorithm suitable for performing the FFT on SIMD machines such as the AI Engine ML.