The Stockham FFT algorithm - 2025.1 English - XD100

Vitis Tutorials: AI Engine Development (XD100)

Document ID
XD100
Release Date
2025-08-25
Version
2025.1 English

To adapt the Cooley-Tuckey algorithm to vector computers that are equipped with local memory, such as the AI Engine, the concepts of in-place computation, constant geometry and self-sorting are introduced.

  • In-place computation: an algorithm has the property of being computed in-place if the memory space occupied by the data is the same along all the algorithm steps, i.e. there is no buffer memory overhead for the data computation.

  • Constant geometry: an FFT algorithm has the constant geometry property if the indexes of the data are unchanged stage by stage, increasing the parallelization for SIMD machines thanks to the avoidance of indexes book-keeping.

  • Self-sorting: an FFT algorithm is self-sorting if the output data stored output data address is not bit-reversed, i.e. the output data is ordered. This may spare quite some computation since there would be no need for a bit-reverse stride permutation matrix to be used to order the data.

Unfortunately, research has not found an FFT algorithm that possesses all those properties yet. Among the possible solutions, AI Engine API employs the Stockham variant of the FFT algorithm. Such variant sacrifies the in-place computation property to gain the self-sorting one.

Butterfly diagram of an 8-point, radix-2 Stockham DIT FFT

Fig. 2: Butterfly diagram of an 8-point, radix-2 Stockham DIT FFT

As observable from the figure above, the Stockham variant of the FFT is obtained through re-indexing the stage’s computational nodes, done by adding a given increasing offset to the indexes, as explicited in the example figure here below.

Butterfly diagram of an 8-point, radix-2 Stockham DIF FFT with explicit re-indexing

Fig. 3: Butterfly diagram of an 8-point, radix-2 Stockham DIF FFT with explicit re-indexing

In conclusion, the Stockham variant of the FFT sacrifices the in-place computation property not only to acquire the self-sorting one, but also to be efficiently vectorizable.

Such features make this algorithm to be a suitable choice for performing the FFT on SIMD machines as the AI Engine ML