Power-of-B algorithms (also known as radix-B algorithms) are the most known application of Cooley-Tukey’s results. These algorithms apply when the signal has a prime power of a number B: \(N=B^p\) with \(B,p\in\mathbb{N}\). In such cases, you can apply the algorithm recursively. The DFTs from the Cooley-Tukey decimation equations (shown above) can be decimated if \(\frac{N_1}{N_2} \in \mathbb{N}\). When the decimation order is \(N_2=B\), the DFT can be decimated up to \(p\) times.
After applying this algebraic approach, the resulting operations are small DFTs with \(B\) points that compute faster. This achieves the \(O(N\cdot log_B(N))\) computational complexity that Cooley and Tukey highlighted.
The butterfly diagram provides a well-known representation of these algorithms. The nodes represent operations that occur during each algorithm step. The edges connect the samples and temporary results to these nodes or to the final results.
Fig. 1: Butterfly diagram of an 8-point, radix-2 DIT FFT
These diagrams help you visualize the algorithm steps, data dependencies, pipeline, and data flow during implementation. By observing the indexes of the inputs and outputs of the decimation in time and decimation in the frequency butterfly diagrams shown above, you can see that this is a not self-sorting algorithm because the indexes at both inputs and outputs use bit-reversed order.
Higher order radix algorithms benefit implementations on vector machines with multiply-accumulate functional units. The performance increase stems from several factors. First, the algorithm requires fewer operations because it has fewer stages. This also means fewer load and store operations, reducing data movement through interconnects. MAC unit equipped vector computers compute a DFT in fewer clock cycles, making this particularly important. Higher order decimations also increase trivial complex rotation operations (multiplication by 1,–1, \(j\), and \(-j\)) because the twiddle factors rotate more times in the complex plane.
The mixed-radix algorithms represent another category from the radix-B class. These algorithms apply to signals sampled with composite prime power number of points \(N=\prod_iB_i^{(p_i)}\). For example, when you can group the prime factors of the number of points with different powers: \(2^{11}=2^1\cdot4^5\) or \(2^{10}=2^2\cdot4^4\).
You first perform the desired decimations of one order, then perform the other decimations on the resulting FFTs. This family of algorithms usually has complex indexing, making them harder to implement in hardware. This does not apply to every mixed-radix algorithm type because the Stockham variant of the FFT algorithm, which the AI Engine APIs use, has little to no indexing overhead for mixed-radixes.