The case study under consideration requires computing the FFT of 128 concurrent signals. An efficient strategy is to create a basic FFT computing block and replicate it in the top graph, running as many FFT calculations in parallel as needed to match the required throughput. Moreover, to optimize AI Engine resources, it is beneficial to maximize local memory usage and to serialize the data and the computation, as this reduces the utilization of interface and compute tiles. Because of how the FFT algorithm works, at least half of the samples of each signal must be buffered. To avoid consuming programmable logic memory resources, the chosen strategy is to perform this buffering inside the AIE-ML array using the memory tiles; a sketch of such a sub-graph is given below.
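As a concrete illustration of the memory-tile buffering strategy, the following minimal sketch declares a single FFT sub-graph in which an ADF `shared_buffer` (which the tools map onto an AIE-ML memory tile) holds the samples of one signal before they are passed to the compute kernel. This is only a sketch assuming the standard ADF graph API: the kernel function `fft_stage`, its source file, the `cint16` sample type, and the 1024-point buffer size are illustrative assumptions, not values taken from the case study.

```cpp
// fft_subgraph.h -- minimal sketch, not the actual case-study code
#include <adf.h>

using namespace adf;

// Illustrative assumptions: 1024-point FFT on cint16 samples,
// kernel function fft_stage() implemented in fft_stage.cc.
static constexpr unsigned FFT_POINTS = 1024;

// Assumed buffer-based kernel signature.
void fft_stage(input_buffer<cint16>& in, output_buffer<cint16>& out);

class FFTSubGraph : public graph {
public:
  port<input>  in;
  port<output> out;

  FFTSubGraph() {
    // Memory-tile buffer holding the samples of one signal,
    // avoiding PL BRAM/URAM for the intermediate buffering.
    mt_buf = shared_buffer<cint16>::create({FFT_POINTS}, 1, 1);
    num_buffers(mt_buf) = 2;                      // ping-pong in the memory tile

    // Whole-buffer write/read access patterns for the memory tile.
    write_access(mt_buf.in[0]) =
        tiling({.buffer_dimension = {FFT_POINTS},
                .tiling_dimension = {FFT_POINTS},
                .offset = {0}});
    read_access(mt_buf.out[0]) =
        tiling({.buffer_dimension = {FFT_POINTS},
                .tiling_dimension = {FFT_POINTS},
                .offset = {0}});

    // Compute kernel consuming the buffered signal.
    fft_k = kernel::create(fft_stage);
    source(fft_k) = "fft_stage.cc";
    runtime<ratio>(fft_k) = 0.8;

    connect(in, mt_buf.in[0]);                    // PL interface -> memory tile
    connect(mt_buf.out[0], fft_k.in[0]);          // memory tile  -> AIE-ML kernel
    connect(fft_k.out[0], out);                   // kernel       -> PL interface
  }

private:
  shared_buffer<cint16> mt_buf;
  kernel fft_k;
};
```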
Fig. 1: Preliminary Data Flow Block Diagram.
The resulting system follows the diagram shown in Fig. 1: the 128 signals are acquired in parallel and routed from the programmable logic to the AIE-ML array through a number N of interface tile I/O channels. The samples are then distributed to a number K of kernels that compute the FFTs in parallel, and their outputs are finally routed back to the programmable logic.
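The top graph can then simply replicate the sub-graph and bind each instance to its interface tile I/O channels. The sketch below assumes, purely for illustration, N = K = 16 (one input and one output PLIO channel per kernel) and 64-bit PLIO ports; the actual values of N and K follow from the throughput analysis, and the PLIO names and data files are placeholders.

```cpp
// top_graph.h -- minimal sketch, reusing the FFTSubGraph from the previous listing
#include <adf.h>
#include <string>
#include "fft_subgraph.h"

using namespace adf;

// Illustrative values only: the actual N (interface tile I/O channels)
// and K (parallel FFT kernels) are set by the throughput requirement.
static constexpr int N_IO_CHANNELS = 16;
static constexpr int K_KERNELS     = 16;

class TopGraph : public graph {
public:
  input_plio  pl_in [N_IO_CHANNELS];
  output_plio pl_out[N_IO_CHANNELS];
  FFTSubGraph fft   [K_KERNELS];

  TopGraph() {
    for (int i = 0; i < K_KERNELS; ++i) {
      // One 64-bit PLIO channel in and out per FFT instance
      // (here N = K, so the mapping is one-to-one).
      pl_in[i]  = input_plio::create("in_"  + std::to_string(i),
                                     plio_64_bits,
                                     "data/in_"  + std::to_string(i) + ".txt");
      pl_out[i] = output_plio::create("out_" + std::to_string(i),
                                      plio_64_bits,
                                      "data/out_" + std::to_string(i) + ".txt");

      connect(pl_in[i].out[0], fft[i].in);    // PL -> AIE-ML array
      connect(fft[i].out, pl_out[i].in[0]);   // AIE-ML array -> PL
    }
  }
};
```

If K is smaller than 128, each kernel instance is time-multiplexed over several of the 128 signals, which is the serialization of data and computation mentioned above.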
Design requirement | Strategy |
---|---|
Compute the FFT of 128 distinct signals | Replicate a basic FFT computing block in the top graph to match the required throughput |
Optimize AI Engine resources | Maximize local memory usage and serialize data and computation to reduce interface and compute tile utilization |
Optimize Programmable Logic resources | Buffer the signal samples inside the AIE-ML array using the memory tiles |
Table 1: Preliminary Design Strategy Summary.