Design Strategy - 2024.2 English - XD100

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2024-12-06
Version: 2024.2 English

The case study requires computing the FFT of 128 concurrent signals. An efficient strategy is to create a basic FFT compute block and replicate it in the top graph, so that enough FFTs run in parallel to meet the required throughput. To optimize AI Engine resources, it is also beneficial to maximize local memory usage and to serialize both the data and the computation, because this reduces interface tile and compute tile utilization. Because of how the FFT algorithm works, at least half of the samples of each signal must be buffered. To avoid using programmable logic memory resources, the chosen strategy is to perform this buffering inside the AIE-ML, using the memory tiles.
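One way to see why half of each signal must be buffered is to look at the first stage of a radix-2 decimation-in-frequency (DIF) FFT. This is an assumption made for illustration: the tutorial does not state which FFT variant the kernels use. In that stage, butterfly k combines x[k] with x[k + n/2], so when samples arrive in natural order, every sample in the first half must be held until its partner in the second half arrives. The sketch below implements that stage, checks it against a direct DFT, and simulates the input buffer occupancy. The function names are hypothetical, not part of any AMD API.

```python
import cmath


def dif_first_stage(x):
    """First butterfly stage of a radix-2 decimation-in-frequency FFT.

    Butterfly k combines x[k] with x[k + n/2], so none of its outputs can be
    computed before sample n/2 has arrived.
    """
    n = len(x)
    half = n // 2
    out = list(x)
    for k in range(half):
        a, b = x[k], x[k + half]
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_n^k
        out[k] = a + b               # feeds the even-indexed FFT outputs
        out[k + half] = (a - b) * w  # feeds the odd-indexed FFT outputs
    return out


def dft(x):
    """Direct O(n^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def peak_input_buffer(n):
    """Peak number of input samples held when samples arrive one at a time in
    natural order and each first-stage butterfly fires as soon as its second
    operand arrives."""
    half = n // 2
    held = peak = 0
    for t in range(n):
        if t < half:
            held += 1  # x[t] must wait for its partner x[t + n/2]
        else:
            held -= 1  # partner arrived: butterfly (t - n/2, t) fires
        peak = max(peak, held)
    return peak


if __name__ == "__main__":
    print(peak_input_buffer(1024))  # 512
```

Each half of the stage output is an independent n/2-point problem: the DFT of the first half yields the even-indexed bins of the full transform, and the DFT of the second half yields the odd-indexed bins. This split is why the buffering requirement appears before any further computation can start.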

Fig. 1: Preliminary Data Flow Block Diagram.

The resulting system follows the diagram shown in Figure 1. The 128 signals are acquired in parallel and routed from the programmable logic (PL) to the AIE-ML through a number N of interface tile I/O channels. The samples are then distributed to a number K of kernels that compute the FFTs in parallel, and their outputs are finally routed back to the PL.
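The tutorial leaves N and K open at this stage, so the sketch below only illustrates how serialization trades resources for time. It assumes, hypothetically, that signals are split into contiguous blocks and that N and K both divide 128. Under these assumptions, each interface channel time-multiplexes 128/N signals and each kernel computes 128/K FFTs one after another. The names `block_map` and `routing_plan` and the example values N = 4, K = 8 are illustrative only.

```python
NUM_SIGNALS = 128  # number of concurrent signals in the case study


def block_map(signal, num_units, num_signals=NUM_SIGNALS):
    """Map a signal index to (unit, slot) when num_signals are split into
    contiguous blocks over num_units resources and serialized within each.
    Assumes num_units divides num_signals."""
    if num_signals % num_units != 0:
        raise ValueError("num_units must divide num_signals")
    per_unit = num_signals // num_units
    return signal // per_unit, signal % per_unit


def routing_plan(n_channels, k_kernels):
    """For each signal: the interface channel and time slot that carry it,
    and the kernel and serial position at which its FFT is computed."""
    plan = []
    for s in range(NUM_SIGNALS):
        channel, slot = block_map(s, n_channels)
        kernel, position = block_map(s, k_kernels)
        plan.append((s, channel, slot, kernel, position))
    return plan


if __name__ == "__main__":
    # Hypothetical choice: N = 4 interface channels, K = 8 kernels.
    plan = routing_plan(4, 8)
    print(plan[37])  # (37, 1, 5, 2, 5)
```

With N = 4, each channel serializes 32 signals. With K = 8, each kernel serializes 16 FFTs. Lowering N or K saves interface or compute tiles, but each remaining resource must then handle proportionally more signals within the same throughput budget.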

Design requirement                      | Strategy
----------------------------------------+-------------------------------------------------------------
Compute the FFT of 128 distinct signals | • Create one compute block (kernel) and replicate it to
                                        |   parallelize the computation
Optimize AI Engine resources            | • Minimize AIE-ML tile utilization: maximize local memory
                                        |   usage and serialize the computation
                                        | • Minimize interface tile utilization: serialize the data
Optimize Programmable Logic resources   | • Perform the buffering required by the FFT inside the
                                        |   AIE-ML, using the memory tiles

Table 1: Preliminary Design Strategy Summary.