IFFT-2D Library Optimization - 2024.2 English - XD100

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2024-12-06
Version: 2024.2 English

While SSR=5 should be sufficient from a resource count perspective, using an SSR that is a power of 2 simplifies the overall design and allows the direct mapping of the TDM FIR outputs into the 2D IFFT input. For this reason, we proceed with SSR=8. We can also apply the single_buffer constraint on the input and output buffers to reduce the storage requirements at the expense of some degradation in throughput.
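The SSR choice above amounts to rounding the smallest sufficient SSR up to the next power of 2. The following is a minimal sketch of that arithmetic only; `next_pow2` is a hypothetical helper written for illustration and is not part of the Vitis libraries.

```python
def next_pow2(n: int) -> int:
    """Return the smallest power of two that is >= n (for n >= 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # (n - 1).bit_length() is the number of bits needed to represent n - 1,
    # so shifting 1 by that amount gives the next power of two at or above n.
    return 1 << (n - 1).bit_length()


min_ssr = 5                # SSR that is sufficient from a resource count perspective
ssr = next_pow2(min_ssr)   # power-of-2 SSR used in the design
print(ssr)                 # 8
```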

[shell]% cd <path-to-design>/aie/ifft4096_2d
[shell]% make clean all
[shell]% vitis_analyzer aiesimulator_output/default.aierun_summary

Inspecting the results in vitis_analyzer, we observe a resource count of 16 AIE-ML tiles. The achieved throughput is:

  • Front 64-point IFFT + point-wise twiddle multiplication = 2295 MSPS

  • Back 64-point IFFT = 2288 MSPS

figure12

You can implement the transpose blocks in the PL using block RAMs. The PL is assumed to be clocked at 312.5 MHz. A single block RAM stores up to 36 Kbits, and can be configured as 512 x 72 bits with one write port and one read port. For more information on block RAMs, refer to the Versal ACAP Memory Resources Architecture Manual. A single transform contains 4096 samples of 64 bits each, so each sample fits in one 72-bit block RAM word. To achieve our desired throughput, we require two write ports and two read ports. The IFFT transpose blocks in ${DSPLIB_ROOT}/L1/src/hw are implemented using ping-pong buffers for storage.

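Given this, we expect the block RAM count per transpose block to be 4096 / 512 x 2 x 2 = 32. Here, 4096 / 512 = 8 block RAMs hold one transform, and the two factors of 2 account for the doubled port count and the ping-pong storage.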
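The block RAM estimate can be checked with a short calculation. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and reading the two factors of 2 as one for the doubled write/read ports and one for the ping-pong buffers is an interpretation of the text above, not something taken from the library source.

```python
import math


def bram_count(samples: int, sample_bits: int,
               bram_depth: int, bram_width_bits: int,
               port_factor: int, buffer_factor: int) -> int:
    """Estimate block RAMs for one transpose block (illustrative helper)."""
    # Each sample must fit in a single block RAM word for this estimate to hold.
    if sample_bits > bram_width_bits:
        raise ValueError("sample does not fit in one block RAM word")
    # Block RAMs needed to store one full transform, one sample per word.
    brams_per_transform = math.ceil(samples / bram_depth)
    # Assumption: duplicate once for the extra ports, once for ping-pong storage.
    return brams_per_transform * port_factor * buffer_factor


count = bram_count(samples=4096, sample_bits=64,
                   bram_depth=512, bram_width_bits=72,
                   port_factor=2, buffer_factor=2)
print(count)  # 32
```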