While SSR=5 should be sufficient from a resource count perspective, using an SSR that is a power of 2 simplifies the overall design and allows the direct mapping of TDM FIR outputs into the 2D IFFT input.
For this reason, we proceed with SSR=8. We can also apply the single_buffer constraint on the input and output buffers to reduce the storage requirements at the expense of some degradation in throughput.
[shell]% cd <path-to-design>/aie/ifft4096_2d
[shell]% make clean all
[shell]% vitis_analyzer aiesimulator_output/default.aierun_summary
Inspecting vitis_analyzer, we observe a resource count of 16 AIE-ML tiles. Achieved throughput for:
Front 64-point IFFT + point-wise twiddle multiplication = 2295 MSPS
Back 64-point IFFT = 2288 MSPS
You can implement the transpose blocks in the PL using block RAMs. The PL is assumed to be clocked at 312.5 MHz.
A single block RAM stores up to 36 Kbits, and can be configured as 512 x 72 bits with one write and one read port.
For more information on block RAMs, refer to the Versal ACAP Memory Resources Architecture Manual.
A single transform contains 4096 samples, 64 bits each. To achieve our desired throughput, we require two write and two read ports.
The IFFT transpose blocks in ${DSPLIB_ROOT}/L1/src/hw are implemented using ping-pong buffers for storage.
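Given this, we expect the block RAM count per transpose block to be (4096 / 512) x 2 x 2 = 32: 8 block RAMs to hold one transform at one 64-bit sample per 72-bit word, doubled for the ping-pong buffers, and doubled again to provide the second write and read port.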
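The block RAM estimate above can be reproduced with a short calculation. This is a minimal sketch, not part of the design sources: the function name `bram_count` and its parameters are hypothetical, and it assumes one 64-bit sample is stored per 72-bit word, with one factor of 2 for ping-pong storage and one factor of 2 for the two write and two read ports.

```python
def bram_count(samples, bram_depth=512, ping_pong=2, port_pairs=2):
    """Estimate block RAMs for one transpose block (hypothetical helper).

    Assumes one sample per block RAM word, so a 512-deep block RAM
    holds 512 samples.
    """
    # Ceiling division: block RAMs needed to hold one full transform.
    brams_per_copy = -(-samples // bram_depth)
    # Double for ping-pong storage, double again for two write/read ports.
    return brams_per_copy * ping_pong * port_pairs


BRAM_BITS = 512 * 72   # capacity in the 512 x 72 configuration
SAMPLE_BITS = 64       # sample width stated in the text

# A 64-bit sample fits in a single 72-bit word.
assert SAMPLE_BITS <= 72

print(BRAM_BITS // 1024)   # 36 (Kbits per block RAM)
print(bram_count(4096))    # 32
```

Changing `samples` or `bram_depth` shows how the estimate scales for other transform sizes or block RAM configurations, under the same assumptions.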