NOTE: The maximum TP_POINT_SIZE that can be used depends on the data type, the number of kernels in cascade, and the available data memory per kernel. Each frame of data in the iobuffer should be zero-padded for alignment.
The DFT on AIE supports values of TP_POINT_SIZE from 4 to 88 (4 to 60 for cfloat DATA_TYPE) for a single kernel.
On AIE-ML, the larger data memory per kernel allows for TP_POINT_SIZE support from 4 up to 120.
These limits can be exceeded by using a number of kernels in cascade via the template parameter TP_CASC_LEN. The memory required for each frame of input and output data, and for the coefficient matrix, is divided across the kernels in the cascade.
For example, a TP_POINT_SIZE of 128 can be achieved using a TP_CASC_LEN of 2, as each kernel requires only half the input buffer size that the equivalent single-kernel configuration would require, as well as half the memory needed for the twiddle table.
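The limits above can be written as a small compile-time check. This is only a sketch: the names Device, maxSingleKernelPointSize, fitsSingleKernel, and samplesPerKernel are hypothetical and not part of the library. It also assumes that the AIE-ML limit of 120 applies to every data type and that each frame is split evenly across the cascade, ignoring alignment padding.

```cpp
// Hypothetical helper names; none of these are part of the DFT library API.
enum class Device { AIE, AIE_ML };

// Largest single-kernel TP_POINT_SIZE quoted in the note above.
// The note gives no separate cfloat limit for AIE-ML, so 120 is used for
// all data types there; that is an assumption of this sketch.
constexpr unsigned maxSingleKernelPointSize(Device dev, bool isCfloat) {
    return dev == Device::AIE_ML ? 120u : (isCfloat ? 60u : 88u);
}

// True when the point size falls inside the quoted single-kernel range.
constexpr bool fitsSingleKernel(unsigned pointSize, Device dev, bool isCfloat) {
    return pointSize >= 4u && pointSize <= maxSingleKernelPointSize(dev, isCfloat);
}

// Samples of each input frame held by one kernel, assuming an even split
// across the cascade and ignoring any alignment padding (an assumption).
constexpr unsigned samplesPerKernel(unsigned pointSize, unsigned cascLen) {
    return (pointSize + cascLen - 1u) / cascLen;  // ceiling division
}

// The example from the text: 128 points does not fit one AIE kernel,
// and a cascade of 2 leaves each kernel with half of each frame.
static_assert(fitsSingleKernel(88, Device::AIE, false), "88 fits on AIE");
static_assert(!fitsSingleKernel(128, Device::AIE, false), "128 needs a cascade");
static_assert(samplesPerKernel(128, 2) == 64, "half the frame per kernel");
```

A check like this only mirrors the figures quoted in the note. The real limit also depends on the memory actually available to each kernel, so it is not a substitute for the library's own parameter validation.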
The DFT achieves its best throughput with a small point size and a large number of frames per iobuffer.