The DFT for AIE requires each frame of input data to be aligned to 256 bits, that is, zero-padded to a multiple of 8 samples for cint16, and to a multiple of 4 samples for cint32 and cfloat. Zero-padding has no impact on the final result of the transform.
The same requirement applies when using the cascading feature of the DFT for AIE. As mentioned, each frame of data is split across the kernels. Each cascaded kernel should receive a portion of the frame whose size is a multiple of 8 for cint16, or 4 for cint32 and cfloat. The data should be dealt out sample-by-sample to the kernels in round-robin fashion.
The padding requirements for AIE-ML devices are similar to those for AIE, except that cint32 input data should be zero-padded, for alignment, to a multiple of 8 (instead of 4 for AIE). This is needed for optimal MAC performance on AIE-ML.
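The padding rules above can be summarized in a short Python sketch. It is illustrative only and not part of the library: the table name SAMPLES_PER_BLOCK, the function padded_frame_size, and the device and data-type strings are hypothetical names chosen for this example. The AIE-ML entries for cint16 and cfloat are assumed to match AIE, since the text only calls out cint32 as different.

```python
# Samples per aligned block, taken from the requirements described above.
# Names and keys are illustrative; they are not library identifiers.
SAMPLES_PER_BLOCK = {
    ("AIE", "cint16"): 8,
    ("AIE", "cint32"): 4,
    ("AIE", "cfloat"): 4,
    ("AIE-ML", "cint16"): 8,  # assumed unchanged from AIE
    ("AIE-ML", "cint32"): 8,  # differs from AIE
    ("AIE-ML", "cfloat"): 4,  # assumed unchanged from AIE
}


def padded_frame_size(point_size, data_type, casc_len=1, device="AIE"):
    """Return the smallest frame size >= point_size that is a multiple of
    (samples per block) * (number of cascaded kernels)."""
    step = SAMPLES_PER_BLOCK[(device, data_type)] * casc_len
    # Ceiling division, then scale back up to a whole number of steps.
    return -(-point_size // step) * step


print(padded_frame_size(20, "cint16", casc_len=3))  # 24
print(padded_frame_size(20, "cint32", casc_len=1))  # 20
```

Multiplying by the cascade length ensures that, after the frame is dealt out round-robin, every kernel's share is itself a whole number of aligned blocks.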
For example, if TP_POINT_SIZE = 20 and TP_CASC_LEN = 3, the data should be padded into a frame with a size that is a multiple of 8 * TP_CASC_LEN for cint16, and 4 * TP_CASC_LEN for cint32 and cfloat:
20-point data for transform:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
20-point data padded up to a frame of 24 elements:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 0 0 0 0
The frame is then dealt sample-by-sample to each kernel in cascade.
Kernel 1:
1 4 7 10 13 16 19 0
Kernel 2:
2 5 8 11 14 17 20 0
Kernel 3:
3 6 9 12 15 18 0 0
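The padding and dealing steps in this example can be reproduced with a minimal Python sketch. The function name pad_and_deal and its parameters are hypothetical and chosen for illustration; plain integers stand in for the complex samples, and samples_per_block=8 corresponds to the cint16 case.

```python
def pad_and_deal(samples, casc_len, samples_per_block):
    """Zero-pad a frame to a multiple of samples_per_block * casc_len,
    then deal it sample-by-sample (round-robin) to casc_len kernels."""
    step = samples_per_block * casc_len
    padded_len = -(-len(samples) // step) * step  # round up to a multiple of step
    frame = list(samples) + [0] * (padded_len - len(samples))
    # Kernel k (0-based) receives frame[k], frame[k + casc_len], ...
    return [frame[k::casc_len] for k in range(casc_len)]


kernels = pad_and_deal(range(1, 21), casc_len=3, samples_per_block=8)
for index, kernel_data in enumerate(kernels, start=1):
    print(f"Kernel {index}:", *kernel_data)
# Kernel 1: 1 4 7 10 13 16 19 0
# Kernel 2: 2 5 8 11 14 17 20 0
# Kernel 3: 3 6 9 12 15 18 0 0
```

Each kernel ends up with 8 samples, so every per-kernel portion satisfies the alignment requirement on its own.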