Constraints - 2023.2 English

Vitis Libraries

Release Date
2023-12-20
Version
2023.2 English

The FFT design has large memory requirements for data buffering and twiddle storage. Constraints may be necessary to fit a design or to achieve high performance, such as ensuring FFT kernels do not share tiles with other FFT kernels or user kernels. To apply constraints you must know the instance names of the internal graph hierarchy of the FFT. See Applying Design Constraints below.

../../../_images/X25897.png

Figure 27 Applying Design Constraints

Location and other constraints may be applied in the parent graph which instances the FFT graph class. To apply a constraint, you will need to know the name of the kernel, which will include the hierarchial path to that kernel. The simplest way to derive names, including the hierarchial part, is to compile a design and open it in Vitis (vitis_analyzer), using the graph view. The names of all kernels and memory buffers can be obtained from there. These names may then be back-annotated to the parent graph to apply the necessary constraint.

The FFT graph class is implemented as a recursion of the top level to implement the parallelism. The instance names of each pair of subgraphs in the recursion are FFTsubframe(0) and FFTsubframe(1). In the final level of recursion, the FFT graph will contain an instance of either FFTwinproc (for TP_API = 0) or FFTstrproc (when TP_API=1). Within this level there is an array of kernels called m_fftKernels which will have TP_CASC_LEN members. In the above diagram, widgets are shown in green and red. The widgets either receive 2 streams and interlace these streams to form an iobuffer of data on which the FFT operates, or take the iobuffer output from the FFT and deinterlace this into 2 streams. The widgets may be expressed as separate kernels (TP_USE_WIDGETS=1), and hence then placed on separate tiles, or may be expressed as functions internal to the FFT kernel (or combiner kernel) (TP_USE_WIDGETS=0) for improved performance compared to one tile hosting both the FFT kernel and associated widgets. See also Configuration Notes below.

In release 2023.1 widget kernels which converted from dual streams to iobuffers and vice versa were blended with the parent FFT or combiner kernels they supported. This eliminated kernel-switch overheads and so improved performance versus the case where widgets were co-located with their parent FFT or combiner. However, this prevented the user from splitting each trio of kernels over multiple tiles for even higher performance albeit at a trebling of the resource cost. In 2023.2, the choice of whether to express the widgets as standalone kernels, or to blend them with the FFT or combiner they serve, has been added as TP_USE_WIDGETS. The following description applies to the configuration with widget kernels, but the principles of the recursive decomposition and the names of the FFT and FFT combiner kernels remain and apply in either case. The stream to window conversion kernels on input and output to the fft subframes are at the same level as m_fftKernels and are called m_inWidgetKernel and m_outWidgetKernel respectively. Each level of recursion will also contain an array of radix2 combiner kernels and associated stream to window conversion kernels. These are seen as a column of kernels in the above figure. Their instance names are m_r2Comb[] for the radix2 combiners and m_combInKernel[] and m_combOutKernel[] for the input and output widget kernels respectively.

Examples of constraints: For TP_PARALLEL_POWER=2, to set the runtime ratio of the 3rd of 4 subframe FFTs, the constraint could look like this:

runtime<ratio>(myFFT.FFTsubframe[1].FFTsubframe[0].FFTstrproc.m_kernels[0]) = 0.9; //where myFFT is the instance name of the FFT in your design.

For the same example, to ensure that the second radix2 combiner kernel in the first column of combiners and its input widget do not share a tile, the constraint could look like this:

not_equal(location<kernel>(myFFT.FFTsubframe[0].m_combInKernel[1]),location<kernel>( myFFT.FFTsubframe[0].m_r2Comb[1]));

For large point sizes, e.g. 65536, the design is large, requiring 80 tiles. With such a large design, the AMD Vitis™ AIE mapper may time out due to there being too many possibilities of placement, so placement constraints are recommended to reduce the solution space and so reduce the time spent by the Vitis AIE mapper tool to find a solution. Example constraints have been provided in the test.hpp file for the fft_ifft_dit_1ch, i.e in: L2/tests/aie/fft_ifft_dit_1ch/test.hpp.