Kernel Header File - 2024.2 English - XD100

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2024-12-06
Version: 2024.2 English

The next step is to write the kernel's header file. It includes the aie_api/aie.hpp and adf.h API headers, along with the header that contains the twiddles. It then declares the fft1k_kernel class, which has the following public attributes:

static constexpr unsigned N = POINTS;
static constexpr unsigned SHIFT_TW = 15;
static constexpr unsigned SHIFT_DT = 15;
static constexpr bool     INVERSE  = false;
static constexpr unsigned REPEAT   = REPS;
static constexpr unsigned BUF_SIZE = N * REPEAT;

These attributes define the number of points (N), the shifts applied between stages to twiddles and data (SHIFT_TW and SHIFT_DT, respectively), and the IFFT/FFT flag passed to the API calls (INVERSE). They also define the number of FFT instances batched together in one kernel (REPEAT) and the buffer size (BUF_SIZE), which is the product N * REPEAT.
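To make the relationship between these constants concrete, here is a minimal plain-C++ sketch. The values given to POINTS and REPS, the struct name fft_params_sketch, and the helper instance_offset are hypothetical and chosen for illustration only; the sketch also assumes that the batched FFT instances are stored back to back in the buffer, which the text above does not state.

```cpp
#include <cassert>

// Hypothetical values: in the tutorial these macros come from the project's own headers.
#define POINTS 1024
#define REPS   4

// Stand-in for the public constants of the kernel class (not the tutorial's class).
struct fft_params_sketch {
    static constexpr unsigned N        = POINTS;     // FFT size in points
    static constexpr unsigned SHIFT_TW = 15;         // shift applied to twiddles
    static constexpr unsigned SHIFT_DT = 15;         // shift applied to data
    static constexpr bool     INVERSE  = false;      // false selects the forward FFT
    static constexpr unsigned REPEAT   = REPS;       // FFT instances batched per kernel call
    static constexpr unsigned BUF_SIZE = N * REPEAT; // samples in each I/O buffer
};

// The buffer must hold every batched instance.
static_assert(fft_params_sketch::BUF_SIZE == 4096,
              "4 batched 1024-point FFTs need 4096 samples");

// Assumed layout: instance r occupies samples [r * N, (r + 1) * N) of the buffer.
inline unsigned instance_offset(unsigned r) {
    assert(r < fft_params_sketch::REPEAT);
    return r * fft_params_sketch::N;
}
```

Because every constant is static constexpr, BUF_SIZE is known at compile time and can be used as a template argument, as the buffer extents in the run method signature do.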

As private attributes, the kernel class has the twiddle tables, defined as static constexpr CINT16 arrays aligned to the vector register lane boundaries of the AIE-ML core. The alignment avoids wasting cycles when loading the values into the vector processor's registers. The static constexpr specifiers tell the compiler that each table is a read-only constant that is shared by all instances of the class (static) and whose value is known at compile time (constexpr), so it is available both for compile-time optimization and at runtime. The twiddle array size follows a trend opposite to that of the vectorization. In a 1024-point radix-4 implementation, for example, the first stage has a vectorization of 256 and each of its twiddle tables holds just one element. These trends are clearly visible in the butterfly diagram of the 3-stage, radix-2, 8-point example shown in figure 3, where the index stride between summed terms decreases through the stages while the number of distinct twiddle factors increases.

...
    alignas(aie::vector_decl_align) static constexpr TT_TWID	tw0_0[1]	= TWID0_0;
    alignas(aie::vector_decl_align) static constexpr TT_TWID	tw0_1[1]	= TWID0_1;
    alignas(aie::vector_decl_align) static constexpr TT_TWID	tw0_2[1]	= TWID0_2;
    alignas(aie::vector_decl_align) static constexpr TT_TWID	tw1_0[4]	= TWID1_0;
    alignas(aie::vector_decl_align) static constexpr TT_TWID	tw1_1[4]	= TWID1_1;
    alignas(aie::vector_decl_align) static constexpr TT_TWID	tw1_2[4]	= TWID1_2;
    alignas(aie::vector_decl_align) static constexpr TT_TWID	tw2_0[16]	= TWID2_0;
    alignas(aie::vector_decl_align) static constexpr TT_TWID	tw2_1[16]	= TWID2_1;
    alignas(aie::vector_decl_align) static constexpr TT_TWID	tw2_2[16]	= TWID2_2;
    alignas(aie::vector_decl_align) static constexpr TT_TWID	tw3_0[64]	= TWID3_0;
    alignas(aie::vector_decl_align) static constexpr TT_TWID	tw3_1[64]	= TWID3_1;
    alignas(aie::vector_decl_align) static constexpr TT_TWID	tw3_2[64]	= TWID3_2;
    alignas(aie::vector_decl_align) static constexpr TT_TWID	tw4_0[256]	= TWID4_0;
    alignas(aie::vector_decl_align) static constexpr TT_TWID	tw4_1[256]	= TWID4_1;
    alignas(aie::vector_decl_align) static constexpr TT_TWID	tw4_2[256]	= TWID4_2;
...

Note that the CINT16 twiddle data type is parametrized through the TT_TWID typedef, and that the two indices in each array name indicate, in order, the FFT stage and the entry within that stage, following the convention used in the twiddles header file. In addition, a temporary buffer with a size equal to the number of points must be declared if the number of stages is even.
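The opposite trends of vectorization and twiddle table size, and the effect of the alignas specifier, can be checked with a small plain-C++ sketch (C++17 or later). It does not use the AIE API: cint16_sketch stands in for TT_TWID, ALIGN_SKETCH is an assumed 32-byte stand-in for aie::vector_decl_align, all helper names are hypothetical, and the table values are placeholders rather than the contents of TWID1_0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Plain-C++ stand-in for the complex 16-bit twiddle type (TT_TWID in the tutorial).
struct cint16_sketch { std::int16_t real; std::int16_t imag; };

constexpr unsigned    N_SKETCH     = 1024; // FFT points
constexpr unsigned    RADIX_SKETCH = 4;    // radix-4 stages
constexpr std::size_t ALIGN_SKETCH = 32;   // assumed value, stands in for aie::vector_decl_align

// Entries in each twiddle table of a stage (stage 0 is the first): 1, 4, 16, 64, 256.
constexpr unsigned twiddle_len(unsigned stage) {
    unsigned len = 1;
    for (unsigned i = 0; i < stage; ++i) len *= RADIX_SKETCH;
    return len;
}

// Vectorization of a stage: 256, 64, 16, 4, 1.
constexpr unsigned vectorization(unsigned stage) {
    return N_SKETCH / (RADIX_SKETCH * twiddle_len(stage));
}

// The two trends are opposite: their product is the same for every stage.
static_assert(vectorization(0) == 256 && twiddle_len(0) == 1,   "first stage");
static_assert(vectorization(4) == 1   && twiddle_len(4) == 256, "last stage");
static_assert(vectorization(2) * twiddle_len(2) == N_SKETCH / RADIX_SKETCH, "constant product");

struct twiddle_tables_sketch {
    // Same declaration pattern as the listing above; the values are placeholders.
    alignas(ALIGN_SKETCH) static constexpr cint16_sketch tw1_0[twiddle_len(1)] = {
        {32767, 0}, {0, -32767}, {-32767, 0}, {0, 32767}
    };
};

// True when address p is a multiple of a.
inline bool is_aligned(const void* p, std::size_t a) {
    return reinterpret_cast<std::uintptr_t>(p) % a == 0;
}

// An object's address is not a constant expression, so alignment is checked at runtime.
inline void check_table_alignment() {
    assert(is_aligned(twiddle_tables_sketch::tw1_0, ALIGN_SKETCH));
}
```

The sizes produced by twiddle_len match the array bounds in the listing above (1, 4, 16, 64, 256), with one table per stage for each of the three twiddle entries.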

The last section of the class declares the public methods:

  • the class constructor;

  • the run method, which contains the actual kernel code;

  • the registerKernelClass method, which calls the REGISTER_FUNCTION macro to register run as the method that executes the kernel function on the AIE-ML tile core.

The run method takes as arguments the input and output buffer data structures, which are declared with the __restrict pointer qualifier. This qualifier enables aggressive compiler optimizations by explicitly stating that there is no pointer aliasing, i.e., the referenced memory is accessed only through that pointer.

// Class constructor
fft1k_kernel(void);

// Run function
void run(input_buffer<TT_DATA,extents<BUF_SIZE> >& __restrict din,
         output_buffer<TT_DATA,extents<BUF_SIZE> >& __restrict dout );

// Register macro
static void registerKernelClass(void){
    REGISTER_FUNCTION(fft1k_kernel::run);
}
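The no-aliasing promise made by __restrict can be illustrated outside the AIE toolchain with a minimal plain-C++ sketch. The function scale_copy and its parameters are hypothetical and unrelated to the FFT kernel. Note that __restrict is a compiler extension (accepted by GCC, Clang, and MSVC), not part of standard C++.

```cpp
#include <cassert>
#include <cstddef>

// Multiplies n input samples by gain and writes them to dst.
// __restrict tells the compiler that src and dst never refer to overlapping
// memory, so it may load several inputs before storing any output
// (for example, when vectorizing the loop).
inline void scale_copy(const int* __restrict src, int* __restrict dst,
                       std::size_t n, int gain) {
    // Cheap debug check of the caller's promise; it does not catch partial overlap.
    assert(src != dst);
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i] * gain;
    }
}
```

The qualifier is a promise made by the caller, not something the compiler verifies: passing overlapping buffers to such a function results in undefined behavior.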