Overview - 2023.2 English

Vitis Libraries

Release Date
2023-12-20
Version
2023.2 English

fft_dit_1ch is a single-channel, decimation-in-time, fixed point size FFT.

This class definition is only used with stream interfaces (TP_API == 1). The stream interface FFT graph is offered with a dual input stream configuration, which interleaves data samples between the streams. The stream interface FFT implementation is capable of supporting parallel computation (TP_PARALLEL_POWER > 0). Dynamic point size is supported, with a header embedded in the data stream.

These are the templates to configure the single-channel decimation-in-time class.

Parameters:

TT_DATA

describes the type of individual data samples input to and output from the transform function. This is a typename and must be one of the following:

int16, cint16, int32, cint32, float, cfloat.

TT_TWIDDLE

describes the type of twiddle factors of the transform.

It must be one of the following: cint16, cint32, cfloat and must also satisfy the following rules:

  • 32 bit types are only supported when TT_DATA is also a 32 bit type.
  • TT_TWIDDLE must be an integer type if TT_DATA is an integer type.
  • TT_TWIDDLE must be cfloat type if TT_DATA is a float type.
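
The three rules above can be expressed as a compile-time check. The following is a minimal sketch, not part of the library: the sample types are stand-in definitions for illustration (the real ones come from the AIE headers), twiddle_is_legal is a hypothetical helper name, and "32 bit type" is assumed to refer to the width of each real or imaginary component.

```cpp
#include <cstdint>
#include <type_traits>

// Stand-in sample types for illustration only; the real definitions
// come from the AIE headers.
using int16 = std::int16_t;
using int32 = std::int32_t;
struct cint16 { std::int16_t real, imag; };
struct cint32 { std::int32_t real, imag; };
struct cfloat { float real, imag; };

template <typename T>
constexpr bool is_float_data =
    std::is_same<T, float>::value || std::is_same<T, cfloat>::value;

template <typename T>
constexpr bool is_int32_data =
    std::is_same<T, int32>::value || std::is_same<T, cint32>::value;

template <typename T>
constexpr bool is_int16_data =
    std::is_same<T, int16>::value || std::is_same<T, cint16>::value;

// Hypothetical helper: true when the TT_DATA/TT_TWIDDLE pair obeys the
// rules listed above (under the stated interpretation of "32 bit type").
template <typename TT_DATA, typename TT_TWIDDLE>
constexpr bool twiddle_is_legal() {
    return is_float_data<TT_DATA>
               ? std::is_same<TT_TWIDDLE, cfloat>::value   // float data needs cfloat twiddles
           : is_int32_data<TT_DATA>
               ? (std::is_same<TT_TWIDDLE, cint16>::value ||
                  std::is_same<TT_TWIDDLE, cint32>::value) // 32 bit integer data: either integer twiddle
           : is_int16_data<TT_DATA>
               ? std::is_same<TT_TWIDDLE, cint16>::value   // 16 bit data: no 32 bit twiddles
               : false;
}
```

With this sketch, twiddle_is_legal<cint32, cint16>() is accepted, while twiddle_is_legal<cint16, cint32>() and twiddle_is_legal<cint16, cfloat>() are rejected.
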
TP_POINT_SIZE

is an unsigned integer which describes the number of samples in the transform.

This must be 2^N where N is an integer in the range 4 to 16 inclusive.

When TP_DYN_PT_SIZE is set, TP_POINT_SIZE describes the maximum point size possible.
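
The constraint on TP_POINT_SIZE (a power of 2 between 2^4 and 2^16) can be checked before instantiating the graph. This is a minimal sketch; is_legal_point_size is a hypothetical helper, not a library function.

```cpp
// Hypothetical helper: true when pointSize is 2^N with N in [4, 16].
constexpr bool is_legal_point_size(unsigned int pointSize) {
    // A power of two has exactly one bit set.
    const bool isPowerOfTwo =
        pointSize != 0 && (pointSize & (pointSize - 1)) == 0;
    return isPowerOfTwo && pointSize >= (1u << 4) && pointSize <= (1u << 16);
}
```

For example, 16 and 65536 pass the check, while 8, 48, and 131072 do not.
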

TP_FFT_NIFFT selects whether the transform to perform is an FFT (1) or IFFT (0).
TP_SHIFT selects the power of 2 to scale the result by prior to output.
TP_CASC_LEN selects the number of kernels the FFT will be divided over in series to improve throughput.
TP_DYN_PT_SIZE

selects whether (1) or not (0) to use run-time point size determination.

When set, each window of data must be preceded, in the window, by a 256 bit header.

This header is 8 samples when TT_DATA is cint16 and 4 samples otherwise.

The real part of the first sample indicates the forward (1) or inverse (0) transform.

The real part of the second sample indicates the Radix2 power of the point size of the following frame.

e.g. for a 512 point size, this field would hold 9, as 2^9 = 512.

Any value below 4 or greater than log2(TP_POINT_SIZE) is considered illegal.

The output window will also be preceded by a 256 bit vector which is a copy of the input vector, except for the real part of the top sample, which is 0 to indicate a legal frame or 1 to indicate an illegal frame.

When TP_PARALLEL_POWER is greater than 0, the header must be applied before each window of data for every port of the design and will appear before each window of data on the output ports.

Note that the minimum point size of 16 applies to each lane when in parallel mode, so a configuration of point size 256 with TP_PARALLEL_POWER = 2 will have 4 lanes, each with a minimum of 16, so the minimum legal point size here is 64.
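
The header arithmetic described above can be summarised in a few compile-time helpers. This is a sketch only; all function names are hypothetical, and the checks simply restate the rules in the text (header length in samples, the legal range of the point size field, and the minimum point size in parallel mode).

```cpp
// Header length in samples: 8 for cint16, 4 otherwise (256 bits in both cases).
constexpr unsigned int header_samples(bool dataIsCint16) {
    return dataIsCint16 ? 8u : 4u;
}

// Integer log2 for a power-of-two argument.
constexpr unsigned int log2_floor(unsigned int x) {
    unsigned int n = 0;
    while (x > 1) {
        x >>= 1;
        ++n;
    }
    return n;
}

// The point size field is legal when it is at least 4 and no greater
// than log2(TP_POINT_SIZE), where TP_POINT_SIZE is the maximum point size.
constexpr bool is_legal_size_field(unsigned int field, unsigned int maxPointSize) {
    return field >= 4 && field <= log2_floor(maxPointSize);
}

// Minimum legal point size: 16 per lane, with 2^TP_PARALLEL_POWER lanes.
constexpr unsigned int min_point_size(unsigned int parallelPower) {
    return 16u << parallelPower;
}
```

With TP_POINT_SIZE = 1024, a header field of 9 (a 512 point frame) is legal, while 3 and 11 are not. min_point_size(2) reproduces the value of 64 quoted above.
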

TP_WINDOW_VSIZE

is an unsigned integer which describes the number of samples to be processed in each call to the function. When TP_DYN_PT_SIZE is set to 1, the actual window size will be larger than TP_WINDOW_VSIZE because the header is not included in TP_WINDOW_VSIZE.

By default, TP_WINDOW_VSIZE is set to match TP_POINT_SIZE.

TP_WINDOW_VSIZE may be set to an integer multiple of TP_POINT_SIZE, in which case multiple FFT iterations will be performed on a given input window, resulting in multiple iterations of output samples and reducing the number of times the kernel needs to be triggered to process a given number of input data samples.

As a result, the overheads incurred during kernel triggering are reduced and overall performance is increased.
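
The relationship between TP_WINDOW_VSIZE, TP_POINT_SIZE, and the header can be sketched as follows. The helper names are hypothetical, and the sketch assumes one header per window occupying the 8 or 4 sample slots described under TP_DYN_PT_SIZE.

```cpp
// TP_WINDOW_VSIZE must be a whole multiple of TP_POINT_SIZE.
constexpr bool is_legal_window(unsigned int windowVsize, unsigned int pointSize) {
    return pointSize != 0 && windowVsize >= pointSize &&
           windowVsize % pointSize == 0;
}

// Number of FFT iterations performed per kernel invocation.
constexpr unsigned int ffts_per_window(unsigned int windowVsize, unsigned int pointSize) {
    return windowVsize / pointSize;
}

// Samples actually carried in each window. With dynamic point size the
// header is added on top of TP_WINDOW_VSIZE (assumption: one header per window).
constexpr unsigned int samples_per_window(unsigned int windowVsize,
                                          bool dynPtSize,
                                          unsigned int headerSamples) {
    return windowVsize + (dynPtSize ? headerSamples : 0u);
}
```

For instance, TP_WINDOW_VSIZE = 4096 with TP_POINT_SIZE = 1024 yields 4 transforms per kernel call; with TP_DYN_PT_SIZE = 1 and a 4 sample header, each window carries 4100 samples under the assumption above.
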

TP_API is an unsigned integer to select window (0) or stream (1) interfaces. When stream I/O is selected, one sample is taken from, or output to, a stream and the next sample from or to the next stream. A minimum of two streams is used. With two streams, even samples are read from input stream[0] and odd samples from input stream[1].
TP_PARALLEL_POWER

is an unsigned integer to describe N where 2^N is the number of subframe processors to use, so as to achieve higher throughput.

The default is 0. With TP_PARALLEL_POWER set to 2, 4 subframe processors will be used, each of which takes 2 streams in, for a total of 8 streams input and output. Sample[p] must be written to stream[p modulo q] where q is the number of streams.
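
The sample-to-stream mapping can be illustrated with a small host-side sketch. The function names are hypothetical, plain int samples stand in for TT_DATA, and the stream count assumes two streams per subframe processor as stated above.

```cpp
#include <cstddef>
#include <vector>

// Total number of streams: 2 per subframe processor, 2^TP_PARALLEL_POWER processors.
constexpr unsigned int num_streams(unsigned int parallelPower) {
    return 2u << parallelPower;
}

// Sample[p] goes to stream[p modulo q].
constexpr std::size_t stream_index(std::size_t p, std::size_t q) {
    return p % q;
}

// Split a frame into q per-stream sample sequences, preserving order
// within each stream.
std::vector<std::vector<int>> split_to_streams(const std::vector<int>& samples,
                                               std::size_t q) {
    std::vector<std::vector<int>> streams(q);
    for (std::size_t p = 0; p < samples.size(); ++p) {
        streams[stream_index(p, q)].push_back(samples[p]);
    }
    return streams;
}
```

With TP_PARALLEL_POWER = 2, num_streams returns 8, so sample 9 is written to stream 1 as the second sample on that stream.
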

TP_USE_WIDGETS is an unsigned integer to control the use of widgets for configurations which either use TP_API=1 (streaming IO) or TP_PARALLEL_POWER>0 which uses streams internally, even if not externally. The default is not to use widgets but to have the stream to window conversion performed as part of the FFT kernel or R2combiner kernel. Using widget kernels allows this conversion to be placed in a separate tile and so boost performance at the expense of more tiles being used.
TP_RND

describes the selection of rounding to be applied during the shift down stage of processing. Although TP_RND accepts unsigned integer values, descriptive macros are recommended, where:

  • rnd_floor = Truncate LSB, always round down (towards negative infinity).

  • rnd_ceil = Always round up (towards positive infinity).

  • rnd_sym_floor = Truncate LSB, always round towards 0.

  • rnd_sym_ceil = Always round up towards infinity.

  • rnd_pos_inf = Round halfway towards positive infinity.

  • rnd_neg_inf = Round halfway towards negative infinity.

  • rnd_sym_inf = Round halfway towards infinity (away from zero).

  • rnd_sym_zero = Round halfway towards zero (away from infinity).

  • rnd_conv_even = Round halfway towards nearest even number.

  • rnd_conv_odd = Round halfway towards nearest odd number.

    No rounding is performed on ceil or floor mode variants.

    Other modes round to the nearest integer. They differ only in how they round for values of 0.5.

    Note: Rounding modes rnd_sym_floor and rnd_sym_ceil are only supported on AIE-ML device.
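
To make the difference between the modes concrete, the sketch below models three of them on a plain integer shift-down. It is an illustration of the arithmetic only, not the AIE implementation: the Rounding enum and shift_round function are hypothetical names, and the enumerators are intended to correspond to rnd_floor, rnd_pos_inf, and rnd_conv_even.

```cpp
#include <cstdint>

// Hypothetical labels for three of the modes listed above.
enum class Rounding { Floor, PosInf, ConvEven };

// Divide v by 2^shift and round according to mode.
constexpr std::int64_t shift_round(std::int64_t v, unsigned int shift, Rounding mode) {
    if (shift == 0) return v;
    const std::int64_t d = std::int64_t{1} << shift;
    // Floor division: C++ '/' truncates towards zero, so adjust negatives.
    const std::int64_t q = v / d - ((v % d) < 0 ? 1 : 0);
    const std::int64_t r = v - q * d;  // remainder, always in [0, d)
    const std::int64_t half = d / 2;
    switch (mode) {
        case Rounding::Floor:
            return q;                           // truncate LSBs
        case Rounding::PosInf:
            return r >= half ? q + 1 : q;       // ties go up
        case Rounding::ConvEven:
            if (r > half) return q + 1;
            if (r < half) return q;
            return (q % 2 != 0) ? q + 1 : q;    // ties go to the even neighbour
    }
    return q;
}
```

Shifting 5 down by one bit (2.5) gives 2 with Floor, 3 with PosInf, and 2 with ConvEven, while 7 (3.5) gives 3, 4, and 4 respectively. The modes only disagree when the discarded bits represent exactly one half, or for Floor, whenever any bits are discarded.
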

TP_SAT

describes the selection of saturation to be applied during the shift down stage of processing. TP_SAT accepts unsigned integer values, where:

  • 0: none = No saturation is performed and the value is truncated on the MSB side.
  • 1: saturate = Default. Saturation clamps an n-bit signed value to the range [- ( 2^(n-1) ) : +2^(n-1) - 1 ].
  • 3: symmetric = Controls symmetric saturation. Symmetric saturation clamps an n-bit signed value to the range [- ( 2^(n-1) -1 ) : +2^(n-1) - 1 ].
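
The three saturation behaviours can be modelled on a wide integer as a rough illustration. This is a sketch of the arithmetic described above, not the hardware implementation; apply_sat is a hypothetical name, and nBits is assumed to lie between 2 and 62.

```cpp
#include <cstdint>

// Reduce v to an nBits-wide signed value according to satMode (0, 1 or 3).
constexpr std::int64_t apply_sat(std::int64_t v, unsigned int nBits, unsigned int satMode) {
    const std::int64_t maxVal = (std::int64_t{1} << (nBits - 1)) - 1;
    const std::int64_t minVal = -maxVal - 1;
    if (satMode == 1) {  // saturate: clamp to [-(2^(n-1)), 2^(n-1) - 1]
        return v > maxVal ? maxVal : (v < minVal ? minVal : v);
    }
    if (satMode == 3) {  // symmetric: clamp to [-(2^(n-1) - 1), 2^(n-1) - 1]
        return v > maxVal ? maxVal : (v < -maxVal ? -maxVal : v);
    }
    if (satMode == 0) {  // none: keep the low nBits and sign-extend (MSBs truncated)
        const std::uint64_t mask = (std::uint64_t{1} << nBits) - 1;
        const std::uint64_t low = static_cast<std::uint64_t>(v) & mask;
        const bool negative = ((low >> (nBits - 1)) & 1u) != 0;
        return negative ? static_cast<std::int64_t>(low) - (std::int64_t{1} << nBits)
                        : static_cast<std::int64_t>(low);
    }
    return v;  // other values are not described in the text; left unchanged here
}
```

For a 16 bit output, a value of -40000 becomes -32768 with mode 1 and -32767 with mode 3, whereas 40000 with mode 0 wraps to -25536.
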
TP_INDEX This parameter is for internal use regarding the recursion of the parallel power feature. It is recommended to omit this parameter from the configuration and rely instead on default values. If this parameter is set by the user, the behaviour of the library unit is undefined.
TP_ORIG_PAR_POWER This parameter is for internal use regarding the recursion of the parallel power feature. It is recommended to omit this parameter from the configuration and rely instead on default values. If this parameter is set by the user, the behaviour of the library unit is undefined.
template <
    typename TT_DATA,
    typename TT_TWIDDLE,
    unsigned int TP_POINT_SIZE,
    unsigned int TP_FFT_NIFFT = 1,
    unsigned int TP_SHIFT = 0,
    unsigned int TP_CASC_LEN = 1,
    unsigned int TP_DYN_PT_SIZE = 0,
    unsigned int TP_WINDOW_VSIZE = TP_POINT_SIZE,
    unsigned int TP_API = 0,
    unsigned int TP_PARALLEL_POWER = 0,
    unsigned int TP_USE_WIDGETS = 0,
    unsigned int TP_RND = 4,
    unsigned int TP_SAT = 1,
    unsigned int TP_INDEX = 0,
    unsigned int TP_ORIG_PAR_POWER = TP_PARALLEL_POWER
    >
class fft_ifft_dit_1ch_graph: public graph

// fields

static constexpr int kParallel_factor
static constexpr int kWindowSize
static constexpr int kNextParallelPower
static constexpr int kR2Shift
static constexpr int kFFTsubShift
static constexpr int kHeaderBytes
static constexpr int kStreamsPerTile
static constexpr int kPortsPerTile
static constexpr int kOutputPorts
port_array <input, kPortsPerTile*kParallel_factor> in
port_array <output, kOutputPorts> out
parameter r2comb_tw_lut
kernel m_combInKernel[kParallel_factor]
kernel m_r2Comb[kParallel_factor]
kernel m_combOutKernel[kParallel_factor]
fft_ifft_dit_1ch_graph <TT_DATA, TT_TWIDDLE, (TP_POINT_SIZE>> 1), TP_FFT_NIFFT, kFFTsubShift, TP_CASC_LEN, TP_DYN_PT_SIZE, (TP_WINDOW_VSIZE>> 1), TP_API, kNextParallelPower, TP_USE_WIDGETS, TP_RND, TP_SAT, TP_INDEX, TP_ORIG_PAR_POWER> FFTsubframe0
fft_ifft_dit_1ch_graph <TT_DATA, TT_TWIDDLE, (TP_POINT_SIZE>> 1), TP_FFT_NIFFT, kFFTsubShift, TP_CASC_LEN, TP_DYN_PT_SIZE, (TP_WINDOW_VSIZE>> 1), TP_API, kNextParallelPower, TP_USE_WIDGETS, TP_RND, TP_SAT, TP_INDEX+kParallel_factor/2, TP_ORIG_PAR_POWER> FFTsubframe1