fft_dit_1ch is a single-channel, decimation-in-time, fixed point size FFT.
This class definition is only used with stream interfaces (TP_API == 1). The stream interface FFT graph is offered with a dual input stream configuration, which interleaves data samples between the streams. The stream interface FFT implementation supports parallel computation (TP_PARALLEL_POWER > 0). Dynamic point size is supported, with a header embedded in the data stream.
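The interleaving can be illustrated with a minimal sketch. It assumes the rule given in the TP_PARALLEL_POWER description (sample[p] is written to stream[p modulo q]) and two streams per subframe processor. The names `num_streams` and `interleave_to_streams` are hypothetical and are not part of the library, and streams are modelled here as plain vectors rather than AI Engine stream ports.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Total stream count: 2 streams for each of the 2^parallel_power subframe
// processors (assumption taken from the TP_PARALLEL_POWER description).
constexpr std::size_t num_streams(unsigned int parallel_power) {
    return static_cast<std::size_t>(2) << parallel_power;
}

// Hypothetical helper: distributes samples so that samples[p] is appended to
// streams[p % q], where q is the number of streams.
template <typename Sample>
std::vector<std::vector<Sample>> interleave_to_streams(const std::vector<Sample>& samples,
                                                       unsigned int parallel_power) {
    const std::size_t q = num_streams(parallel_power);
    assert(samples.size() % q == 0);  // every stream receives the same number of samples
    std::vector<std::vector<Sample>> streams(q);
    for (std::size_t p = 0; p < samples.size(); ++p) {
        streams[p % q].push_back(samples[p]);
    }
    return streams;
}
```

With a parallel power of 0 this reduces to the dual-stream case: even samples land in stream 0 and odd samples in stream 1.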
These are the templates to configure the single-channel decimation-in-time class.
Parameters:
TT_DATA | describes the type of individual data samples input to and output from the transform function. This is a typename and must be one of the following: int16, cint16, int32, cint32, float, cfloat. |
TT_TWIDDLE | describes the type of twiddle factors of the transform. It must be one of the following: cint16, cint32, cfloat and must also satisfy the following rules:
|
TP_POINT_SIZE | is an unsigned integer which describes the number of samples in the transform. This must be 2^N where N is an integer in the range 4 to 16 inclusive. When TP_DYN_PT_SIZE is set, TP_POINT_SIZE describes the maximum point size possible. |
TP_FFT_NIFFT | selects whether the transform to perform is an FFT (1) or IFFT (0). |
TP_SHIFT | selects the power of 2 to scale the result by prior to output. |
TP_CASC_LEN | selects the number of kernels the FFT will be divided over in series to improve throughput. |
TP_DYN_PT_SIZE | selects whether (1) or not (0) to use run-time point size determination. When set, each window of data must be preceded, in the window, by a 256-bit header. This header is 8 samples when TT_DATA is cint16 and 4 samples otherwise. The real part of the first sample indicates a forward (1) or inverse (0) transform. The real part of the second sample indicates the radix-2 power of the point size, e.g. for a 512 point size, this field would hold 9, as 2^9 = 512. The second least significant byte (8 bits) of this field describes the radix-2 power of the following frame. Any value below 4 or greater than log2(TP_POINT_SIZE) is considered illegal. The output window will also be preceded by a 256-bit vector which is a copy of the input vector, except for the real part of the top sample, which is 0 to indicate a legal frame or 1 to indicate an illegal frame. When TP_PARALLEL_POWER is greater than 0, the header must be applied before each window of data for every port of the design and will appear before each window of data on the output ports. Note that the minimum point size of 16 applies to each lane when in parallel mode, so a configuration of point size 256 with TP_PARALLEL_POWER = 2 will have 4 lanes, each with a minimum of 16, so the minimum legal point size here is 64. |
TP_WINDOW_VSIZE | is an unsigned integer which describes the number of samples to be processed in each call to the function. When TP_DYN_PT_SIZE is set to 1, the actual window size will be larger than TP_WINDOW_VSIZE because the header is not included in TP_WINDOW_VSIZE. By default, TP_WINDOW_VSIZE is set to match TP_POINT_SIZE. TP_WINDOW_VSIZE may be set to an integer multiple of TP_POINT_SIZE, in which case multiple FFT iterations will be performed on a given input window, resulting in multiple iterations of output samples and reducing the number of times the kernel needs to be triggered to process a given number of input data samples. As a result, the overheads incurred during kernel triggering are reduced and overall performance is increased. |
TP_API | is an unsigned integer to select window (0) or stream (1) interfaces. When stream I/O is selected, one sample is taken from, or output to, a stream and the next sample from or to the next stream. A minimum of two streams is used. In this example, even samples are read from input stream[0] and odd samples from input stream[1]. |
TP_PARALLEL_POWER | is an unsigned integer to describe N where 2^N is the number of subframe processors to use, so as to achieve higher throughput. The default is 0. With TP_PARALLEL_POWER set to 2, 4 subframe processors will be used, each of which takes 2 streams in, for a total of 8 streams input and output. Sample[p] must be written to stream[p modulo q] where q is the number of streams. |
TP_USE_WIDGETS | is an unsigned integer to control the use of widgets for configurations which either use TP_API=1 (streaming IO) or TP_PARALLEL_POWER>0 which uses streams internally, even if not externally. The default is not to use widgets but to have the stream to window conversion performed as part of the FFT kernel or R2combiner kernel. Using widget kernels allows this conversion to be placed in a separate tile and so boost performance at the expense of more tiles being used. |
TP_RND | describes the selection of rounding to be applied during the shift down stage of processing. Although TP_RND accepts unsigned integer values, descriptive macros are recommended, where:
|
TP_SAT | describes the selection of saturation to be applied during the shift down stage of processing. TP_SAT accepts unsigned integer values, where:
|
TP_INDEX | This parameter is for internal use regarding the recursion of the parallel power feature. It is recommended to omit this parameter from the configuration and rely instead on default values. If this parameter is set by the user, the behaviour of the library unit is undefined. |
TP_ORIG_PAR_POWER | This parameter is for internal use regarding the recursion of the parallel power feature. It is recommended to omit this parameter from the configuration and rely instead on default values. If this parameter is set by the user, the behaviour of the library unit is undefined. |
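The sizing rules stated above for TP_POINT_SIZE, TP_DYN_PT_SIZE and TP_WINDOW_VSIZE can be sketched in standalone C++. This is an illustration under stated assumptions, not library code: `Cint16Sample` and `Cint32Sample` are stand-ins for the library sample types, and every helper name (`is_legal_point_size`, `header_samples`, `is_legal_power`, `make_header`, `actual_window_samples`, `ffts_per_window`) is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-ins for the library sample types (assumption: real and imaginary
// parts of equal width, with no padding).
struct Cint16Sample { std::int16_t real; std::int16_t imag; };
struct Cint32Sample { std::int32_t real; std::int32_t imag; };

constexpr std::size_t kHeaderSizeInBytes = 256 / 8;  // the 256-bit header

// TP_POINT_SIZE must be 2^N with N in the range 4 to 16 inclusive.
constexpr bool is_legal_point_size(unsigned int point_size) {
    return point_size >= (1u << 4) && point_size <= (1u << 16) &&
           (point_size & (point_size - 1)) == 0;
}

// Number of samples occupied by the header for a given sample type.
template <typename Sample>
constexpr std::size_t header_samples() {
    return kHeaderSizeInBytes / sizeof(Sample);
}
static_assert(header_samples<Cint16Sample>() == 8, "cint16: 8 header samples");
static_assert(header_samples<Cint32Sample>() == 4, "cint32: 4 header samples");

// A run-time point size 2^power is legal when each of the 2^parallel_power
// lanes still gets at least 16 points and power does not exceed max_power,
// where max_power = log2(TP_POINT_SIZE).
constexpr bool is_legal_power(unsigned int power, unsigned int max_power,
                              unsigned int parallel_power = 0) {
    return power >= 4 + parallel_power && power <= max_power;
}

// Builds an input header: direction in the real part of sample 0, radix-2
// power of the point size in the real part of sample 1, everything else zero.
template <typename Sample>
std::array<Sample, header_samples<Sample>()> make_header(bool forward, unsigned int power) {
    std::array<Sample, header_samples<Sample>()> header{};
    header[0].real = static_cast<decltype(Sample::real)>(forward ? 1 : 0);
    header[1].real = static_cast<decltype(Sample::real)>(power);
    return header;
}

// Samples actually carried per window: TP_WINDOW_VSIZE plus the header when
// TP_DYN_PT_SIZE is set.
template <typename Sample>
constexpr std::size_t actual_window_samples(std::size_t window_vsize, bool dyn_pt_size) {
    return window_vsize + (dyn_pt_size ? header_samples<Sample>() : 0);
}

// FFT iterations performed per window; window_vsize must be an integer
// multiple of point_size.
inline std::size_t ffts_per_window(std::size_t window_vsize, unsigned int point_size) {
    assert(is_legal_point_size(point_size) && window_vsize % point_size == 0);
    return window_vsize / point_size;
}
```

For example, a cint16 window with TP_WINDOW_VSIZE = 1024 and TP_DYN_PT_SIZE = 1 carries 1032 samples in this model, and a window of 2048 samples with a 512-point transform yields 4 FFT iterations per kernel call.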
template <
    typename TT_DATA,
    typename TT_TWIDDLE,
    unsigned int TP_POINT_SIZE,
    unsigned int TP_FFT_NIFFT = 1,
    unsigned int TP_SHIFT = 0,
    unsigned int TP_CASC_LEN = 1,
    unsigned int TP_DYN_PT_SIZE = 0,
    unsigned int TP_WINDOW_VSIZE = TP_POINT_SIZE,
    unsigned int TP_API = 0,
    unsigned int TP_PARALLEL_POWER = 0,
    unsigned int TP_USE_WIDGETS = 0,
    unsigned int TP_RND = 4,
    unsigned int TP_SAT = 1,
    unsigned int TP_INDEX = 0,
    unsigned int TP_ORIG_PAR_POWER = TP_PARALLEL_POWER
    >
class fft_ifft_dit_1ch_graph: public graph

// fields

static constexpr int kParallel_factor
static constexpr int kWindowSize
static constexpr int kNextParallelPower
static constexpr int kR2Shift
static constexpr int kFFTsubShift
static constexpr int kHeaderBytes
static constexpr int kStreamsPerTile
static constexpr int kPortsPerTile
static constexpr int kOutputPorts
port_array <input, kPortsPerTile*kParallel_factor> in
port_array <output, kOutputPorts> out
parameter r2comb_tw_lut
kernel m_combInKernel[kParallel_factor]
kernel m_r2Comb[kParallel_factor]
kernel m_combOutKernel[kParallel_factor]
fft_ifft_dit_1ch_graph <TT_DATA, TT_TWIDDLE, (TP_POINT_SIZE >> 1), TP_FFT_NIFFT, kFFTsubShift, TP_CASC_LEN, TP_DYN_PT_SIZE, (TP_WINDOW_VSIZE >> 1), TP_API, kNextParallelPower, TP_USE_WIDGETS, TP_RND, TP_SAT, TP_INDEX, TP_ORIG_PAR_POWER> FFTsubframe0
fft_ifft_dit_1ch_graph <TT_DATA, TT_TWIDDLE, (TP_POINT_SIZE >> 1), TP_FFT_NIFFT, kFFTsubShift, TP_CASC_LEN, TP_DYN_PT_SIZE, (TP_WINDOW_VSIZE >> 1), TP_API, kNextParallelPower, TP_USE_WIDGETS, TP_RND, TP_SAT, TP_INDEX+kParallel_factor/2, TP_ORIG_PAR_POWER> FFTsubframe1
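The two FFTsubframe members indicate that the graph is defined recursively: each level instantiates two subframe graphs with half the point size and half the window size, the second offset in TP_INDEX by kParallel_factor/2. The sketch below enumerates the leaf graphs that such a recursion would produce. It assumes that kParallel_factor equals 2^TP_PARALLEL_POWER, that kNextParallelPower equals TP_PARALLEL_POWER - 1, and that the recursion stops at a parallel power of 0; none of these is stated in this section. `Leaf` and `collect_leaves` are hypothetical names, not library symbols.

```cpp
#include <cassert>
#include <vector>

// One non-parallel FFT subframe at the bottom of the recursion.
struct Leaf {
    unsigned int index;         // value TP_INDEX would take for this leaf
    unsigned int point_size;    // per-lane point size
    unsigned int window_vsize;  // per-lane window size in samples
};

// Hypothetical model of the template recursion: mirrors the arguments passed
// to FFTsubframe0 and FFTsubframe1 in the class synopsis.
inline void collect_leaves(unsigned int point_size, unsigned int window_vsize,
                           unsigned int parallel_power, unsigned int index,
                           std::vector<Leaf>& out) {
    // Each lane must keep at least the minimum point size of 16.
    assert((point_size >> parallel_power) >= 16);
    if (parallel_power == 0) {
        out.push_back({index, point_size, window_vsize});
        return;
    }
    const unsigned int parallel_factor = 1u << parallel_power;  // assumed kParallel_factor
    // FFTsubframe0: same index, halved sizes, parallel power reduced by one.
    collect_leaves(point_size >> 1, window_vsize >> 1, parallel_power - 1, index, out);
    // FFTsubframe1: index offset by kParallel_factor/2.
    collect_leaves(point_size >> 1, window_vsize >> 1, parallel_power - 1,
                   index + parallel_factor / 2, out);
}
```

Under these assumptions, a 256-point configuration with a parallel power of 2 unrolls into four 64-point leaves with indices 0 to 3, which agrees with the four lanes described for TP_DYN_PT_SIZE.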