template class xf::dsp::aie::fir::interpolate_hb::fir_interpolate_hb_graph - 2024.2 English

Vitis Libraries

Release Date
2024-11-29
Version
2024.2 English
#include "fir_interpolate_hb_graph.hpp"

Overview

fir_interpolate_hb is a Halfband Interpolation FIR filter.

These are the templates to configure the halfband interpolator FIR class.

Parameters:

TT_DATA

describes the type of individual data samples input to and output from the filter function.

This is a typename and must be one of the following:

int16, cint16, int32, cint32, float, cfloat.

TT_COEFF

describes the type of individual coefficients of the filter taps.

It must be one of the same set of types listed for TT_DATA and must also satisfy the following rules:

  • Complex types are only supported when TT_DATA is also complex.
  • TT_COEFF must be an integer type if TT_DATA is an integer type.
  • TT_COEFF must be a float type if TT_DATA is a float type.
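The three rules above can be written as a compile-time check. The following is a minimal sketch, assuming std::complex<T> as a stand-in for the AIE complex types (cint16, cint32, cfloat). The names is_complex, base_type and validTypeCombo are hypothetical and are not part of the library.

```cpp
#include <complex>
#include <cstdint>
#include <type_traits>

// Hypothetical traits; std::complex<T> stands in for cint16/cint32/cfloat.
template <typename T> struct is_complex : std::false_type {};
template <typename T> struct is_complex<std::complex<T>> : std::true_type {};

template <typename T> struct base_type { using type = T; };
template <typename T> struct base_type<std::complex<T>> { using type = T; };

// True when the TT_DATA / TT_COEFF pair obeys the rules listed above:
// complex coefficients need complex data, and integer/float kinds must match.
template <typename TData, typename TCoeff>
constexpr bool validTypeCombo() {
    using D = typename base_type<TData>::type;
    using C = typename base_type<TCoeff>::type;
    return (!is_complex<TCoeff>::value || is_complex<TData>::value) &&
           (std::is_integral<D>::value == std::is_integral<C>::value);
}
```

For example, complex data with real coefficients of the same kind passes, while real data with complex coefficients does not.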
TP_FIR_LEN

is an unsigned integer which describes the number of taps in the filter.

TP_FIR_LEN must satisfy (TP_FIR_LEN +1)/4 = N where N is a positive integer.
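The length constraint can be checked at compile time. This is a minimal sketch; isValidHalfbandLen and halfbandN are hypothetical helper names used only for illustration.

```cpp
// Hypothetical helpers, not library functions.

// A halfband length is valid when (firLen + 1) / 4 is a positive integer.
constexpr bool isValidHalfbandLen(unsigned int firLen) {
    return firLen >= 3 && (firLen + 1) % 4 == 0;
}

// N as defined above; only meaningful for valid lengths.
constexpr unsigned int halfbandN(unsigned int firLen) {
    return (firLen + 1) / 4;
}
```

Lengths such as 7, 11 and 15 pass this check, while 8 or 9 do not.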

TP_SHIFT

describes power of 2 shift down applied to the accumulation of FIR terms before output.

TP_SHIFT must be in the range 0 to 59 (61 for AIE1).

TP_RND

describes the selection of rounding to be applied during the shift down stage of processing.

Although TP_RND accepts unsigned integer values, descriptive macros are recommended, where:

  • rnd_floor = Truncate LSB, always round down (towards negative infinity).

  • rnd_ceil = Always round up (towards positive infinity).

  • rnd_sym_floor = Truncate LSB, always round towards 0.

  • rnd_sym_ceil = Always round up towards infinity (away from zero).

  • rnd_pos_inf = Round halfway towards positive infinity.

  • rnd_neg_inf = Round halfway towards negative infinity.

  • rnd_sym_inf = Round halfway towards infinity (away from zero).

  • rnd_sym_zero = Round halfway towards zero (away from infinity).

  • rnd_conv_even = Round halfway towards nearest even number.

  • rnd_conv_odd = Round halfway towards nearest odd number.

    No rounding is performed on ceil or floor mode variants.

    Other modes round to the nearest integer. They differ only in how they round for values of 0.5.

    Note: Rounding modes rnd_sym_floor and rnd_sym_ceil are only supported on AIE-ML device.
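The difference between the modes is easiest to see on a plain integer shift. The following is a minimal sketch of a power-of-2 shift down with rounding, written in portable C++ rather than with AIE intrinsics. The Rnd enum and shiftRound function are hypothetical and do not correspond to the numeric values of the rnd_* macros; rnd_sym_floor and rnd_sym_ceil are omitted.

```cpp
#include <cstdint>

// Hypothetical mode names for illustration only.
enum class Rnd { Floor, Ceil, PosInf, NegInf, SymInf, SymZero, ConvEven, ConvOdd };

// Divide acc by 2^shift and round according to mode.
constexpr std::int64_t shiftRound(std::int64_t acc, unsigned int shift, Rnd mode) {
    if (shift == 0) return acc;
    const std::int64_t unit = std::int64_t{1} << shift;
    const std::int64_t half = unit / 2;
    // Floor division with a non-negative remainder, also for negative acc.
    std::int64_t q = acc / unit;
    std::int64_t r = acc % unit;
    if (r < 0) { q -= 1; r += unit; }
    if (mode == Rnd::Floor) return q;
    if (mode == Rnd::Ceil) return r != 0 ? q + 1 : q;
    // Remaining modes round to nearest and differ only on exact halves.
    if (r != half) return r > half ? q + 1 : q;
    switch (mode) {
        case Rnd::PosInf:   return q + 1;
        case Rnd::NegInf:   return q;
        case Rnd::SymInf:   return q >= 0 ? q + 1 : q;
        case Rnd::SymZero:  return q >= 0 ? q : q + 1;
        case Rnd::ConvEven: return q % 2 == 0 ? q : q + 1;
        default:            return q % 2 != 0 ? q : q + 1;  // ConvOdd
    }
}
```

With a shift of 1, an accumulator of 5 represents 2.5 and an accumulator of -5 represents -2.5, which exercises the halfway case of each mode.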

TP_INPUT_WINDOW_VSIZE

describes the number of samples processed by the graph in a single iteration run.

When TP_API is set to 0, samples are buffered and stored in a ping-pong window buffer mapped onto Memory Group banks.

As a result, the maximum number of samples processed by the graph is limited by the size of the Memory Group.

When TP_API is set to 1 and TP_SSR is set to 1, incoming samples are buffered in a similar manner.

When TP_API is set to 1 and TP_SSR > 1, samples are processed directly from the stream inputs and no buffering takes place.

In such a case, the maximum number of samples processed by the graph is limited to a 32-bit value (4.294B samples per iteration).

Note: For SSR configurations (TP_SSR>1), the input data must be split over multiple ports, where each successive sample is sent to a different input port in a round-robin fashion.

As a result, each SSR input path will process a fraction of the frame defined by the TP_INPUT_WINDOW_VSIZE.

The number of values in the output window will be TP_INPUT_WINDOW_VSIZE multiplied by 2 by virtue of the halfband interpolation factor.

Note: Margin size should not be included in TP_INPUT_WINDOW_VSIZE.
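The round-robin split across SSR input ports and the resulting sizes can be sketched as index arithmetic. This is a minimal illustration, assuming TP_INPUT_WINDOW_VSIZE is an exact multiple of TP_SSR; SsrSlot, ssrSlot, samplesPerPort and outputSamples are hypothetical names.

```cpp
// Hypothetical helpers illustrating the round-robin sample distribution.
struct SsrSlot {
    unsigned int port;   // which input port receives the sample
    unsigned int index;  // position of the sample within that port's data
};

// Sample n of the frame goes to port n % ssr, at position n / ssr.
constexpr SsrSlot ssrSlot(unsigned int n, unsigned int ssr) {
    return SsrSlot{n % ssr, n / ssr};
}

// Fraction of the frame handled by each SSR input path
// (assumes vsize is a multiple of ssr).
constexpr unsigned int samplesPerPort(unsigned int vsize, unsigned int ssr) {
    return vsize / ssr;
}

// Total output samples: the halfband interpolator doubles the sample count.
constexpr unsigned int outputSamples(unsigned int vsize) {
    return 2 * vsize;
}
```

With TP_SSR = 4, sample 5 of the frame lands on port 1 as that port's second sample.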

TP_CASC_LEN

describes the number of AIE processors to split the operation over.

This allows resources to be traded for higher performance. TP_CASC_LEN must be in the range 1 (default) to 40.

TP_DUAL_IP

allows 2 input ports to be connected to FIR, increasing available throughput.

The function of the additional input port depends on TP_API. If TP_API is set to use windows, then:

TP_DUAL_IP is an implementation trade-off between performance and data bank resource.

When TP_DUAL_IP is set to 0, the FIR performance may be limited by load contention.

When TP_DUAL_IP is set to 1, two RAM banks are used for input.

If TP_API is set to use streams, then:

When TP_DUAL_IP is set to 0, a single stream will be connected as the FIR’s input.

When TP_DUAL_IP is set to 1, two stream inputs will be connected.

In such a case, data should be organized in a 128-bit interleaved pattern, e.g.:

  • samples 0-3 to be sent over stream0 for cint16 data type,
  • samples 4-7 to be sent over stream1 for cint16 data type.
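The 128-bit interleaving can be sketched as a mapping from sample index to stream. This is a minimal illustration, assuming the pattern repeats every 256 bits (so samples 8-11 of cint16 data return to stream0) and that sample sizes divide 16 bytes evenly; samplesPer128Bits and streamForSample are hypothetical names.

```cpp
// Hypothetical helpers illustrating the 128-bit interleaved stream pattern.

// Number of samples that fit in one 128-bit (16-byte) chunk.
constexpr unsigned int samplesPer128Bits(unsigned int sampleBytes) {
    return 16 / sampleBytes;
}

// Stream (0 or 1) carrying sample n; consecutive 128-bit chunks alternate.
// The alternation beyond the first two chunks is an assumption.
constexpr unsigned int streamForSample(unsigned int n, unsigned int sampleBytes) {
    return (n / samplesPer128Bits(sampleBytes)) % 2;
}
```

For cint16 (4 bytes per sample) this reproduces the example above: samples 0-3 map to stream 0 and samples 4-7 map to stream 1.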
TP_USE_COEFF_RELOAD

allows the user to select if runtime coefficient reloading should be used.

When defining the parameter:

  • 0 = static coefficients, defined in filter constructor,

  • 1 = reloadable coefficients, passed as argument to runtime function.

    Note: when used, async port: port_conditional_array<input, (TP_USE_COEFF_RELOAD == 1), TP_SSR> coeff; will be added to the FIR.

    Note: the size of the port array is equal to the total number of output paths (TP_SSR).

    Each port should contain the same taps array content, i.e. each additional port must be a duplicate of the coefficient array.

    Note: when TP_USE_COEFF_RELOAD = 1 and TP_PARA_INTERP_POLY = 2, optional port: port_conditional_array<input, (TP_USE_COEFF_RELOAD == 1), TP_SSR> coeffCT; will be added to the FIR.

TP_NUM_OUTPUTS

sets the number of ports over which the output is sent.

This can be 1 or 2. It is set to 1 by default.

The function of the additional output port depends on TP_API. For the window API, the additional output provides flexibility in connecting the FIR output to multiple destinations. The additional output out2 is an exact copy of the data on the output port out.

With Stream API, the additional output port increases the FIR’s throughput.

Data is sent in a 128-bit interleaved pattern, e.g.:

  • samples 0-3 are sent over stream0 for cint16 data type,

  • samples 4-7 are sent over stream1 for cint16 data type.

    Note: when used, optional port: port<output> out2; will be added to the FIR.

TP_UPSHIFT_CT

enables upshift of the unit center tap (UCT).

When TP_UPSHIFT_CT is set to 0, the center tap coefficient will be treated as any other coefficient.

When TP_UPSHIFT_CT is set to 1, the provided center tap’s value will be used to upshift the data sample.

Note: when complex coefficients are used, the center tap’s real part will be used for the upshift.

Note: Upshift UCT is only supported with 16-bit coefficient types, i.e. int16 and cint16.

Note: When Upshift is enabled, center tap value must be in the range 0 to 47.

TP_API

specifies if the input/output interface should be window-based or stream-based.

The values supported are 0 (window API) or 1 (stream API).

TP_SSR

specifies the number of parallel input/output paths where samples are interleaved between paths, giving an overall higher throughput.

A TP_SSR of 1 means just one output leg and 1 input phase, and is the backwards compatible option.

The number of AIEs used is given by TP_SSR^2 * TP_CASC_LEN.

TP_PARA_INTERP_POLY

sets the number of interpolator polyphases over which the coefficients will be split to enable parallel computation of the outputs. The polyphases are executed in parallel, and output data is produced by each polyphase directly.

TP_PARA_INTERP_POLY does not affect the number of input data paths. There will be TP_SSR input phases irrespective of the value of TP_PARA_INTERP_POLY. Currently, only TP_PARA_INTERP_POLY = 2 is supported for the halfband interpolators with TP_SSR > 1. TP_SSR = 1 supports TP_PARA_INTERP_POLY = 1 or 2. TP_PARA_INTERP_POLY = 2 results in decomposing the filter’s operation into two polyphases.

Input data is broadcast to the two polyphases and each polyphase produces half of the total output data. Their output data can be interleaved to produce a single output stream.

The first polyphase is implemented using a single rate asymmetric filter that is configured to produce and consume data in parallel in TP_SSR phases; each phase can operate at maximum throughput depending on the configuration. The first polyphase uses TP_SSR^2 * TP_CASC_LEN kernels.

The second polyphase simplifies into a single kernel that does a single tap because halfband interpolators only have one non-zero coefficient in the second coefficient phase. The second polyphase uses TP_SSR kernels operating at maximum throughput. The overall theoretical output data rate is TP_SSR * TP_PARA_INTERP_POLY * TP_NUM_OUTPUTS * 1 GSa/s. The overall theoretical input data rate is TP_SSR * (TP_DUAL_IP + 1) * 1 GSa/s.
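The kernel counts and theoretical data rates quoted above can be collected into a few compile-time formulas. This is a minimal sketch; HbConfig and the four functions are hypothetical names, and the rates are expressed in whole GSa/s.

```cpp
// Hypothetical aggregate of the relevant template parameters.
struct HbConfig {
    unsigned int ssr;             // TP_SSR
    unsigned int cascLen;         // TP_CASC_LEN
    unsigned int paraInterpPoly;  // TP_PARA_INTERP_POLY
    unsigned int numOutputs;      // TP_NUM_OUTPUTS
    unsigned int dualIp;          // TP_DUAL_IP
};

// First polyphase: TP_SSR^2 * TP_CASC_LEN kernels.
constexpr unsigned int firstPolyphaseKernels(const HbConfig& c) {
    return c.ssr * c.ssr * c.cascLen;
}

// Second (center tap) polyphase: TP_SSR kernels, present only when
// TP_PARA_INTERP_POLY is 2.
constexpr unsigned int centerTapKernels(const HbConfig& c) {
    return c.paraInterpPoly == 2 ? c.ssr : 0;
}

// Theoretical output rate in GSa/s.
constexpr unsigned int outputRateGSaPerSec(const HbConfig& c) {
    return c.ssr * c.paraInterpPoly * c.numOutputs;
}

// Theoretical input rate in GSa/s.
constexpr unsigned int inputRateGSaPerSec(const HbConfig& c) {
    return c.ssr * (c.dualIp + 1);
}
```

For TP_SSR = 2, TP_CASC_LEN = 3 and TP_PARA_INTERP_POLY = 2 with single input and output ports, this gives 12 first-polyphase kernels, 2 center tap kernels, 4 GSa/s out and 2 GSa/s in.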

TP_SAT

describes the selection of saturation to be applied during the shift down stage of processing.

TP_SAT accepts unsigned integer values, where:

  • 0: none = No saturation is performed and the value is truncated on the MSB side.
  • 1: saturate = Default. Saturation clamps the value to the n-bit signed range [- ( 2^(n-1) ) : +2^(n-1) - 1 ].
  • 3: symmetric = Controls symmetric saturation. Symmetric saturation clamps the value to the n-bit signed range [- ( 2^(n-1) -1 ) : +2^(n-1) - 1 ].
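The two saturating modes differ only in the lower bound. The following is a minimal sketch of the ranges given above in portable C++; saturate is a hypothetical helper, and mode 0 (no saturation) is not modelled.

```cpp
#include <cstdint>

// Hypothetical helper: clamp v to an nBits-wide signed range.
// symmetric == false models TP_SAT = 1, symmetric == true models TP_SAT = 3.
// Assumes 2 <= nBits <= 63.
constexpr std::int64_t saturate(std::int64_t v, unsigned int nBits, bool symmetric) {
    const std::int64_t maxV = (std::int64_t{1} << (nBits - 1)) - 1;  // +2^(n-1) - 1
    const std::int64_t minV = symmetric ? -maxV : -maxV - 1;         // -(2^(n-1) - 1) or -2^(n-1)
    return v > maxV ? maxV : (v < minV ? minV : v);
}
```

For a 16-bit output, a value of -40000 saturates to -32768 with TP_SAT = 1 and to -32767 with TP_SAT = 3.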
template <
    typename TT_DATA,
    typename TT_COEFF,
    unsigned int TP_FIR_LEN,
    unsigned int TP_SHIFT,
    unsigned int TP_RND,
    unsigned int TP_INPUT_WINDOW_VSIZE,
    unsigned int TP_CASC_LEN = 1,
    unsigned int TP_DUAL_IP = 0,
    unsigned int TP_USE_COEFF_RELOAD = 0,
    unsigned int TP_NUM_OUTPUTS = 1,
    unsigned int TP_UPSHIFT_CT = 0,
    unsigned int TP_API = 0,
    unsigned int TP_SSR = 1,
    unsigned int TP_PARA_INTERP_POLY = 1,
    unsigned int TP_SAT = 1
    >
class fir_interpolate_hb_graph: public graph

// structs

template <int dim>
struct aieml_ssr_params

template <unsigned int dim>
struct ct_fir_params

template <int dim>
struct sr_asym_graph_params

template <int dim>
struct ssr_params

// fields

kernel m_firKernels[TP_CASC_LEN *TP_SSR *TP_SSR]
kernel m_ct_firKernels[TP_SSR]
port_array <input, TP_SSR> in
port_array <output, TP_SSR> out
port_conditional_array <input, (TP_DUAL_IP==1), TP_SSR> in2
port_conditional_array <input, (TP_USE_COEFF_RELOAD==1), TP_SSR> coeff
port_conditional_array <input, (TP_USE_COEFF_RELOAD==1), TP_SSR> coeffCT
port_conditional_array <output, (TP_NUM_OUTPUTS==2), TP_SSR> out2
port_conditional_array <output, (TP_PARA_INTERP_POLY > 1), TP_SSR> out3
port_conditional_array <output, (TP_PARA_INTERP_POLY > 1 && TP_NUM_OUTPUTS == 2), TP_SSR> out4

Fields

kernel m_firKernels [TP_CASC_LEN *TP_SSR *TP_SSR]

The array of kernels that will be created and mapped onto AIE tiles. A number of kernels (TP_CASC_LEN * TP_SSR) will be connected with each other by the cascade interface.

kernel m_ct_firKernels [TP_SSR]

The array of kernels that will be created and mapped onto AIE tiles to process the center tap on a parallel polyphase (TP_PARA_INTERP_POLY == 2). A number of kernels (TP_SSR) will be connected with each other by the cascade interface.

port_array <input, TP_SSR> in

The input data array to the function. This input array is either a window API of samples of TT_DATA type or stream API (depending on TP_API). Note: Margin is added internally to the graph when connecting the input port with the kernel port. Therefore, margin should not be added when connecting the graph to a higher level design unit. Margin size (in bytes) equals TP_FIR_LEN rounded up to the nearest multiple of 32 bytes.
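The margin rule can be sketched as a round-up to a 32-byte boundary. This is a minimal illustration, assuming TP_FIR_LEN is counted in samples of TT_DATA and is converted to bytes before rounding; that interpretation is an assumption, and roundUp and marginBytes are hypothetical names.

```cpp
// Hypothetical helpers, not library functions.

// Round value up to the next multiple of `multiple`.
constexpr unsigned int roundUp(unsigned int value, unsigned int multiple) {
    return ((value + multiple - 1) / multiple) * multiple;
}

// Margin in bytes for firLen samples of sampleBytes each (assumed
// interpretation), rounded up to a multiple of 32 bytes.
constexpr unsigned int marginBytes(unsigned int firLen, unsigned int sampleBytes) {
    return roundUp(firLen * sampleBytes, 32);
}
```

Under this assumption, a 15-tap filter on cint16 data (4 bytes per sample) needs 60 bytes, which rounds up to a 64-byte margin.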

port_array <output, TP_SSR> out

The output data array from the function. This output is either a window API of samples of TT_DATA type or stream API (depending on TP_API). Number of output samples is determined by interpolation & decimation factors (if present).

port_conditional_array <input, (TP_DUAL_IP==1), TP_SSR> in2

The conditional input array data to the function. This input is (generated when TP_DUAL_IP == 1) either a window API of samples of TT_DATA type or stream API (depending on TP_API).

port_conditional_array <input, (TP_USE_COEFF_RELOAD==1), TP_SSR> coeff

The conditional array of input async ports used to pass run-time programmable (RTP) coefficients. This port_conditional_array is (generated when TP_USE_COEFF_RELOAD == 1) an array of input ports, whose size is defined by TP_SSR. Each port in the array holds a duplicate of the coefficient array, required to connect to each SSR input path. The size of the coefficient array depends on TP_SSR.

  • When TP_SSR = 1, the taps array must be supplied in a compressed form for this halfband application, i.e.

    taps[] = {c0, c2, c4, …, cN, cCT} where

    N = (TP_FIR_LEN+1)/4 and cCT is the center tap.

    For example, a 7-tap halfband interpolator might use coeffs (1, 0, 2, 5, 2, 0, 1).

    This would be input as coeff[]= {1,2,5} since the context of halfband interpolation allows the remaining coefficients to be inferred.

  • When TP_SSR > 1, the taps array must be partially uncompressed and symmetry must be removed. For example, a 7-tap halfband interpolator might use coeffs (1, 0, 2, 5, 2, 0, 1).

    This would be input as coeff[]= {1,2,5,2,1}.
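The partially uncompressed form for TP_SSR > 1 can be derived from the full tap array by dropping the zero-valued odd taps while keeping the center tap and both symmetric halves. This is a minimal sketch inferred from the 7-tap example above; ssrCoeffArray is a hypothetical helper, not a library function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds the TP_SSR > 1 coefficient array from the
// full halfband tap array (length 4N - 1).
template <typename T>
std::vector<T> ssrCoeffArray(const std::vector<T>& full) {
    assert(full.size() >= 3 && (full.size() + 1) % 4 == 0);
    const std::size_t center = (full.size() - 1) / 2;  // index of the center tap
    std::vector<T> out;
    for (std::size_t i = 0; i < full.size(); ++i) {
        // Keep even-indexed taps and the center tap; skip the zero taps.
        if (i % 2 == 0 || i == center) out.push_back(full[i]);
    }
    return out;
}
```

Applied to (1, 0, 2, 5, 2, 0, 1), the helper yields {1, 2, 5, 2, 1}, matching the example.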

port_conditional_array <input, (TP_USE_COEFF_RELOAD==1), TP_SSR> coeffCT

The conditional array of input async ports used to pass run-time programmable (RTP) coefficients. This port is (generated when TP_USE_COEFF_RELOAD == 1 and only for TP_PARA_INTERP_POLY > 1) and connects the Center Tap coefficient to dedicated Center Tap kernels. Each port in the array holds a duplicate of the Center Tap coefficient (single coefficient extracted out of the coeff array), required to connect to each SSR input path. The CT coefficient’s position is defined by: (TP_FIR_LEN+1)/4.

port_conditional_array <output, (TP_NUM_OUTPUTS==2), TP_SSR> out2

The output data array from the function. This output is (generated when TP_NUM_OUTPUTS == 2) either a window API of samples of TT_DATA type or stream API (depending on TP_API). Number of output samples is determined by interpolation & decimation factors (if present).

port_conditional_array <output, (TP_PARA_INTERP_POLY > 1), TP_SSR> out3

The output data from the function. This output is (generated when TP_SSR > 1) a stream API of TT_DATA type

port_conditional_array <output, (TP_PARA_INTERP_POLY > 1 && TP_NUM_OUTPUTS == 2), TP_SSR> out4

The output data from the function. This output is (generated when TP_SSR > 1 and TP_NUM_OUTPUTS = 2) a stream API of TT_DATA type

Methods

getKernels

kernel* getKernels ()

Access function to get pointer to kernel (or first kernel in a chained configuration).

getKernelArchs

unsigned int getKernelArchs ()

Access function to get kernel’s architecture (or first kernel’s architecture in a chained configuration).

fir_interpolate_hb_graph

fir_interpolate_hb_graph overload (1)

fir_interpolate_hb_graph (const std::vector <TT_COEFF>& taps)

This is the constructor function for the FIR graph with static coefficients.

Parameters:

taps

a reference to the std::vector array of taps values of type TT_COEFF.

The taps array must be supplied in a compressed form for this halfband application, i.e.

taps[] = {c0, c2, c4, …, cN, cCT} where

N = (TP_FIR_LEN+1)/4 and cCT is the center tap.

For example, a 7-tap halfband interpolator might use coeffs (1, 0, 2, 5, 2, 0, 1).

This would be input as taps[]= {1,2,5} since the context of halfband interpolation allows the remaining coefficients to be inferred.
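The compressed form can be derived from a full, symmetric halfband tap array. This is a minimal sketch based on the 7-tap example above; compressHalfbandTaps is a hypothetical helper, not a library function, and it does not verify that the input is actually symmetric or that its odd taps are zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds the compressed taps array from the full
// halfband tap array (length 4N - 1).
template <typename T>
std::vector<T> compressHalfbandTaps(const std::vector<T>& full) {
    assert(full.size() >= 3 && (full.size() + 1) % 4 == 0);
    const std::size_t center = (full.size() - 1) / 2;  // index of the center tap
    std::vector<T> out;
    // Even-indexed taps from the first half only; the second half mirrors them.
    for (std::size_t i = 0; i < center; i += 2) out.push_back(full[i]);
    // The center tap goes last.
    out.push_back(full[center]);
    return out;
}
```

Applied to (1, 0, 2, 5, 2, 0, 1), the helper yields {1, 2, 5}, which could then be passed to the constructor as taps.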

fir_interpolate_hb_graph overload (2)

fir_interpolate_hb_graph ()

This is the constructor function for the FIR graph with reloadable coefficients.

getMinCascLen

template <
    int T_FIR_LEN,
    int T_API,
    typename T_D,
    typename T_C,
    unsigned int SSR
    >
static constexpr unsigned int getMinCascLen ()

Access function to get the graph’s minimum cascade length for a given configuration.

Parameters:

T_FIR_LEN tap length of the fir filter
T_API interface type : 0 - window, 1 - stream
T_D data type
T_C coeff type
SSR parallelism factor set for super sample rate operation

getOptCascLen

template <
    int T_FIR_LEN,
    typename T_D,
    typename T_C,
    int T_API,
    int T_PORTS,
    unsigned int SSR
    >
static constexpr unsigned int getOptCascLen ()

Access function to get graph’s cascade length to obtain maximum performance for streaming configurations (used for this element only when SSR > 1).

Parameters:

T_FIR_LEN tap length of the fir filter
T_D data type
T_C coeff type
T_API interface type : 0 - window, 1 - stream
T_PORTS single/dual input and output ports. 1 : single, 2 : dual