#include "fir_decimate_asym_graph.hpp"
Overview
fir_decimate_asym is an asymmetric decimation FIR filter.
These are the templates to configure the asymmetric decimator FIR class.
Parameters:
TT_DATA | describes the type of individual data samples input to and output from the filter function. This is a typename and must be one of the following: int16, cint16, int32, cint32, float, cfloat. |
TT_COEFF | describes the type of individual coefficients of the filter taps. It must be one of the same set of types listed for TT_DATA and must also satisfy the following rules:
|
TP_FIR_LEN | is an unsigned integer which describes the number of taps in the filter. TP_FIR_LEN must be in the range 4 to 240 and must be an integer multiple of the TP_DECIMATE_FACTOR value. |
TP_DECIMATE_FACTOR | is an unsigned integer which describes the decimation factor of the filter, the ratio of input to output samples. TP_DECIMATE_FACTOR must be in the range 2 up to 7, however max supported decimation rate depends on the data type. |
TP_SHIFT | describes power of 2 shift down applied to the accumulation of FIR terms before output. TP_SHIFT must be in the range 0 to 61. |
TP_RND | describes the selection of rounding to be applied during the shift down stage of processing. Although TP_RND accepts unsigned integer values, descriptive macros are recommended, where:
|
TP_INPUT_WINDOW_VSIZE | describes the number of samples processed by the graph in a single iteration run. When TP_API is set to 0, samples are buffered and stored in a ping-pong window buffer mapped onto Memory Group banks. As a result, the maximum number of samples processed by the graph is limited by the size of the Memory Group. When TP_API is set to 1, samples are processed directly from the stream inputs and no buffering takes place. In that case, the maximum number of samples processed by the graph is limited to a 32-bit value (about 4.294 billion samples per iteration). Note: For SSR configurations (TP_SSR>1), the input data must be split over multiple ports, where each successive sample is sent to a different input port in a round-robin fashion. As a result, each SSR input path will process a fraction of the frame defined by TP_INPUT_WINDOW_VSIZE. The number of values in the output window will be TP_INPUT_WINDOW_VSIZE divided by TP_DECIMATE_FACTOR, by virtue of the decimation factor. TP_INPUT_WINDOW_VSIZE must be an integer multiple of TP_DECIMATE_FACTOR. The resulting output window size must be a multiple of 256 bits. Note: Margin size should not be included in TP_INPUT_WINDOW_VSIZE. |
TP_CASC_LEN | describes the number of AIE processors to split the operation over. This allows resources to be traded for higher performance. TP_CASC_LEN must be in the range 1 (default) to 40. |
TP_USE_COEFF_RELOAD | allows the user to select if runtime coefficient reloading should be used. When defining the parameter:
|
TP_NUM_OUTPUTS | sets the number of ports over which the output is sent. This can be 1 or 2. It is set to 1 by default. The function of the additional output port depends on TP_API. With the window API, the additional output provides flexibility in connecting the FIR output to multiple destinations. With the stream API, the additional output port increases the FIR’s throughput. Data is sent in a 128-bit interleaved pattern, e.g.:
|
TP_DUAL_IP | allows 2 stream inputs to be connected to the FIR, increasing available throughput. When set to 0, a single stream will be connected as the FIR’s input. When set to 1, two stream inputs will be connected. In that case, data should be organized in a 128-bit interleaved pattern, e.g.:
|
TP_API | specifies if the input/output interface should be window-based or stream-based. The values supported are 0 (window API) or 1 (stream API). |
TP_PARA_DECI_POLY | specifies the number of decimator polyphases that will be split up and executed in a series of pipelined cascade stages, resulting in additional input paths. A TP_PARA_DECI_POLY of 1 means just one input leg, and is the backwards compatible option. TP_PARA_DECI_POLY = TP_DECIMATE_FACTOR will result in TP_DECIMATE_FACTOR polyphases, operating as independent single rate filters connected by cascades. TP_PARA_DECI_POLY < TP_DECIMATE_FACTOR will result in the polyphase branches operating as independent decimators connected by cascades. The number of AIEs used is given by TP_PARA_DECI_POLY * TP_SSR^2 * TP_CASC_LEN. |
TP_SSR | specifies the number of parallel input/output paths where samples are interleaved between paths, giving an overall higher throughput. A TP_SSR of 1 means just one output leg and 1 input phase, and is the backwards compatible option. The number of AIEs used is given by TP_PARA_DECI_POLY * TP_SSR^2 * TP_CASC_LEN, as stated for TP_PARA_DECI_POLY. |
TP_SAT | describes the selection of saturation to be applied during the shift down stage of processing. TP_SAT accepts unsigned integer values, where:
|
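The numeric rules stated in the parameter table above can be checked before a configuration is instantiated. The following is a minimal standalone sketch and not part of the library: the struct `DecimatorConfig` and both function names are hypothetical, and `bytesPerSample` stands in for `sizeof(TT_DATA)`. It only covers the ranges and window-size rules quoted above; limits that depend on the data type are not modelled.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type (not a library type) mirroring a few template parameters.
struct DecimatorConfig {
    unsigned firLen;            // TP_FIR_LEN
    unsigned decimateFactor;    // TP_DECIMATE_FACTOR
    unsigned shift;             // TP_SHIFT
    unsigned inputWindowVsize;  // TP_INPUT_WINDOW_VSIZE, in samples, margin excluded
    unsigned cascLen;           // TP_CASC_LEN
    std::size_t bytesPerSample; // stands in for sizeof(TT_DATA), e.g. 4 for cint16
};

// Number of samples in the output window: input window size divided by the decimation factor.
inline unsigned outputWindowVsize(const DecimatorConfig& c) {
    assert(c.decimateFactor != 0);
    return c.inputWindowVsize / c.decimateFactor;
}

// Returns true when the configuration satisfies the rules listed in the parameter table.
inline bool isValid(const DecimatorConfig& c) {
    if (c.decimateFactor < 2 || c.decimateFactor > 7) return false;
    if (c.firLen < 4 || c.firLen > 240) return false;
    if (c.firLen % c.decimateFactor != 0) return false;
    if (c.shift > 61) return false;
    if (c.cascLen < 1 || c.cascLen > 40) return false;
    if (c.inputWindowVsize % c.decimateFactor != 0) return false;
    // The output window must be a multiple of 256 bits, i.e. 32 bytes.
    return (outputWindowVsize(c) * c.bytesPerSample) % 32 == 0;
}
```

For example, a 32-tap filter decimating by 2 with a 256-sample window of 4-byte samples passes the check and yields a 128-sample output window.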
template <
    typename TT_DATA,
    typename TT_COEFF,
    unsigned int TP_FIR_LEN,
    unsigned int TP_DECIMATE_FACTOR,
    unsigned int TP_SHIFT,
    unsigned int TP_RND,
    unsigned int TP_INPUT_WINDOW_VSIZE,
    unsigned int TP_CASC_LEN = 1,
    unsigned int TP_USE_COEFF_RELOAD = 0,
    unsigned int TP_NUM_OUTPUTS = 1,
    unsigned int TP_DUAL_IP = 0,
    unsigned int TP_API = 0,
    unsigned int TP_SSR = 1,
    unsigned int TP_PARA_DECI_POLY = 1,
    unsigned int TP_SAT = 1
    >
class fir_decimate_asym_graph: public graph

// structs
template <int dim>
struct ssr_params

template <unsigned int CL>
struct tmp_ssr_params

// fields
static constexpr unsigned int IN_SSR
static constexpr unsigned int lastSSRDim
static constexpr const char* srcFileName
kernel m_firKernels[TP_CASC_LEN *TP_SSR *TP_SSR *TP_PARA_DECI_POLY]
port_array <input, IN_SSR> in
port_array <output, TP_SSR> out
port_conditional_array <input, (TP_DUAL_IP==1), IN_SSR> in2
port_conditional_array <input, (TP_USE_COEFF_RELOAD==1), TP_SSR> coeff
port_conditional_array <output, (TP_NUM_OUTPUTS==2), TP_SSR> out2
Fields
static constexpr unsigned int IN_SSR
Size of the Input port array in SSR operation mode
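For SSR configurations, the TP_INPUT_WINDOW_VSIZE description states that each successive input sample is sent to a different input port in round-robin order. A minimal sketch of that split, assuming the frame is held in a plain `std::vector`: `splitRoundRobin` is a hypothetical helper, not a library function, and `numPorts` stands in for the size of the input port array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: distributes samples over numPorts paths, one sample at a time.
template <typename T>
std::vector<std::vector<T>> splitRoundRobin(const std::vector<T>& frame, std::size_t numPorts) {
    assert(numPorts > 0);
    std::vector<std::vector<T>> ports(numPorts);
    for (std::size_t i = 0; i < frame.size(); ++i) {
        // Sample i goes to port (i mod numPorts).
        ports[i % numPorts].push_back(frame[i]);
    }
    return ports;
}
```

With two ports, samples 0, 2, 4, ... land on the first port and samples 1, 3, 5, ... on the second, so each path carries half of the frame.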
kernel m_firKernels [TP_CASC_LEN *TP_SSR *TP_SSR *TP_PARA_DECI_POLY]
The array of kernels that will be created and mapped onto AIE tiles. A number of kernels (TP_CASC_LEN * TP_SSR) will be connected with each other by the cascade interface.
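The size of the m_firKernels array follows directly from its declaration, and it matches the AIE count given for TP_PARA_DECI_POLY. As a small illustration, the product can be written as a compile-time function; `numKernels` is a hypothetical name used only for this sketch.

```cpp
// Hypothetical helper: total number of kernels, taken from the array bound
// TP_CASC_LEN * TP_SSR * TP_SSR * TP_PARA_DECI_POLY in the declaration.
constexpr unsigned numKernels(unsigned cascLen, unsigned ssr, unsigned paraDeciPoly) {
    return cascLen * ssr * ssr * paraDeciPoly;
}

// With all defaults (TP_CASC_LEN = 1, TP_SSR = 1, TP_PARA_DECI_POLY = 1) a single kernel is used.
static_assert(numKernels(1, 1, 1) == 1, "default configuration uses one kernel");
```

Because the SSR factor enters squared, doubling TP_SSR quadruples the kernel count, whereas doubling TP_CASC_LEN only doubles it.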
port_array <input, IN_SSR> in
The input data to the function. This input is either a window API of samples of TT_DATA type or a stream API (depending on TP_API). Note: Margin is added internally to the graph when connecting the input port with the kernel port. Therefore, margin should not be added when connecting the graph to a higher-level design unit. The margin size (in bytes) equals TP_FIR_LEN rounded up to the nearest multiple of 32 bytes.
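The margin rule amounts to rounding up to a 32-byte boundary. The sketch below assumes that the filter length is first converted to bytes by multiplying TP_FIR_LEN by `sizeof(TT_DATA)`; the text does not state this explicitly, so treat that step as an assumption. `roundUpTo` and `marginBytes` are hypothetical helpers, not library functions.

```cpp
#include <cstddef>

// Rounds value up to the next multiple of `multiple` (multiple must be non-zero).
constexpr std::size_t roundUpTo(std::size_t value, std::size_t multiple) {
    return ((value + multiple - 1) / multiple) * multiple;
}

// Assumption: margin in bytes = (TP_FIR_LEN * sizeof(TT_DATA)) rounded up to 32 bytes.
constexpr std::size_t marginBytes(std::size_t firLen, std::size_t bytesPerSample) {
    return roundUpTo(firLen * bytesPerSample, 32);
}
```

Under that assumption, a 30-tap filter on 4-byte samples needs 120 bytes of history, which rounds up to a 128-byte margin.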
port_array <output, TP_SSR> out
The output data from the function. This output is either a window API of samples of TT_DATA type or stream API (depending on TP_API). Number of output samples is determined by interpolation & decimation factors (if present).
port_conditional_array <input, (TP_DUAL_IP==1), IN_SSR> in2
The conditional input data to the function. This input is generated when TP_DUAL_IP == 1 and is either a window API of samples of TT_DATA type or a stream API (depending on TP_API).
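The 128-bit interleaved pattern expected on the two stream inputs can be illustrated with a small standalone sketch. It assumes that interleaving alternates 128-bit (16-byte) groups between the two ports, starting with the first port, so a 4-byte sample type gives groups of 4 samples. `interleave128` is a hypothetical helper written for this illustration and is not a library function.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: splits a sample sequence into two streams in 128-bit groups.
// Assumption: even-numbered groups go to the first port, odd-numbered groups to the second.
template <typename T>
std::pair<std::vector<T>, std::vector<T>> interleave128(const std::vector<T>& samples) {
    static_assert(16 % sizeof(T) == 0, "sample size must divide 128 bits");
    const std::size_t group = 16 / sizeof(T); // samples per 128-bit group
    std::pair<std::vector<T>, std::vector<T>> ports;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        if ((i / group) % 2 == 0) {
            ports.first.push_back(samples[i]);
        } else {
            ports.second.push_back(samples[i]);
        }
    }
    return ports;
}
```

For 16 consecutive 32-bit samples numbered 0 to 15, this sketch places samples 0-3 and 8-11 on the first port and samples 4-7 and 12-15 on the second.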
port_conditional_array <input, (TP_USE_COEFF_RELOAD==1), TP_SSR> coeff
The conditional array of input async ports used to pass run-time programmable (RTP) coefficients. This port_conditional_array is generated when TP_USE_COEFF_RELOAD == 1 and is an array of input ports whose size is defined by TP_SSR. Each port in the array holds a duplicate of the coefficient array, required to connect to each SSR input path.
port_conditional_array <output, (TP_NUM_OUTPUTS==2), TP_SSR> out2
The conditional output data from the function. This output is generated when TP_NUM_OUTPUTS == 2 and is either a window API of samples of TT_DATA type or a stream API (depending on TP_API). Number of output samples is determined by interpolation & decimation factors (if present).
Methods
getKernels
getKernels overload (1)
kernel* getKernels ()
Access function to get pointer to kernel (or first kernel in a chained configuration).
getKernels overload (2)
kernel* getKernels (int cascadePosition)
Access function to get pointer to an indexed kernel.
Parameters:
cascadePosition | an index to the kernel’s position in the cascade. |
getInNet
connect <stream, stream>* getInNet ( int ssrOutPathIndex, int ssrInPhaseIndex, int cascadePosition )
Access function to get pointer to net of the in port. Nets only get assigned when streaming interface is being broadcast, i.e. nets only get used when TP_API == 1 and TP_CASC_LEN > 1.
Parameters:
ssrOutPathIndex | an index to the output data Path. |
ssrInPhaseIndex | an index to the input data Phase |
cascadePosition | an index to the kernel’s position in the cascade. |
getIn2Net
connect <stream, stream>* getIn2Net ( int ssrOutPathIndex, int ssrInPhaseIndex, int cascadePosition )
Access function to get pointer to net of the in2 port, when port is being generated, i.e. when TP_DUAL_IP == 1. Nets only get assigned when streaming interface is being broadcast, i.e. nets only get used when TP_API == 1 and TP_CASC_LEN > 1.
Parameters:
ssrOutPathIndex | an index to the output data Path. |
ssrInPhaseIndex | an index to the input data Phase |
cascadePosition | an index to the kernel’s position in the cascade. |
getInNet
connect <stream, stream>* getInNet ( int cascadePosition, int ssrInPhaseIndex, int ssrOutPathIndex, int paraPolpyhaseIndex )
Access function to get pointer to net of the in port. Nets only get assigned when streaming interface is being broadcast, i.e. nets only get used when TP_API == 1 and TP_CASC_LEN > 1.
Parameters:
cascadePosition | an index to the kernel’s position in the cascade. |
ssrInPhaseIndex | an index to the input data Phase |
ssrOutPathIndex | an index to the output data Path. |
paraPolpyhaseIndex | an index to the kernel’s parallel polyphase. |
getIn2Net
connect <stream, stream>* getIn2Net ( int cascadePosition, int ssrInPhaseIndex, int ssrOutPathIndex, int paraPolpyhaseIndex )
Access function to get pointer to net of the in2 port, when port is being generated, i.e. when TP_DUAL_IP == 1. Nets only get assigned when streaming interface is being broadcast, i.e. nets only get used when TP_API == 1 and TP_CASC_LEN > 1.
Parameters:
cascadePosition | an index to the kernel’s position in the cascade. |
ssrInPhaseIndex | an index to the input data Phase |
ssrOutPathIndex | an index to the output data Path. |
paraPolpyhaseIndex | an index to the kernel’s parallel polyphase. |
getKernelArchs
unsigned int getKernelArchs ()
Access function to get kernel’s architecture (or first kernel’s architecture in a chained configuration).
fir_decimate_asym_graph
fir_decimate_asym_graph overload (1)
fir_decimate_asym_graph (const std::vector <TT_COEFF>& taps)
This is the constructor function for the FIR graph with static coefficients.
Parameters:
taps | a reference to the std::vector array of taps values of type TT_COEFF. |
fir_decimate_asym_graph overload (2)
fir_decimate_asym_graph ()
This is the constructor function for the FIR graph with reloadable coefficients.
getMinCascLen
template < int T_FIR_LEN, int T_API, typename T_D, typename T_C, unsigned int DF, unsigned int SSR > static constexpr unsigned int getMinCascLen ()
Access function to get the graph’s minimum cascade length for a given configuration.
Parameters:
T_FIR_LEN | tap length of the fir filter |
T_API | interface type : 0 - window, 1 - stream |
T_D | data type |
T_C | coeff type |
DF | decimation factor |
SSR | parallelism factor set for super sample rate operation |
getOptCascLen
template < int T_FIR_LEN, typename T_D, typename T_C, int T_API, unsigned int DF, unsigned int SSR > static constexpr unsigned int getOptCascLen ()
Access function to get graph’s cascade length to obtain maximum performance for streaming configurations.
Parameters:
T_FIR_LEN | tap length of the fir filter |
T_D | data type |
T_C | coeff type |
T_API | interface type : 0 - window, 1 - stream |
DF | decimation factor |
SSR | parallelism factor set for super sample rate operation |