This section provides guidance on how best to configure the FIRs in some typical scenarios, or when designing with one particular metric in mind, such as resource use or performance.
Configuring for Requirements Based on Performance Versus Resource Use
The least resource-expensive way to obtain higher performance is to use the dual-port features, i.e., TP_DUAL_IP = 1 and/or TP_NUM_OUTPUTS = 2.
Higher throughput in a rate-changing FIR can be obtained with the TP_PARA_{INTERP/DECI}_POLY parameter.
TP_PARA_X_POLY can take a minimum value of 1 and a maximum value equal to the interpolation factor or the decimation factor. Between these limits, it can take only values that are integer factors of the interpolation or decimation factor.
For a given TP_PARA_X_POLY, the graph creates TP_PARA_X_POLY polyphase paths. Each path contains TP_CASC_LEN kernels, so the number of tiles used is TP_PARA_X_POLY * TP_CASC_LEN, i.e., TP_PARA_X_POLY is a one-dimensional expansion.
TP_SSR is the parameter that enables finer control over throughput and AI Engine tile use.
With TP_SSR, the number of tiles used is TP_CASC_LEN * TP_SSR * TP_SSR, i.e., SSR is a two-dimensional expansion. Both methods work in addition to the TP_CASC_LEN parameter, which also increases the number of tiles. TP_SSR can take any positive integer value; its maximum is limited only by the number of AI Engine tiles available. This can be used to prevent overutilization of kernels when the throughput requirement is lower than what TP_PARA_X_POLY offers.
TP_CASC_LEN sets the number of kernels cascaded together to distribute the calculation of the TP_FIR_LEN taps. It works in addition to TP_SSR and TP_PARA_X_POLY to overcome any bottlenecks posed by the vector processor. The library provides access functions to determine the value of TP_CASC_LEN that gives the optimum performance, i.e., the minimum number of kernels that provides the maximum performance. More details can be found in API Reference Overview.
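The tile-count rules above can be sketched as a few standalone helpers. This is a minimal illustration, not library code: the function names polyTiles, ssrTiles, and legalParaPoly are hypothetical, and the sketch assumes that the legal TP_PARA_X_POLY values are exactly the integer factors of the rate-change factor.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helpers (not part of the library) mirroring the rules in the text.

// Polyphase expansion: TP_PARA_X_POLY paths, each with TP_CASC_LEN kernels.
inline unsigned polyTiles(unsigned paraPoly, unsigned cascLen) {
    return paraPoly * cascLen;
}

// SSR expansion: TP_CASC_LEN * TP_SSR * TP_SSR tiles (two-dimensional in TP_SSR).
inline unsigned ssrTiles(unsigned ssr, unsigned cascLen) {
    return cascLen * ssr * ssr;
}

// Candidate TP_PARA_X_POLY values: every integer factor of the interpolation
// or decimation factor, from 1 up to the factor itself.
inline std::vector<unsigned> legalParaPoly(unsigned rateFactor) {
    assert(rateFactor >= 1);
    std::vector<unsigned> factors;
    for (unsigned f = 1; f <= rateFactor; ++f) {
        if (rateFactor % f == 0) {
            factors.push_back(f);
        }
    }
    return factors;
}
```

For example, polyTiles(5, 2) gives the 10 tiles quoted for five polyphase paths with a cascade length of 2, while ssrTiles(2, 1) gives the four tiles quoted for TP_SSR = 2.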
If there is no constraint on the number of AI Engine tiles, the easiest way to get the required performance is to set TP_PARA_X_POLY to the closest factor of the interpolation/decimation rate that gives a throughput higher than the one needed. If, however, the goal is to reach a given performance with the fewest tiles, TP_SSR might be needed as a finer tuning parameter to get the throughput you want.
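The first selection rule can be sketched as follows. This is an illustration under stated assumptions, not a library function: pickParaPoly is a hypothetical name, throughput is assumed to scale linearly with the number of polyphase paths, and the per-path throughput is an input you would have to measure or estimate for your data type and coefficient type.

```cpp
#include <cassert>

// Hypothetical helper (not part of the library).
// Returns the smallest integer factor of rateFactor whose aggregate throughput
// (factor * perPathGsps, assumed linear) meets or exceeds requiredGsps.
// Returns 0 when even TP_PARA_X_POLY == rateFactor is not enough, which is the
// case where TP_SSR would be needed instead.
inline unsigned pickParaPoly(unsigned rateFactor, double perPathGsps,
                             double requiredGsps) {
    assert(rateFactor >= 1 && perPathGsps > 0.0);
    for (unsigned f = 1; f <= rateFactor; ++f) {
        if (rateFactor % f != 0) {
            continue;  // only integer factors of the rate are candidates
        }
        if (f * perPathGsps >= requiredGsps) {
            return f;
        }
    }
    return 0;
}
```

With an assumed 2 GSa/s per path, an interpolation factor of 5 and a 4 GSa/s target yield 5, because 1 is too slow and no factor lies in between. This is the situation in which TP_SSR can reach the target with fewer tiles.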
SCENARIO 1:
TP_PARA_INTERP_POLY can only be set to 5; this would need at least five AI Engine tiles. The optimum cascade length is 2. This would use 10 AI Engine tiles and give you 10 GSa/s at the output.
TP_SSR = 2 and TP_PARA_INTERP_POLY = 1 will be able to do that in four AI Engine tiles, and the maximum throughput at the output would be 4 GSa/s.
SCENARIO 2:
TP_PARA_INTERP_POLY can be set to 2. This would create two output paths and so requires at least two AI Engine tiles. Say that the optimum cascade length for the data_type/coeff_type combination is 2. Set TP_CASC_LEN = 2.
SCENARIO 3:
TP_PARA_INTERP_POLY can be set to 2 (which is the maximum value). This would create at most two output paths, which together have a maximum throughput of 4 GSa/s.
Because TP_PARA_INTERP_POLY cannot be increased further, use the TP_SSR parameter to increase the throughput available. Setting TP_SSR = 2 will double the total available throughput by doubling the input and output paths.
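The TP_SSR step in this scenario can be sketched as a small calculation. The text only states that TP_SSR = 2 doubles the available throughput; the sketch below assumes the same linear scaling holds for other values, and pickSsr is a hypothetical helper name, not a library function.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of the library).
// baseGsps:     throughput already available at TP_SSR = 1 (e.g. 4 GSa/s here).
// requiredGsps: throughput that must be reached.
// Assumes throughput scales linearly with TP_SSR, generalizing the statement
// that TP_SSR = 2 doubles the available throughput.
inline unsigned pickSsr(double baseGsps, double requiredGsps) {
    assert(baseGsps > 0.0);
    if (requiredGsps <= baseGsps) {
        return 1;  // the existing paths are already fast enough
    }
    return static_cast<unsigned>(std::ceil(requiredGsps / baseGsps));
}
```

With the 4 GSa/s ceiling above, pickSsr(4.0, 8.0) returns 2. Keep in mind that tile use grows with TP_SSR * TP_SSR, so the smallest TP_SSR that meets the requirement is usually the one to choose.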