This section is intended to provide guidance for the user on how best to configure the FIRs in some typical scenarios, or when designing with one particular metric in mind, such as resource use or performance.
Configuring for requirements based on performance vs resource use
The least resource-expensive method to obtain higher performance is to use the dual-port features, i.e. TP_DUAL_IP = 1 and/or TP_NUM_OUTPUTS = 2.
Throughput beyond that can be obtained with TP_SSR and with the TP_PARA_{INTERP/DECI}_POLY parameter, referred to below as TP_PARA_X_POLY.
TP_PARA_X_POLY can take a minimum value of 1 and a maximum value equal to the interpolation factor or the decimation factor. Intermediate values must be integer factors of the interpolation or decimation factor.
With the TP_PARA_X_POLY parameter, the graph creates TP_PARA_X_POLY polyphase paths. Each path contains TP_CASC_LEN kernels. The number of tiles used is TP_PARA_X_POLY * TP_CASC_LEN, i.e. TP_PARA_X_POLY is a one-dimensional expansion.
TP_SSR is the parameter that enables finer control over throughput and AIE tile use. The number of tiles used is TP_CASC_LEN * TP_SSR * TP_SSR, i.e. SSR is a two-dimensional expansion. Both methods work in addition to the TP_CASC_LEN parameter, which also increases the number of tiles. TP_SSR can take any positive integer value; its maximum is limited only by the number of AIE tiles available. It can be used to avoid over-provisioning kernels when the throughput requirement is lower than what TP_PARA_X_POLY would deliver.
TP_CASC_LEN indicates the number of kernels cascaded together to share the computation of the TP_FIR_LEN taps. It works in addition to TP_SSR and TP_PARA_X_POLY to overcome any bottlenecks posed by the vector processor. The library provides access functions to determine the value of TP_CASC_LEN that gives the optimum performance, i.e. the minimum number of kernels that can provide the maximum performance. These are documented here (insert link here to API reference docs here).
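The tile-count arithmetic described above can be sketched as a few small helpers. This is a minimal illustration, not library code: the names tiles_para_poly, tiles_ssr, and para_poly_candidates are hypothetical, and only the two tile formulas and the integer-factor rule come from the text.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helpers for illustration only; they are not part of the library API.

// TP_PARA_X_POLY creates that many polyphase paths of TP_CASC_LEN kernels each
// (one-dimensional expansion).
constexpr unsigned tiles_para_poly(unsigned para_poly, unsigned casc_len) {
    return para_poly * casc_len;
}

// TP_SSR expands in two dimensions, so the tile count grows with TP_SSR squared.
constexpr unsigned tiles_ssr(unsigned ssr, unsigned casc_len) {
    return casc_len * ssr * ssr;
}

// Candidate TP_PARA_X_POLY values: every integer factor of the interpolation or
// decimation factor, from 1 up to the factor itself.
inline std::vector<unsigned> para_poly_candidates(unsigned rate_factor) {
    assert(rate_factor >= 1);
    std::vector<unsigned> candidates;
    for (unsigned p = 1; p <= rate_factor; ++p) {
        if (rate_factor % p == 0) candidates.push_back(p);
    }
    return candidates;
}

// Doubling TP_PARA_X_POLY doubles the tiles; doubling TP_SSR quadruples them.
static_assert(tiles_para_poly(4, 3) == 2 * tiles_para_poly(2, 3), "linear in TP_PARA_X_POLY");
static_assert(tiles_ssr(4, 3) == 4 * tiles_ssr(2, 3), "quadratic in TP_SSR");
```

For a rate-change factor of 6, para_poly_candidates(6) yields 1, 2, 3 and 6, while a prime factor such as 5 leaves only 1 and 5.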
If there is no constraint on the number of AIE tiles, the easiest way to get the required performance is to set TP_PARA_X_POLY to the smallest factor of the interpolation/decimation rate that delivers at least the throughput needed. If, however, the goal is to reach a given performance with the fewest tiles, TP_SSR may be needed as a finer tuning parameter to hit the target throughput.
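The selection rule above can be sketched as follows. The sketch assumes that throughput scales linearly with the number of parallel paths and that the caller knows roughly what one path sustains. Both are simplifying assumptions rather than library guarantees, and smallest_para_poly and smallest_ssr are hypothetical names.

```cpp
#include <cassert>

// Hypothetical helpers; per_path_gsps is a caller-supplied estimate of what one
// path sustains (assumption: total throughput = paths * per_path_gsps).

// Smallest factor of rate_factor whose paths meet the target; returns 0 if even
// the maximum value (rate_factor itself) falls short.
inline unsigned smallest_para_poly(unsigned rate_factor, double required_gsps,
                                   double per_path_gsps) {
    assert(rate_factor >= 1 && per_path_gsps > 0.0);
    for (unsigned p = 1; p <= rate_factor; ++p) {
        if (rate_factor % p == 0 && p * per_path_gsps >= required_gsps) return p;
    }
    return 0;
}

// Smallest TP_SSR meeting the target; TP_SSR is not tied to the rate factor,
// so it can step through every positive integer.
inline unsigned smallest_ssr(double required_gsps, double per_path_gsps) {
    assert(per_path_gsps > 0.0);
    unsigned ssr = 1;
    while (ssr * per_path_gsps < required_gsps) ++ssr;
    return ssr;
}
```

With a rate factor of 5 and an assumed 2 GSa/s per path, a 3 GSa/s target forces smallest_para_poly up to 5 paths, whereas smallest_ssr returns 2. This is the kind of gap where TP_SSR acts as the finer tuning parameter.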
SCENARIO 1:
TP_PARA_INTERP_POLY can only be set to 5, which would need at least 5 AIE tiles. The optimum cascade length is 2. This would use 10 AIE tiles and give 10GSa/s at the output.
TP_SSR = 2 and TP_PARA_INTERP_POLY = 1 will be able to do that in 4 AIE tiles, and the maximum throughput at the output would be 4GSa/s.
SCENARIO 2:
TP_PARA_INTERP_POLY can be set to 2. This would create 2 output paths and so need at least 2 AIE tiles. Let’s say that the optimum cascade length for the data_type/coeff_type combination is 2; then set TP_CASC_LEN = 2.
SCENARIO 3:
TP_PARA_INTERP_POLY can be set to 2 (which is the maximum value). This would create a maximum of 2 output paths, which can only have a maximum throughput of 4GSa/s.
Because TP_PARA_INTERP_POLY cannot be increased further, we use the TP_SSR parameter to increase the throughput available. Setting TP_SSR = 2 will double the total available throughput by doubling the input and output paths.
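The throughput figures quoted in the scenarios are consistent with a simple model in which every output path sustains 2GSa/s and the number of output paths is TP_PARA_INTERP_POLY * TP_SSR. Both the 2GSa/s figure and the path-count product are inferred from the numbers above, not taken from the library documentation, and the names below are hypothetical.

```cpp
// Assumed model, inferred from the scenario figures: each output path sustains
// 2 GSa/s, and output paths = TP_PARA_INTERP_POLY * TP_SSR.
constexpr unsigned kAssumedGsaPerPath = 2;

constexpr unsigned output_paths(unsigned para_interp_poly, unsigned ssr) {
    return para_interp_poly * ssr;
}

constexpr unsigned max_output_gsa(unsigned para_interp_poly, unsigned ssr) {
    return output_paths(para_interp_poly, ssr) * kAssumedGsaPerPath;
}

// Scenario 1: five polyphase paths versus TP_SSR = 2 alone.
static_assert(max_output_gsa(5, 1) == 10, "5 paths at 2 GSa/s each");
static_assert(max_output_gsa(1, 2) == 4, "2 SSR paths at 2 GSa/s each");
// Scenario 3: TP_PARA_INTERP_POLY = 2 tops out at 4 GSa/s; TP_SSR = 2 doubles it.
static_assert(max_output_gsa(2, 1) == 4, "2 paths at 2 GSa/s each");
static_assert(max_output_gsa(2, 2) == 2 * max_output_gsa(2, 1), "TP_SSR = 2 doubles throughput");
```

Under this model, Scenario 3 with TP_SSR = 2 would reach 8GSa/s at the output. Actual figures depend on the data type, coefficient type and device, so treat the constant as a placeholder.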