Configuration Notes - 2023.1 English

Vitis Libraries

Release Date
2023-12-20
Version
2023.1 English

This section provides guidance on how best to configure the FIRs in some typical scenarios, or when designing with one particular metric in mind, such as resource use or performance.

Configuring for requirements based on performance vs resource use

The least resource-expensive way to obtain higher performance is to use the dual-port features, i.e. TP_DUAL_IP = 1 and/or TP_NUM_OUTPUTS = 2.

The next method that offers higher performance at a low resource cost is the TP_PARA_{INTERP/DECI}_POLY parameter, referred to below as TP_PARA_X_POLY.
TP_PARA_X_POLY can take a minimum value of 1 and a maximum value equal to the interpolation or decimation factor. Valid values are the integer factors of that interpolation or decimation factor.
Note that the higher throughput comes at the cost of additional AIE tiles. When TP_PARA_X_POLY is set, the graph creates TP_PARA_X_POLY polyphase paths, each containing TP_CASC_LEN kernels. The number of tiles used is TP_PARA_X_POLY * TP_CASC_LEN, i.e. TP_PARA_X_POLY is a one-dimensional expansion.
TP_SSR is the parameter that enables finer control over throughput and AIE tile use.
With TP_SSR, the number of tiles used is TP_CASC_LEN * TP_SSR * TP_SSR, i.e. SSR is a two-dimensional expansion. Both methods work in addition to the TP_CASC_LEN parameter, which also increases the number of tiles. TP_SSR can take any positive integer value; its maximum is limited only by the number of AIE tiles available. It can be used to prevent over-utilization of kernels when the throughput requirement is lower than that offered by TP_PARA_X_POLY.
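The two tile-count formulas above can be compared directly. The following is a minimal C++ sketch, not library code: the helper names tilesForParaPoly and tilesForSsr are hypothetical and simply restate the arithmetic given in this section.

```cpp
// Hypothetical helpers for illustration; not part of the Vitis DSP library API.

// Polyphase expansion: TP_PARA_X_POLY paths, each with TP_CASC_LEN kernels.
constexpr unsigned tilesForParaPoly(unsigned paraPoly, unsigned cascLen) {
    return paraPoly * cascLen;
}

// SSR expansion: two-dimensional, so the tile count grows with TP_SSR squared.
constexpr unsigned tilesForSsr(unsigned ssr, unsigned cascLen) {
    return cascLen * ssr * ssr;
}

// Compile-time checks of the arithmetic.
static_assert(tilesForParaPoly(5, 2) == 10, "5 polyphase paths, 2 kernels each");
static_assert(tilesForSsr(2, 1) == 4, "TP_SSR = 2 with a single kernel per chain");
static_assert(tilesForSsr(4, 2) == 32, "TP_SSR = 4 with 2 kernels per chain");
```

For the same TP_CASC_LEN, doubling TP_SSR quadruples the tile count, whereas doubling TP_PARA_X_POLY only doubles it.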

TP_CASC_LEN indicates the number of kernels to be cascaded together to distribute the calculation of the TP_FIR_LEN filter taps. It works in addition to TP_SSR and TP_PARA_X_POLY to overcome any bottlenecks posed by the vector processor. The library provides access functions to determine the value of TP_CASC_LEN that gives the optimum performance, i.e. the minimum number of kernels that can provide the maximum performance. These are documented in the API Reference Overview.

If there is no constraint on the number of AIE tiles, the easiest way to get the required performance is to set TP_PARA_X_POLY to the smallest factor of the interpolation/decimation rate that provides at least the throughput needed. If, however, the goal is to reach a given performance using the fewest tiles, TP_SSR may be needed as a finer tuning parameter to get the required throughput.
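As a rough illustration of the first approach, the sketch below picks the smallest valid TP_PARA_X_POLY for a target rate. It assumes each polyphase path sustains a fixed throughput supplied by the caller. That per-path figure, the MSa/s units, and all function names are illustrative assumptions rather than part of the library. A result of 0 means no factor is large enough, in which case TP_SSR would be needed.

```cpp
// Hypothetical helpers for illustration; not part of the Vitis DSP library API.

// Number of parallel paths needed to reach the target rate, rounded up.
// perPathMsps is an assumed, caller-supplied per-path throughput in MSa/s
// and must be non-zero.
constexpr unsigned pathsNeeded(unsigned targetMsps, unsigned perPathMsps) {
    return (targetMsps + perPathMsps - 1) / perPathMsps;
}

// Smallest integer factor of `rate` that is >= minPaths, or 0 if none exists.
// The search starts at `candidate` and walks upward recursively.
constexpr unsigned smallestFactorAtLeast(unsigned rate, unsigned minPaths,
                                         unsigned candidate = 1) {
    return candidate > rate ? 0u
         : (rate % candidate == 0 && candidate >= minPaths)
               ? candidate
               : smallestFactorAtLeast(rate, minPaths, candidate + 1);
}

// Suggested TP_PARA_X_POLY for an interpolation or decimation factor `rate`.
constexpr unsigned suggestParaPoly(unsigned rate, unsigned targetMsps,
                                   unsigned perPathMsps) {
    return smallestFactorAtLeast(rate, pathsNeeded(targetMsps, perPathMsps));
}

// Example figures only: 2000 MSa/s per path is an assumption.
static_assert(suggestParaPoly(6, 4000, 2000) == 2, "2 paths suffice; 2 divides 6");
static_assert(suggestParaPoly(6, 7000, 2000) == 6, "4 paths needed; next factor of 6 is 6");
static_assert(suggestParaPoly(2, 8000, 2000) == 0, "4 paths needed; no factor of 2 is enough");
```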

SCENARIO 1:

For a 64-tap interpolate-by-5 filter that needs 4 GSa/s at the output:
TP_PARA_INTERP_POLY can only be set to 5, which needs at least 5 AIE tiles. The optimum cascade length is 2, so this configuration would use 10 AIE tiles and give 10 GSa/s at the output.
On the other hand, setting TP_SSR = 2 and TP_PARA_INTERP_POLY = 1 meets the requirement with 4 AIE tiles, with a maximum output throughput of 4 GSa/s.
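The trade-off in this scenario can be written out as a small comparison. The sketch below is illustrative only: the FirOption type and the prefer function are hypothetical, and the tile and throughput numbers are copied from the scenario text rather than computed by the library.

```cpp
// Hypothetical types for illustration; not part of the Vitis DSP library API.
struct FirOption {
    unsigned tiles;       // AIE tiles consumed
    unsigned outputGsps;  // maximum output rate in GSa/s
};

// Figures taken from Scenario 1.
constexpr FirOption kParaPoly5{10, 10};  // TP_PARA_INTERP_POLY = 5, TP_CASC_LEN = 2
constexpr FirOption kSsr2{4, 4};         // TP_SSR = 2, TP_PARA_INTERP_POLY = 1

constexpr bool meetsTarget(FirOption o, unsigned targetGsps) {
    return o.outputGsps >= targetGsps;
}

// True when `a` is the better choice: it meets the target and either
// `b` does not, or `a` uses fewer tiles than `b`.
constexpr bool prefer(FirOption a, FirOption b, unsigned targetGsps) {
    return meetsTarget(a, targetGsps) &&
           (!meetsTarget(b, targetGsps) || a.tiles < b.tiles);
}

// At 4 GSa/s both options qualify, and the SSR option uses fewer tiles.
static_assert(prefer(kSsr2, kParaPoly5, 4), "SSR option wins at 4 GSa/s");
// At 8 GSa/s only the polyphase option reaches the target.
static_assert(prefer(kParaPoly5, kSsr2, 8), "polyphase option wins at 8 GSa/s");
```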

SCENARIO 2:

For a 32-tap interpolate-by-2 filter that needs 4 GSa/s at the output:
TP_PARA_INTERP_POLY can be set to 2. This creates 2 output paths and therefore needs at least 2 AIE tiles. Assuming that the optimum cascade length for the data_type/coeff_type combination is 2, set TP_CASC_LEN = 2.
Note that the optimum cascade lengths for the various parameters can be obtained using the helper functions in the API Reference Overview. With these 2 output paths, it is possible to obtain the required sample rate of 4 GSa/s.

SCENARIO 3:

For a 32-tap interpolate-by-2 filter that needs 8 GSa/s at the output:
TP_PARA_INTERP_POLY can be set to 2 (which is the maximum value). This creates at most 2 output paths, which together have a maximum throughput of 4 GSa/s.
Since TP_PARA_INTERP_POLY cannot be increased further, use the TP_SSR parameter to increase the available throughput. Setting TP_SSR = 2 doubles the total available throughput by doubling the input and output paths.
Note that the optimum cascade length in this case would be different.
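The arithmetic behind Scenario 3 can be sketched as follows. The sketch assumes that total throughput scales linearly with TP_SSR, which generalizes the statement that TP_SSR = 2 doubles it. The function names and the integer GSa/s units are hypothetical.

```cpp
// Hypothetical helpers for illustration; not part of the Vitis DSP library API.

// Total output rate, assuming throughput scales linearly with TP_SSR.
constexpr unsigned totalGsps(unsigned baseGsps, unsigned ssr) {
    return baseGsps * ssr;
}

// Smallest TP_SSR whose assumed total rate reaches the target, rounded up.
// baseGsps is the rate available at TP_SSR = 1 and must be non-zero.
constexpr unsigned minSsrFor(unsigned targetGsps, unsigned baseGsps) {
    return (targetGsps + baseGsps - 1) / baseGsps;
}

// Scenario 3: the 2 polyphase paths top out at 4 GSa/s; the target is 8 GSa/s.
static_assert(totalGsps(4, 1) == 4, "TP_SSR = 1 leaves the rate at 4 GSa/s");
static_assert(minSsrFor(8, 4) == 2, "TP_SSR = 2 is the smallest value reaching 8 GSa/s");
static_assert(totalGsps(4, minSsrFor(8, 4)) == 8, "resulting rate meets the target");
```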