Super Sample Rate - Port Utilization and Throughput - 2024.1 English

Vitis Libraries

Release Date: 2024-08-06
Version: 2024.1 English

The number of input and output ports created by a FIR is given by the following formulas:

  • Number of input ports: NUM_INPUT_PORTS = TP_PARA_DECI_POLY x TP_SSR x (TP_DUAL_IP + 1)
  • Number of output ports: NUM_OUTPUT_PORTS = TP_PARA_INTERP_POLY x TP_SSR x TP_NUM_OUTPUTS
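The two formulas can be turned into a small helper for sanity-checking a configuration before building a graph. This is a minimal Python sketch, not part of the Vitis Libraries API: the function name `fir_port_counts` is hypothetical, the arguments simply mirror the template parameters above, and the accepted ranges (TP_DUAL_IP of 0 or 1, TP_NUM_OUTPUTS of 1 or 2) are assumptions.

```python
def fir_port_counts(ssr, para_deci_poly=1, para_interp_poly=1,
                    dual_ip=0, num_outputs=1):
    """Return (NUM_INPUT_PORTS, NUM_OUTPUT_PORTS) for a FIR configuration.

    Arguments mirror TP_SSR, TP_PARA_DECI_POLY, TP_PARA_INTERP_POLY,
    TP_DUAL_IP and TP_NUM_OUTPUTS. The value checks are assumptions.
    """
    if dual_ip not in (0, 1):
        raise ValueError("dual_ip is assumed to be 0 or 1")
    if num_outputs not in (1, 2):
        raise ValueError("num_outputs is assumed to be 1 or 2")
    num_input_ports = para_deci_poly * ssr * (dual_ip + 1)
    num_output_ports = para_interp_poly * ssr * num_outputs
    return num_input_ports, num_output_ports


# TP_SSR = 2 with a single input and a single output per path.
print(fir_port_counts(ssr=2))  # (2, 2)
# TP_SSR = 2 with dual inputs and two outputs.
print(fir_port_counts(ssr=2, dual_ip=1, num_outputs=2))  # (4, 4)
```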

Therefore, the maximum achievable throughput can be estimated from the port counts. For example, for the cint16 data type and a 1 GHz AIE clock, where each port can carry up to 1 GSa/s:

  • Maximum theoretical sample rate at the input: THROUGHPUT_IN = NUM_INPUT_PORTS x 1 GSa/s
  • Maximum theoretical sample rate at the output: THROUGHPUT_OUT = NUM_OUTPUT_PORTS x 1 GSa/s
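As a sketch, the estimate is a simple product of the port count and the per-port rate. It assumes, as in the cint16 example above, that each port moves one sample per AIE clock cycle; `samples_per_cycle` is exposed as a parameter only so that other cases can be modelled, and its value for other data types is an assumption you would need to check. The function name is hypothetical, and the result is a theoretical upper bound, not a measured figure.

```python
def max_throughput_gsps(num_ports, samples_per_cycle=1.0, clock_ghz=1.0):
    """Theoretical upper bound on the aggregate sample rate, in GSa/s.

    samples_per_cycle: samples one port moves per AIE clock cycle
    (1.0 in the cint16 example; an assumption for other data types).
    """
    return num_ports * samples_per_cycle * clock_ghz


# TP_PARA_INTERP_POLY = 1, TP_SSR = 2, TP_NUM_OUTPUTS = 1 -> 2 output ports.
num_output_ports = 1 * 2 * 1
print(max_throughput_gsps(num_output_ports))  # 2.0
```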

AIE Tile Utilization Ratio

A Super Sample Rate operation creates multiple computation paths that are used to produce the output samples. Having multiple computation paths reduces the amount of computation required by each kernel.

The total number of FIR computation paths is given by the following formula:

NUMBER_OF_COMPUTATION_PATHS = TP_CASC_LEN x TP_SSR x TP_PARA_INTERP_POLY x TP_PARA_DECI_POLY
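The formula can be evaluated directly for a given set of template parameters. This minimal sketch uses a hypothetical helper name, `computation_paths`, that is not part of the library. Note that the kernel counts quoted in the Single Rate example in this section (eight and four) are larger than the path counts this formula gives for the same parameters, so computation paths and kernels should not be treated as the same quantity.

```python
def computation_paths(casc_len, ssr, para_interp_poly=1, para_deci_poly=1):
    """NUMBER_OF_COMPUTATION_PATHS = TP_CASC_LEN x TP_SSR x
    TP_PARA_INTERP_POLY x TP_PARA_DECI_POLY."""
    for name, value in (("casc_len", casc_len), ("ssr", ssr),
                        ("para_interp_poly", para_interp_poly),
                        ("para_deci_poly", para_deci_poly)):
        if value < 1:
            raise ValueError(f"{name} must be a positive integer")
    return casc_len * ssr * para_interp_poly * para_deci_poly


print(computation_paths(casc_len=2, ssr=2))  # 4
print(computation_paths(casc_len=1, ssr=2))  # 2
```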

The FIR graph tries to split the requested workload equally among the FIR kernels, which can leave each kernel with a comparatively small amount of computation.

In such a scenario, throughput is still bounded by the number of ports, but the AIE tile utilization ratio (often defined as the ratio of VMAC operations to cycles without a VMAC operation) may be reduced.

For example, a 32-tap Single Rate FIR operating on cint16 data with int16 coefficients, with TP_SSR set to 2 and TP_CASC_LEN set to 2, achieves a throughput close to 2 GSa/s (two output stream paths). To do so, the design uses eight FIR kernels mapped to eight AIE tiles, and each kernel computes only eight coefficients. However, the same FIR configured without cascading (TP_CASC_LEN set to 1) also achieves a throughput close to 2 GSa/s while using only four kernels.
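The arithmetic behind this example can be reproduced with a short sketch. The helper name is hypothetical, and its rules are assumptions inferred from the figures quoted above for the Single Rate case, not taken from the library source: a kernel count of TP_SSR² x TP_CASC_LEN (which matches the eight and four kernels), taps split evenly over TP_SSR x TP_CASC_LEN kernels per output path (which matches eight coefficients per kernel), TP_NUM_OUTPUTS = 1, and one sample per port per cycle.

```python
import math


def single_rate_ssr_summary(taps, ssr, casc_len, clock_ghz=1.0):
    """Hypothetical helper reproducing the Single Rate SSR example arithmetic.

    Assumes kernel count = ssr**2 * casc_len, taps split evenly over
    ssr * casc_len kernels per output path, TP_NUM_OUTPUTS = 1 and one
    sample per port per AIE clock cycle.
    """
    kernels = ssr ** 2 * casc_len
    taps_per_kernel = math.ceil(taps / (ssr * casc_len))
    # Theoretical upper bound: one output port per SSR path.
    throughput_gsps = ssr * clock_ghz
    return {"kernels": kernels,
            "taps_per_kernel": taps_per_kernel,
            "throughput_gsps": throughput_gsps}


print(single_rate_ssr_summary(taps=32, ssr=2, casc_len=2))
# {'kernels': 8, 'taps_per_kernel': 8, 'throughput_gsps': 2.0}
print(single_rate_ssr_summary(taps=32, ssr=2, casc_len=1))
# {'kernels': 4, 'taps_per_kernel': 16, 'throughput_gsps': 2.0}
```

Both configurations have the same throughput bound because it depends only on the number of output ports; adding cascade stages only spreads the same work over more tiles.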

Rate-changing FIR Throughput

For rate changers, the bandwidth of the input port (for a decimator) or of the output port (for an interpolator) can limit the throughput of the filter.
For example, an interpolator with an interpolation factor of 3 produces three times as many output samples as input samples. However, the AIE stream port bandwidth is the same for the input and the output.
Hence, if the output runs at its maximum bandwidth, the input runs at only one third of its maximum bandwidth, and the input stream of the filter is underutilized at about 33 percent efficiency.
If, instead, you split the operation of the interpolator over three kernels, broadcast the input stream to all of them, and run each kernel at maximum performance, both the input and the output streams can operate at their maximum bandwidths.
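The interpolation example can be checked numerically. This sketch assumes every stream port has the same maximum rate (normalized to 1.0), models the three-kernel split as TP_PARA_INTERP_POLY = 3 with a single broadcast input port and one output port per kernel, and assumes TP_PARA_INTERP_POLY divides the interpolation factor. The helper name is hypothetical.

```python
def interpolator_port_utilization(interp_factor, para_interp_poly, port_rate=1.0):
    """Return (input_utilization, output_utilization) as fractions of port_rate.

    Assumes one input port broadcast to para_interp_poly kernels, one output
    port per kernel, and that para_interp_poly divides interp_factor.
    """
    if interp_factor % para_interp_poly != 0:
        raise ValueError("para_interp_poly is assumed to divide interp_factor")
    # Output samples each output port must emit per input sample.
    outputs_per_port = interp_factor / para_interp_poly
    # Run the input as fast as possible without exceeding any output port's cap.
    input_rate = min(port_rate, port_rate / outputs_per_port)
    output_rate_per_port = input_rate * outputs_per_port
    return input_rate / port_rate, output_rate_per_port / port_rate


for para in (1, 3):
    in_util, out_util = interpolator_port_utilization(3, para)
    print(f"TP_PARA_INTERP_POLY={para}: input {in_util:.0%}, output {out_util:.0%}")
# TP_PARA_INTERP_POLY=1: input 33%, output 100%
# TP_PARA_INTERP_POLY=3: input 100%, output 100%
```

The input rate is chosen as the largest value that keeps every output port within its limit, which is why a single kernel leaves two thirds of the input bandwidth unused.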