Super Sample Rate - Port Utilization & Throughput - 2023.2 English

Vitis Libraries

Release Date
2023-12-20
Version
2023.2 English

The number of input and output ports created by a FIR is given by the following formulas:

  • Number of input ports: NUM_INPUT_PORTS = TP_PARA_DECI_POLY x TP_SSR x (TP_DUAL_IP + 1)
  • Number of output ports: NUM_OUTPUT_PORTS = TP_PARA_INTERP_POLY x TP_SSR x TP_NUM_OUTPUTS

Therefore, the maximum achievable throughput for a given data type, for example cint16, can be estimated as:

  • Maximum theoretical sample rate at input: THROUGHPUT_IN = NUM_INPUT_PORTS x 1 GSa/s
  • Maximum theoretical sample rate at output: THROUGHPUT_OUT = NUM_OUTPUT_PORTS x 1 GSa/s
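
As a rough illustration, the port formulas and the throughput estimate above can be evaluated in a few lines of Python. This is only a sketch: `fir_port_counts` and `max_throughput_gsa` are hypothetical helper names, not part of the library, the default argument values are chosen purely for illustration, and the 1 GSa/s per-port figure is the one quoted above for cint16.

```python
# Hypothetical helpers; the names and default values are illustrative only.
PORT_RATE_GSA = 1.0  # per-port sample rate in GSa/s, as quoted above for cint16


def fir_port_counts(tp_ssr, *, tp_para_deci_poly=1, tp_para_interp_poly=1,
                    tp_dual_ip=0, tp_num_outputs=1):
    """Return (NUM_INPUT_PORTS, NUM_OUTPUT_PORTS) from the formulas above."""
    num_input_ports = tp_para_deci_poly * tp_ssr * (tp_dual_ip + 1)
    num_output_ports = tp_para_interp_poly * tp_ssr * tp_num_outputs
    return num_input_ports, num_output_ports


def max_throughput_gsa(num_ports, port_rate=PORT_RATE_GSA):
    """Estimate the maximum theoretical sample rate in GSa/s."""
    return num_ports * port_rate


if __name__ == "__main__":
    n_in, n_out = fir_port_counts(2)
    print("ports:", n_in, n_out)
    print("throughput in/out (GSa/s):",
          max_throughput_gsa(n_in), max_throughput_gsa(n_out))
```

With TP_SSR set to 2 and all other parameters at the illustrative defaults, the sketch yields 2 input ports and 2 output ports, so an estimated 2 GSa/s in each direction.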

AIE Tile Utilization Ratio

Super Sample Rate operation creates multiple computation paths that are used to produce the output samples. Having multiple computation paths reduces the amount of computation required by each kernel.

The total number of FIR computation paths is given by the following formula:

NUMBER_OF_COMPUTATION_PATHS = TP_CASC_LEN x TP_SSR x TP_PARA_INTERP_POLY x TP_PARA_DECI_POLY
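
The formula can be evaluated directly, for instance when comparing candidate configurations before building a graph. The sketch below simply multiplies the four parameters exactly as the formula is written. `number_of_computation_paths` is a hypothetical helper name, and the default of 1 for the two polyphase parameters is an assumption made for illustration.

```python
def number_of_computation_paths(tp_casc_len, tp_ssr,
                                tp_para_interp_poly=1, tp_para_deci_poly=1):
    """Evaluate the computation-path formula as stated above.

    Hypothetical helper; the defaults of 1 for the polyphase
    parameters are illustrative assumptions.
    """
    factors = (tp_casc_len, tp_ssr, tp_para_interp_poly, tp_para_deci_poly)
    if any(not isinstance(f, int) or f < 1 for f in factors):
        raise ValueError("all parameters must be positive integers")
    return tp_casc_len * tp_ssr * tp_para_interp_poly * tp_para_deci_poly


if __name__ == "__main__":
    # Cascade length 3, SSR 4, interpolation polyphase 2, no decimation polyphase.
    print(number_of_computation_paths(3, 4, tp_para_interp_poly=2))
```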

The FIR graph tries to split the requested FIR workload equally among the FIR kernels, which may leave each kernel with a comparatively low computational load.

In such a scenario, bandwidth is limited by the number of ports, but the AIE tile utilization ratio (often defined as the ratio of VMAC operations to cycles without a VMAC operation) may be reduced.

For example, a 32-tap Single Rate FIR operating on cint16 data and int16 coefficients, with TP_SSR set to 2 and cascade length TP_CASC_LEN set to 2, achieves a bandwidth close to 2 GSa/s (2 output stream paths). Each kernel is tasked with computing only 8 coefficients, and the design uses 8 FIR kernels mapped to 8 AIE tiles. However, a similarly configured FIR with TP_SSR set to 2 but no further cascading (TP_CASC_LEN set to 1) also achieves a bandwidth close to 2 GSa/s while using only 4 kernels.
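
The per-kernel workload in this example can be approximated with a short calculation. The sketch below assumes the taps are divided evenly across TP_SSR and TP_CASC_LEN, which is an inference from the figures in the example (32 taps, 8 coefficients per kernel) rather than a documented formula. `taps_per_kernel` is a hypothetical helper name.

```python
def taps_per_kernel(fir_len, tp_ssr, tp_casc_len):
    """Approximate the number of coefficients each kernel computes.

    Assumption (not taken from the library): the taps are split evenly
    across TP_SSR and TP_CASC_LEN, rounding up when the split is uneven.
    """
    divisor = tp_ssr * tp_casc_len
    return -(-fir_len // divisor)  # integer ceiling division


if __name__ == "__main__":
    print(taps_per_kernel(32, tp_ssr=2, tp_casc_len=2))  # cascaded configuration
    print(taps_per_kernel(32, tp_ssr=2, tp_casc_len=1))  # no extra cascade
```

Under this assumption, the cascaded configuration gives 8 coefficients per kernel, matching the example, and the configuration without extra cascading gives 16, so each kernel does twice the work at the same port-limited bandwidth.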

Rate-changing FIR Throughput

For rate changers, the bandwidth of either the input port or the output port, depending on whether the filter is a decimator or an interpolator, may limit the throughput of the filter.
For example, an interpolator with an interpolation factor of 3 produces three times as many outputs as inputs. However, the AIE stream port bandwidth is the same for input and output.
Hence, if the output runs at maximum bandwidth, the input must run at one third of its maximum bandwidth, which leaves the input stream of the filter at only 33 percent utilization.
However, if the operation of the interpolator is split over 3 kernels, the input stream is broadcast to their inputs, and the kernels operate at maximum performance, both the input and the output streams can be used at their maximum bandwidth.
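
The bandwidth argument above can be checked numerically. The following sketch is a simplified model, not library code: it normalizes the per-port bandwidth to 1, assumes every port has the same bandwidth, and assumes the filter runs as fast as the more constrained side allows. `port_utilization` is a hypothetical helper name.

```python
from fractions import Fraction


def port_utilization(interp, deci, n_in, n_out):
    """Return (input_utilization, output_utilization) as exact fractions.

    Simplified model under stated assumptions: equal per-port bandwidth
    (normalized to 1) and a filter limited only by its stream ports.
    """
    ratio = Fraction(interp, deci)  # output samples produced per input sample
    # The input rate is capped by the input ports and, through the rate
    # change, by how fast the output ports can drain the results.
    in_rate = min(Fraction(n_in), Fraction(n_out) / ratio)
    out_rate = in_rate * ratio
    return in_rate / n_in, out_rate / n_out


if __name__ == "__main__":
    # Interpolate by 3 with one input port and one output port.
    print(port_utilization(3, 1, n_in=1, n_out=1))
    # Same interpolator split over 3 kernels: one broadcast input, 3 outputs.
    print(port_utilization(3, 1, n_in=1, n_out=3))
```

In this model, the single-output interpolator leaves the input at one third utilization, while the three-output variant reaches full utilization on both sides, in line with the reasoning above.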