The number of input and output ports created by a FIR is given by the following formulas:
- Number of input ports:
NUM_INPUT_PORTS = TP_PARA_DECI_POLY x TP_SSR x (TP_DUAL_IP + 1)
- Number of output ports:
NUM_OUTPUT_PORTS = TP_PARA_INTERP_POLY x TP_SSR x TP_NUM_OUTPUTS
Therefore, the maximum throughput achievable for a given data type, e.g. cint16, can be estimated with:
- maximum theoretical sample rate at input:
THROUGHPUT_IN = NUM_INPUT_PORTS x 1 GSa/s
- maximum theoretical sample rate at output:
THROUGHPUT_OUT = NUM_OUTPUT_PORTS x 1 GSa/s
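The port and throughput formulas above can be sketched as a small calculator. This is an illustrative model only, not part of the library: the class name FirPortConfig, its field names, and the default values are assumptions, and the 1 GSa/s per-port figure is taken directly from the formulas above.

```python
from dataclasses import dataclass

# Per-port sample rate, taken from the "x 1 GSa/s" term in the formulas above.
PORT_RATE_GSPS = 1.0


@dataclass(frozen=True)
class FirPortConfig:
    """Hypothetical container for the parameters that set the port counts.

    Field names mirror the TP_* parameters; the defaults are assumptions
    chosen for illustration.
    """
    tp_ssr: int = 1
    tp_para_interp_poly: int = 1
    tp_para_deci_poly: int = 1
    tp_dual_ip: int = 0
    tp_num_outputs: int = 1

    def num_input_ports(self) -> int:
        # NUM_INPUT_PORTS = TP_PARA_DECI_POLY x TP_SSR x (TP_DUAL_IP + 1)
        return self.tp_para_deci_poly * self.tp_ssr * (self.tp_dual_ip + 1)

    def num_output_ports(self) -> int:
        # NUM_OUTPUT_PORTS = TP_PARA_INTERP_POLY x TP_SSR x TP_NUM_OUTPUTS
        return self.tp_para_interp_poly * self.tp_ssr * self.tp_num_outputs

    def throughput_in_gsps(self) -> float:
        # THROUGHPUT_IN = NUM_INPUT_PORTS x 1 GSa/s
        return self.num_input_ports() * PORT_RATE_GSPS

    def throughput_out_gsps(self) -> float:
        # THROUGHPUT_OUT = NUM_OUTPUT_PORTS x 1 GSa/s
        return self.num_output_ports() * PORT_RATE_GSPS


if __name__ == "__main__":
    cfg = FirPortConfig(tp_ssr=2, tp_dual_ip=1, tp_num_outputs=2)
    print("input ports:", cfg.num_input_ports())
    print("output ports:", cfg.num_output_ports())
    print("max input rate (GSa/s):", cfg.throughput_in_gsps())
    print("max output rate (GSa/s):", cfg.throughput_out_gsps())
```

These figures are theoretical upper bounds derived from port counts alone; the rate actually achieved also depends on how much computation each kernel has to perform.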
AIE Tile Utilization Ratio
Super Sample Rate operation creates multiple computation paths that are used to produce the output samples. Having multiple computation paths reduces the amount of computation required by each kernel.
The total number of FIR computation paths is given by the formula below:
NUMBER_OF_COMPUTATION_PATHS = TP_CASC_LEN * TP_SSR * TP_PARA_INTERP_POLY * TP_PARA_DECI_POLY
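As a quick illustration, the formula above can be evaluated for a few parameter combinations to see how fast the path count grows. The function name and the defaults of 1 for the polyphase parameters are assumptions made for this sketch.

```python
def number_of_computation_paths(tp_casc_len: int,
                                tp_ssr: int,
                                tp_para_interp_poly: int = 1,
                                tp_para_deci_poly: int = 1) -> int:
    """Evaluate NUMBER_OF_COMPUTATION_PATHS exactly as written in the formula.

    The defaults of 1 for the polyphase parameters are an assumption
    made for this sketch.
    """
    params = (tp_casc_len, tp_ssr, tp_para_interp_poly, tp_para_deci_poly)
    if any(p < 1 for p in params):
        raise ValueError("all parameters must be positive integers")
    return tp_casc_len * tp_ssr * tp_para_interp_poly * tp_para_deci_poly


if __name__ == "__main__":
    # Sweep a few (TP_CASC_LEN, TP_SSR) pairs with no polyphase decomposition.
    for casc_len, ssr in [(1, 1), (1, 2), (2, 2), (2, 4)]:
        paths = number_of_computation_paths(casc_len, ssr)
        print(f"TP_CASC_LEN={casc_len} TP_SSR={ssr} -> {paths} paths")
```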
The FIR graph tries to split the requested FIR workload equally among the FIR kernels, which may leave each kernel with a comparatively low computational load.
In such a scenario, bandwidth is limited by the number of ports, while the AIE tile utilization ratio (often defined as the ratio of VMAC operations to cycles without a VMAC operation) may be reduced.
For example, a 32 tap Single Rate FIR operating on the cint16 data type with int16 coefficients, with TP_SSR set to 2 and cascade length TP_CASC_LEN set to 2, will perform at a bandwidth close to 2 GSa/s (2 output stream paths). Each of the kernels will be tasked with computing only 8 coefficients. The design will use 8 FIR kernels mapped to 8 AIE tiles to achieve that.
However, a similarly configured FIR, a 32 tap Single Rate FIR operating on the cint16 data type with int16 coefficients and TP_SSR set to 2, but without further cascade configuration (TP_CASC_LEN set to 1), would also perform at a bandwidth close to 2 GSa/s while using only 4 kernels.
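The two configurations can be compared side by side with a rough sketch. Both helper functions are hypothetical: the even split of taps across TP_SSR and TP_CASC_LEN, and the kernel count of TP_CASC_LEN x TP_SSR x TP_SSR, are assumptions inferred from the figures quoted in this example (8 coefficients per kernel, 8 and 4 kernels), not formulas stated in this section.

```python
def coefficients_per_kernel(fir_len: int, tp_ssr: int, tp_casc_len: int) -> int:
    # Assumption: taps are split evenly across SSR phases and cascade stages.
    return fir_len // (tp_ssr * tp_casc_len)


def kernel_count(tp_ssr: int, tp_casc_len: int) -> int:
    # Assumption inferred from the example figures (8 and 4 kernels):
    # each of the TP_SSR output phases uses TP_SSR input phases,
    # and each such pairing is split into TP_CASC_LEN cascaded kernels.
    return tp_casc_len * tp_ssr * tp_ssr


if __name__ == "__main__":
    FIR_LEN = 32
    TP_SSR = 2
    for casc_len in (2, 1):
        kernels = kernel_count(TP_SSR, casc_len)
        taps = coefficients_per_kernel(FIR_LEN, TP_SSR, casc_len)
        print(f"TP_CASC_LEN={casc_len}: {kernels} kernels, "
              f"{taps} coefficients per kernel, ~{TP_SSR} GSa/s")
```

Under these assumptions both designs reach the same port-limited bandwidth, but the shorter cascade gives each kernel more coefficients to compute and uses half as many tiles.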
Rate-changing FIR Throughput