The following equation describes the finite impulse response (FIR) filter. x denotes the input, C denotes the coefficients, y denotes the output, and N denotes the length of the filter.
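Using the symbols defined above, the relation can be written as follows. This is the conventional direct-form FIR definition; the zero-based indexing of the coefficients and the sample index n are assumptions.

$$
y(n) = \sum_{k=0}^{N-1} C_k \, x(n-k)
$$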
Following is an example of a 32-tap filter.
Each output takes 32 multiplications. If you use cint16 for both the data and coefficient types, each AI Engine performs eight MAC operations per cycle, so the kernel needs four cycles (32 / 8) to compute one sample.
If the data streams in through a single 32-bit stream port, one cint16 sample can arrive every cycle, and in steady state (in the middle of processing) each new input sample produces one output. The input can therefore arrive faster than a single kernel can consume it.
So, the design is compute bound. You can split the kernel into four cascaded kernels to process one sample per cycle.
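The cycle budget and the cascade split described above can be sketched in plain C++. This is a behavioral model under stated assumptions, not AI Engine kernel code: it uses no AI Engine API, the names `CInt`, `cycles_per_sample`, `fir_partial`, `fir_single`, and `fir_cascade` are hypothetical, and the even split of eight taps per stage is an assumption about how the four kernels divide the work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a complex integer sample. Wide fields are used so
// that accumulating many products does not overflow in this model.
struct CInt {
    std::int64_t re;
    std::int64_t im;
};

// Cycles one kernel needs per output: ceil(taps / MACs per cycle).
constexpr int cycles_per_sample(int taps, int macs_per_cycle) {
    return (taps + macs_per_cycle - 1) / macs_per_cycle;
}

// Complex multiply-accumulate: acc += c * x.
inline void mac(CInt& acc, const CInt& c, const CInt& x) {
    acc.re += c.re * x.re - c.im * x.im;
    acc.im += c.re * x.im + c.im * x.re;
}

// Partial sum over taps [first, first + count) for output index n, added onto
// an incoming accumulator. Inputs before index 0 are treated as zero.
CInt fir_partial(const std::vector<CInt>& x, const std::vector<CInt>& c,
                 std::size_t n, std::size_t first, std::size_t count,
                 CInt acc) {
    for (std::size_t k = first; k < first + count; ++k) {
        if (k <= n) {
            mac(acc, c[k], x[n - k]);
        }
    }
    return acc;
}

// Single-kernel reference: all taps are summed in one place.
CInt fir_single(const std::vector<CInt>& x, const std::vector<CInt>& c,
                std::size_t n) {
    return fir_partial(x, c, n, 0, c.size(), CInt{0, 0});
}

// Cascade model: each stage handles an equal share of the taps and passes its
// accumulator on to the next stage.
CInt fir_cascade(const std::vector<CInt>& x, const std::vector<CInt>& c,
                 std::size_t n, std::size_t stages) {
    assert(stages > 0 && c.size() % stages == 0);
    const std::size_t per_stage = c.size() / stages;
    CInt acc{0, 0};
    for (std::size_t s = 0; s < stages; ++s) {
        acc = fir_partial(x, c, n, s * per_stage, per_stage, acc);
    }
    return acc;
}
```

With 32 taps and eight MACs per cycle, `cycles_per_sample(32, 8)` evaluates to 4, and each eight-tap stage needs `cycles_per_sample(8, 8)`, which is 1. Calling `fir_cascade(x, c, n, 4)` gives the same result as `fir_single(x, c, n)`, because the stages only regroup the terms of the same sum.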