A finite impulse response (FIR) filter is described by the following equation, where x denotes the input, C denotes the coefficients, y denotes the output, and N denotes the length of the filter.
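
Assuming the standard direct-form definition, with zero-based coefficient indices and $n$ as the output sample index (a symbol introduced here for illustration), the equation can be written as:

$$
y[n] = \sum_{k=0}^{N-1} C[k] \, x[n-k]
$$

Each output sample is therefore a sum of $N$ products, which is where the per-sample multiplication count comes from.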
Following is an example of a 32-tap filter.
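
As a point of reference, the 32-tap case can be modeled in plain C++. This is a minimal scalar sketch, not AI Engine kernel code: the names `CInt16`, `CAcc`, and `fir_reference` are hypothetical, samples before the start of the input are assumed to be zero, and a 64-bit accumulator is assumed so that the sums cannot overflow.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model types for illustration; these are not AI Engine API types.
struct CInt16 { std::int16_t re, im; };  // complex sample, 16-bit real and imaginary parts
struct CAcc   { std::int64_t re, im; };  // wide complex accumulator (assumption)

constexpr std::size_t kTaps = 32;  // filter length N from the text

// Direct-form FIR: y[n] = sum_{k=0}^{N-1} C[k] * x[n-k], with x[m] = 0 for m < 0.
std::vector<CAcc> fir_reference(const std::vector<CInt16>& x,
                                const std::array<CInt16, kTaps>& c) {
    std::vector<CAcc> y(x.size(), CAcc{0, 0});
    for (std::size_t n = 0; n < x.size(); ++n) {
        CAcc acc{0, 0};
        // Taps with k > n would read samples before the input starts, so skip them.
        for (std::size_t k = 0; k < kTaps && k <= n; ++k) {
            const CInt16 a = c[k];
            const CInt16 b = x[n - k];
            // One complex multiply-accumulate per tap.
            acc.re += static_cast<std::int64_t>(a.re) * b.re
                    - static_cast<std::int64_t>(a.im) * b.im;
            acc.im += static_cast<std::int64_t>(a.re) * b.im
                    + static_cast<std::int64_t>(a.im) * b.re;
        }
        y[n] = acc;
    }
    return y;
}
```

Feeding a unit impulse into `fir_reference` returns the coefficients themselves, which is a quick way to sanity-check the tap ordering.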
Each output sample takes 32 multiplications. With cint16 as both the data type and the coefficient type, each AI Engine can perform 8 multiply-accumulate (MAC) operations per cycle, so a single kernel takes 32 / 8 = 4 cycles to compute one sample. On the input side, if the data is streamed through one 32-bit stream port, one cint16 sample (32 bits) arrives per cycle, and in steady state (in the middle of processing) each new input sample produces one output sample.
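The input can therefore supply data for one output per cycle, while a single kernel needs 4 cycles per output, so the design is compute bound. You will see how to split the kernel into 4 cascaded kernels to process one sample per cycle.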
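
The arithmetic behind the split can be sketched in plain C++. This is a functional model only, assuming each of the 4 kernels handles 8 consecutive taps and passes its partial sum to the next stage; the names `cascade_stage` and `cascaded_output` are hypothetical, and the code does not use the AI Engine cascade API or model timing.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model types for illustration; these are not AI Engine API types.
struct CInt16 { std::int16_t re, im; };
struct CAcc   { std::int64_t re, im; };

constexpr std::size_t kTaps = 32;                                // filter length from the text
constexpr std::size_t kMacsPerCycle = 8;                         // cint16 MACs per cycle, from the text
constexpr std::size_t kCyclesPerSample = kTaps / kMacsPerCycle;  // single kernel: 4 cycles
constexpr std::size_t kKernels = 4;                              // cascaded kernels
constexpr std::size_t kTapsPerKernel = kTaps / kKernels;         // 8 taps per kernel
static_assert(kTapsPerKernel == kMacsPerCycle,
              "each stage's taps fit in one cycle of MAC throughput");

// One cascade stage: adds taps [stage*8, stage*8 + 8) to the incoming partial sum.
CAcc cascade_stage(const std::vector<CInt16>& x,
                   const std::array<CInt16, kTaps>& c,
                   std::size_t n, std::size_t stage, CAcc acc) {
    const std::size_t first = stage * kTapsPerKernel;
    // Samples before the start of the input are treated as zero (k <= n).
    for (std::size_t k = first; k < first + kTapsPerKernel && k <= n; ++k) {
        const CInt16 a = c[k];
        const CInt16 b = x[n - k];
        acc.re += static_cast<std::int64_t>(a.re) * b.re
                - static_cast<std::int64_t>(a.im) * b.im;
        acc.im += static_cast<std::int64_t>(a.re) * b.im
                + static_cast<std::int64_t>(a.im) * b.re;
    }
    return acc;
}

// Chain the stages: the partial sum flows from stage 0 through stage 3.
CAcc cascaded_output(const std::vector<CInt16>& x,
                     const std::array<CInt16, kTaps>& c, std::size_t n) {
    CAcc acc{0, 0};
    for (std::size_t stage = 0; stage < kKernels; ++stage) {
        acc = cascade_stage(x, c, n, stage, acc);
    }
    return acc;
}
```

With 8 taps per kernel, the work in each stage matches the 8 MAC operations per cycle quoted above, which is the reasoning behind the one-sample-per-cycle target. The model checks only that the chained partial sums equal the full 32-tap sum.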