Because the floating-point pipeline of the AI Engine has no post-add lane reduction hardware, outputs are always produced on eight lanes (float) or four lanes (cfloat). Eight (or four) output samples are therefore computed in parallel, one coefficient at a time: `fpmul` for the first coefficient, then `fpmac` for each of the remaining coefficients.
The floating-point accumulator has a latency of two clock cycles, so two `fpmac` instructions that use the same accumulator cannot be issued back to back, only every other cycle. The code can be optimized by alternating between two accumulators, which are added together at the end to produce the final result.
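The two-accumulator scheme can be sketched in plain C++ as below. This is only a scalar model that runs on a host compiler, not AI Engine kernel code: `model_fpmul`, `model_fpmac`, `model_fpadd`, and `fir8_dual` are hypothetical names introduced for illustration, and they do not reproduce the signatures of the real intrinsics. The sketch also assumes the coefficients are stored in the order in which they meet the data.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kLanes = 8;  // eight output lanes for float (four for cfloat)
using Vec8 = std::array<float, kLanes>;

// Hypothetical scalar models of the lane-wise operations (not the real intrinsics).
Vec8 model_fpmul(const float* x, float c) {
    Vec8 r{};
    for (std::size_t l = 0; l < kLanes; ++l) r[l] = x[l] * c;
    return r;
}

Vec8 model_fpmac(Vec8 acc, const float* x, float c) {
    for (std::size_t l = 0; l < kLanes; ++l) acc[l] += x[l] * c;
    return acc;
}

Vec8 model_fpadd(const Vec8& a, const Vec8& b) {
    Vec8 r{};
    for (std::size_t l = 0; l < kLanes; ++l) r[l] = a[l] + b[l];
    return r;
}

// Computes y[l] = sum_k c[k] * x[l + k] for l = 0..7.
// Even taps accumulate into acc0 and odd taps into acc1, so consecutive
// multiply-accumulate steps never target the same accumulator.
Vec8 fir8_dual(const std::vector<float>& x, const std::vector<float>& c) {
    assert(c.size() >= 2);                      // one starting tap per accumulator
    assert(x.size() >= c.size() + kLanes - 1);  // enough samples for all lanes
    Vec8 acc0 = model_fpmul(&x[0], c[0]);
    Vec8 acc1 = model_fpmul(&x[1], c[1]);
    for (std::size_t k = 2; k < c.size(); ++k) {
        if (k % 2 == 0) {
            acc0 = model_fpmac(acc0, &x[k], c[k]);
        } else {
            acc1 = model_fpmac(acc1, &x[k], c[k]);
        }
    }
    return model_fpadd(acc0, acc1);  // final merge of the two partial sums
}
```

Splitting the sum across two accumulators changes the order of the floating-point additions, so the result can differ in the last bits from a single-accumulator chain.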
Navigate to the `FIRFilter` directory. Type `make all aie` in the console and wait for completion of the following stages:
`aie`
`aiesim`
`aiecmp`
`aieviz`
The last stage opens `vitis_analyzer`, which lets you visualize the graph of the design and the simulation process timeline.
In this design, you learned:
How to use real floating-point data and coefficients in FIR filters.
How to handle complex floating-point data and complex floating-point coefficients in FIR filters.
How to organize the compute sequence.
How to use `fpmul`, `fpmac`, and `fpadd` in the real and complex cases.