Inspecting farrow_initial/aie/farrow_kernel.cpp and farrow_initial/aie/farrow_kernel.h, you can quickly observe a few possible optimizations.
There are four vector registers with the same content (v_buff3/2/1/0). Those can be replaced with one.
There are four separate vector registers to store filter coefficients (f3-f0_coeffs). These can be combined into one, while using different indices in aie::sliding_mul_sym_ops() to select the proper coefficients.
There are four state variables to store the same content in tile memory (f3-f0_state). Those can be replaced with one.
The required 16 bits of the 32-bit $u(nT_s)$ signal arrive in interleaved fashion in a vector register. To extract the needed samples, the aie::filter_even() API is used, which consumes additional cycles. This can be simplified by placing the 16-bit samples of interest consecutively, followed by zero stuffing of the remaining bits. This requires a different input simulation file for the rearranged $u(nT_s)$ signal, so gen_vectors.m produces an additional del_i text file.
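To make the second and fourth observations concrete, here is a minimal sketch in plain C++ (not AIE API code, and not the tutorial's kernel). It assumes the 16 bits of interest sit in the even 16-bit positions of the interleaved stream, which is the pattern an even-element extraction selects. All function names, the tap count, and the table layout are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain C++ model for illustration only; every name here is hypothetical.

// Models the extraction step the kernel performs at run time: keep the
// even-indexed 16-bit elements of the interleaved input (assumed layout).
std::vector<int16_t> extract_even(const std::vector<int16_t>& interleaved) {
    std::vector<int16_t> out;
    out.reserve((interleaved.size() + 1) / 2);
    for (std::size_t i = 0; i < interleaved.size(); i += 2) {
        out.push_back(interleaved[i]);
    }
    return out;
}

// Models the rearranged input file: the needed samples are placed
// consecutively and the rest of the buffer is zero filled, so the total
// length is unchanged and no run-time extraction is required.
std::vector<int16_t> pack_contiguous(const std::vector<int16_t>& interleaved) {
    std::vector<int16_t> out = extract_even(interleaved);
    out.resize(interleaved.size(), 0);
    return out;
}

// Models one shared coefficient table replacing four separate ones: each
// filter's coefficients are selected by a start index into the same table.
int32_t dot_from(const std::vector<int16_t>& table, std::size_t coeff_start,
                 const std::vector<int16_t>& x, std::size_t taps) {
    int32_t acc = 0;
    for (std::size_t k = 0; k < taps; ++k) {
        acc += static_cast<int32_t>(table[coeff_start + k]) *
               static_cast<int32_t>(x[k]);
    }
    return acc;
}
```

In this model, pack_contiguous() plays the role of the offline rearrangement done when the input file is generated, so the kernel can read the first half of each buffer directly. Calling dot_from() with start indices 0, taps, 2*taps, and 3*taps mirrors the idea of selecting each filter's coefficients from one combined register by index.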
Once those changes are implemented in the farrow_optimization1/aie folder, you can repeat the previously mentioned steps to characterize the design.
After running make all, the console should display:
*** LOOP_II *** Tile: 24_0 minII: 28 beforeII: 91 afterII: 82 Line: 62 File: farrow_kernel.cpp
Raw Throughput = 300.8 MSPS
Max error LSB = 1
The achieved II dropped from 123 to 82, but you are still not where you need to be, so further optimization is needed.