First Farrow Optimization - 2024.2 English - XD100

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2024-12-06
Version: 2024.2 English

Inspecting farrow_initial/aie/farrow_kernel.cpp and farrow_initial/aie/farrow_kernel.h, you can quickly observe a few possible optimizations.

  1. There are four vector registers with the same content (v_buff3/2/1/0). Those can be replaced with one.

  2. There are four separate vector registers to store filter coefficients (f3-f0_coeffs). These can be combined into one, while using different indices in aie::sliding_mul_sym_ops() to select the proper coefficients.

  3. There are four state variables to store the same content in tile memory (f3-f0_state). Those can be replaced with one.

  4. The required 16 bits of each 32-bit $u(nT_s)$ sample arrive interleaved in a vector register. The aie::filter_even() API extracts the needed samples, which consumes additional cycles. You can avoid this step by placing the 16-bit samples of interest consecutively and zero-stuffing the remaining bits. This requires a different input simulation file for the rearranged $u(nT_s)$ signal, so gen_vectors.m produces an additional del_i text file.
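The data rearrangement described in item 4 can be sketched on the host in plain C++. This is only an illustration of the two layouts, not the tutorial's gen_vectors.m script or its AI Engine kernel code. The lane count kSamples and the helper names extract_even and pack_consecutive are hypothetical, and the sketch assumes the needed 16 bits sit in the even 16-bit lanes (the lanes an even-element filter would keep).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical number of samples per vector, chosen only for illustration.
constexpr std::size_t kSamples = 8;

// A block of 32-bit words viewed as 16-bit lanes (two lanes per word).
// Assumption: the needed 16 bits are in the even lanes; the odd lanes
// hold bits that the kernel does not use.
using Int16View = std::array<std::int16_t, 2 * kSamples>;

// Mimics the extraction the kernel would otherwise perform on every
// iteration: keep lanes 0, 2, 4, ... and drop the rest.
std::array<std::int16_t, kSamples> extract_even(const Int16View& interleaved) {
    std::array<std::int16_t, kSamples> out{};
    for (std::size_t i = 0; i < kSamples; ++i) {
        out[i] = interleaved[2 * i];
    }
    return out;
}

// Offline rearrangement of the input data: the needed samples are placed
// consecutively at the front and the remaining lanes are zero-stuffed.
Int16View pack_consecutive(const Int16View& interleaved) {
    Int16View packed{};  // value-initialized, so every lane starts at zero
    for (std::size_t i = 0; i < kSamples; ++i) {
        packed[i] = interleaved[2 * i];
    }
    return packed;
}
```

With the packed layout, the first kSamples lanes already match what extract_even would return, so the extraction moves from the kernel's inner loop to test-vector generation.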

Once those changes are implemented in the farrow_optimization1/aie folder, you can repeat the previously mentioned steps to characterize the design.

After running make all, the console should display:

*** LOOP_II *** Tile: 24_0	minII: 28	beforeII: 91	afterII: 82	Line: 62	File: farrow_kernel.cpp
Raw Throughput = 300.8 MSPS
Max error LSB = 1

The achieved II (the afterII value in the console output) dropped from 123 to 82, but this is still short of the target, so further optimization is needed.