Final Farrow Optimization - 2025.2 English - XD100

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2026-03-27
Version: 2025.2 English

The final version of the implementation splits the four for loops into two kernels, as discussed previously. The last optimization in this version concerns the storage of the intermediate results z2 and z1 shown in Figure 2.

Because the loops in farrow_kernel2.cpp run sequentially, and a memory bank supports a simultaneous read and write in each clock cycle, both intermediate results can be stored in the same memory bank, with z2 and z1 placed at different pointer addresses within that shared bank.
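A minimal C++ sketch of this storage scheme follows. It models one memory bank as a single array and places z2 and z1 at different offsets inside it; because the three loops run one after another, each loop reads one region and writes another. The Horner-style update equations (z2 = c3*mu + c2, z1 = z2*mu + c1, y = z1*mu + c0), the frame size, and every name (farrow_stage2, Bank, kZ1Offset, c0 to c3, mu) are assumptions for illustration, not the tutorial's actual AI Engine kernel code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical frame size, kept small for illustration.
constexpr std::size_t kFrame = 8;
// One "memory bank" modeled as a contiguous array large enough for z2 and z1.
constexpr std::size_t kBankWords = 2 * kFrame;
// Offsets (pointer addresses) of the two intermediate results in the bank.
constexpr std::size_t kZ2Offset = 0;       // z2 starts at the bank base
constexpr std::size_t kZ1Offset = kFrame;  // z1 follows z2 without overlap
static_assert(kZ1Offset >= kZ2Offset + kFrame, "z1 must not overlap z2");
static_assert(kZ1Offset + kFrame <= kBankWords, "z1 must fit in the bank");

using Bank = std::array<int, kBankWords>;

// Second-stage Horner evaluation (assumed structure, hypothetical names):
//   z2 = c3*mu + c2,  z1 = z2*mu + c1,  y = z1*mu + c0
void farrow_stage2(const int* c3, const int* c2, const int* c1,
                   const int* c0, const int* mu, int* y, Bank& bank) {
    assert(c3 && c2 && c1 && c0 && mu && y);
    int* z2 = bank.data() + kZ2Offset;  // first pointer into the shared bank
    int* z1 = bank.data() + kZ1Offset;  // second pointer, same bank

    // Loop 1: writes z2 only.
    for (std::size_t i = 0; i < kFrame; ++i) z2[i] = c3[i] * mu[i] + c2[i];
    // Loop 2: reads z2 and writes z1 (one read and one write per step).
    for (std::size_t i = 0; i < kFrame; ++i) z1[i] = z2[i] * mu[i] + c1[i];
    // Loop 3: reads z1 and writes the final output y.
    for (std::size_t i = 0; i < kFrame; ++i) y[i] = z1[i] * mu[i] + c0[i];
}
```

In the actual design, buffer placement is handled by the AI Engine tools; this sketch only illustrates the idea of two results sharing one bank at different addresses.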

Implement the changes in the farrow_final/aie design files, then repeat the earlier characterization steps to evaluate performance. Run make all and confirm that the console displays the expected output:

*** LOOP_II *** Tile: 24_1	minII: 3	beforeII: 16	afterII: 3	Line: 50	File: farrow_kernel2.cpp
*** LOOP_II *** Tile: 24_1	minII: 3	beforeII: 16	afterII: 3	Line: 66	File: farrow_kernel2.cpp
*** LOOP_II *** Tile: 24_1	minII: 3	beforeII: 16	afterII: 3	Line: 82	File: farrow_kernel2.cpp
*** LOOP_II *** Tile: 25_0	minII: 16	beforeII: 29	afterII: 16	Line: 53	File: farrow_kernel1.cpp
Raw Throughput = 1150.0 MSPS
Max error LSB = 1
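One way to check the LOOP_II lines is to confirm that each loop's achieved initiation interval (afterII) equals its minimum (minII), which holds for all four loops in the expected output above. The helper below is a hypothetical sketch that parses one line in the format shown; the struct and function names are illustrative, and the line format is inferred only from the sample output, not from a documented specification.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Values extracted from one "*** LOOP_II ***" console line (hypothetical type).
struct LoopII {
    int min_ii = -1;
    int before_ii = -1;
    int after_ii = -1;
};

// Splits the line on whitespace (spaces and tabs) and reads the integer that
// follows each "minII:", "beforeII:", and "afterII:" label.
LoopII parse_loop_ii(const std::string& line) {
    LoopII r;
    std::istringstream in(line);
    std::string tok;
    while (in >> tok) {
        if (tok == "minII:") in >> r.min_ii;
        else if (tok == "beforeII:") in >> r.before_ii;
        else if (tok == "afterII:") in >> r.after_ii;
    }
    assert(r.min_ii >= 0 && r.after_ii >= 0 && "not a LOOP_II line");
    return r;
}

// True when the scheduled loop reached its minimum initiation interval.
bool reached_min_ii(const LoopII& r) { return r.after_ii == r.min_ii; }
```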

Launch vitis_analyzer on the compile summary with vitis_analyzer Work/farrow_app.aiecompile_summary. The following figure shows the summary view for the current implementation. The final design uses two compute tiles, and a total of five tiles when the tiles used for buffers are included.

Figure 10 - Farrow Filter Final Implementation Summary View

Launch vitis_analyzer on the simulation run summary with vitis_analyzer aiesimulator_output/default.aierun_summary. The following figures show the graph, array, and trace views for the current implementation. Note the new ping-pong buffers that hold the intermediate outputs passed between the two kernels.

Figure 11 - Farrow Filter Final Implementation Graph View

Figure 12 - Farrow Filter Final Implementation Array View

Figure 13 - Farrow Filter Final Implementation Trace View
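Conceptually, a ping-pong buffer is a pair of buffers that swap roles each iteration: the producing kernel writes one while the consuming kernel reads the other, so neither has to wait for the other to release a single shared buffer. The class below is a minimal single-threaded model of that idea; it does not model the synchronization the AI Engine tools insert between kernels, and the class and method names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Two fixed-size buffers that alternate between "being written" and
// "being read". swap() is called once per iteration by the model.
template <typename T, std::size_t N>
class PingPong {
public:
    // Buffer the producer fills during the current iteration.
    T* write_buffer() { return bufs_[write_idx_].data(); }

    // Buffer the consumer reads: the one completed in the previous iteration.
    const T* read_buffer() const {
        assert(swaps_ > 0 && "nothing produced yet");
        return bufs_[1 - write_idx_].data();
    }

    // Hands the just-written buffer to the consumer and reuses the other one.
    void swap() {
        write_idx_ = 1 - write_idx_;
        ++swaps_;
    }

private:
    std::array<std::array<T, N>, 2> bufs_{};
    std::size_t write_idx_ = 0;
    std::size_t swaps_ = 0;
};
```

While frame k is read from one buffer, frame k + 1 can already be written into the other, which is what lets the two kernels overlap.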

Based on the trace view, the steady-state throughput is 1024 samples / 913 ns = 1024/913e-9 ≈ 1122 MSPS.
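The throughput arithmetic can be checked with a small helper. It assumes that the 913 value read from the trace view is the steady-state period for one 1024-sample frame in nanoseconds (0.913 µs), since that is the interpretation that yields roughly 1122 MSPS; the function name is hypothetical.

```cpp
#include <cassert>

// Throughput in MSPS from samples per frame and the frame period in ns.
// samples / ns gives GSPS; multiplying by 1000 converts to MSPS.
double throughput_msps(double samples, double period_ns) {
    assert(period_ns > 0.0);
    return samples / period_ns * 1e3;
}
```

For example, throughput_msps(1024.0, 913.0) evaluates to about 1121.6, which rounds to the 1122 MSPS quoted above.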