Final Farrow Optimization - 2024.2 English - XD100

Vitis Tutorials: AI Engine Development (XD100)

Document ID
XD100
Release Date
2024-12-06
Version
2024.2 English

The final version of the implementation splits the four for loops across two kernels, as previously discussed. The final optimization in this version concerns the storage of the intermediate results z2 and z1 shown in Figure 2.

Because the loops in farrow_kernel2.cpp execute sequentially, and the memory banks support reading and writing in the same clock cycle, a single memory bank can store both intermediate results, z2 and z1, as long as each one uses a different pointer address.
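The buffer-sharing idea can be sketched in plain C++. This is a minimal model, not the tutorial's AI Engine code: the block size kBlock, the Scratch type, the function name farrow_combine, the float data type, and the Horner-style combination of the four filter outputs are all assumptions made for illustration. It shows only that three sequential loops can keep z2 and z1 in one storage region when each uses its own pointer offset.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical block size; the real kernels operate on AIE vectors and buffers.
constexpr std::size_t kBlock = 8;

// One storage region standing in for a single memory bank.
// z2 occupies the first half and z1 the second half, so the two never overlap.
struct Scratch {
    std::array<float, 2 * kBlock> bank{};
    float* z2() { return bank.data(); }
    float* z1() { return bank.data() + kBlock; }
};

// Three sequential loops (assumed Horner form):
//   z2 = f3*del + f2,  z1 = z2*del + f1,  y = z1*del + f0
void farrow_combine(const float* f3, const float* f2, const float* f1,
                    const float* f0, const float* del, float* y, Scratch& s) {
    assert(f3 && f2 && f1 && f0 && del && y);
    float* z2 = s.z2();
    float* z1 = s.z1();
    for (std::size_t i = 0; i < kBlock; ++i) {
        z2[i] = f3[i] * del[i] + f2[i];   // loop 1 writes z2
    }
    for (std::size_t i = 0; i < kBlock; ++i) {
        z1[i] = z2[i] * del[i] + f1[i];   // loop 2 reads z2, writes z1
    }
    for (std::size_t i = 0; i < kBlock; ++i) {
        y[i] = z1[i] * del[i] + f0[i];    // loop 3 reads z1, writes the output
    }
}
```

Because loop 2 reads one half of the region while writing the other, the sketch mirrors the same-bank access pattern described above. Whether the hardware services both accesses in one cycle is a property of the memory bank, not of this model.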

Once those changes are implemented in the design files in the farrow_final/aie folder, repeat the previously described steps to characterize the design. After running make all, the console should display:

*** LOOP_II *** Tile: 24_1	minII: 3	beforeII: 16	afterII: 3	Line: 50	File: farrow_kernel2.cpp
*** LOOP_II *** Tile: 24_1	minII: 3	beforeII: 16	afterII: 3	Line: 66	File: farrow_kernel2.cpp
*** LOOP_II *** Tile: 24_1	minII: 3	beforeII: 16	afterII: 3	Line: 82	File: farrow_kernel2.cpp
*** LOOP_II *** Tile: 25_0	minII: 16	beforeII: 29	afterII: 16	Line: 53	File: farrow_kernel1.cpp
Raw Throughput = 1151.1 MSPS
Max error LSB = 1
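When checking many builds, the LOOP_II lines can be read programmatically. The sketch below is an illustration only: the LoopII struct and the functions parse_loop_ii and reached_min_ii are hypothetical helpers, not part of the tutorial or the tools. The line format is assumed to be exactly the one printed above, and minII is treated as the lower bound reported by the compiler.

```cpp
#include <regex>
#include <string>

// Fields of one "*** LOOP_II ***" line, as printed in the console output.
struct LoopII {
    std::string tile;
    int min_ii = 0;
    int before_ii = 0;
    int after_ii = 0;
    int line = 0;
    std::string file;
};

// Parse one report line. Returns false if the text does not match the format.
bool parse_loop_ii(const std::string& text, LoopII& out) {
    static const std::regex pattern(
        R"(LOOP_II \*\*\*\s+Tile:\s*(\S+)\s+minII:\s*(\d+)\s+beforeII:\s*(\d+)\s+afterII:\s*(\d+)\s+Line:\s*(\d+)\s+File:\s*(\S+))");
    std::smatch m;
    if (!std::regex_search(text, m, pattern)) {
        return false;
    }
    out.tile = m[1].str();
    out.min_ii = std::stoi(m[2].str());
    out.before_ii = std::stoi(m[3].str());
    out.after_ii = std::stoi(m[4].str());
    out.line = std::stoi(m[5].str());
    out.file = m[6].str();
    return true;
}

// True when the scheduled II equals the reported minimum II for this loop.
bool reached_min_ii(const LoopII& r) {
    return r.after_ii == r.min_ii;
}
```

Applied to the four lines above, such a check would report that every loop has afterII equal to minII.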

Launch vitis_analyzer with the command vitis_analyzer Work/farrow_app.aiecompile_summary. The current implementation generates the summary view shown below. The final design uses two compute tiles, and five tiles in total when buffers are taken into consideration.

Figure 10 - Farrow Filter Final Implementation Summary View

Launch vitis_analyzer with the command vitis_analyzer aiesimulator_output/default.aierun_summary. The current implementation generates the views shown below. Notice the new ping-pong buffers on the intermediate outputs that connect the two kernels.

Figure 11 - Farrow Filter Final Implementation Graph View

Figure 12 - Farrow Filter Final Implementation Array View

Figure 13 - Farrow Filter Final Implementation Trace View

Steady-state throughput is 1024/912.8e-9 ≈ 1122 Msps.
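A steady-state figure of this kind is the number of samples in one window divided by the time the window takes in the trace. The helper below is a minimal sketch of that arithmetic. The function name throughput_msps is hypothetical, and it assumes the elapsed time is read from the trace view in nanoseconds.

```cpp
#include <cassert>

// Throughput in Msps for `samples` samples produced in `elapsed_ns` nanoseconds.
// Samples per nanosecond is Gsps, so scale by 1000 to express the result in Msps.
double throughput_msps(double samples, double elapsed_ns) {
    assert(elapsed_ns > 0.0);
    return samples / elapsed_ns * 1000.0;
}
```

Calling throughput_msps(1024.0, t) with the window duration t measured between two consecutive output buffers gives the steady-state rate, which excludes startup latency and so can differ from the raw throughput printed on the console.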