Migrating Fractional Delay Farrow Filter from AIE to AIE-ML Architecture
Version: Vitis 2024.1
Introduction
A fractional delay filter is a common digital signal processing (DSP) algorithm found in many applications including digital receivers in modems and is required for timing synchronization.
The Fractional Delay Farrow Filter design has already been implemented for the AIE architecture.
Before starting this tutorial on migrating the design from AIE to AIE-ML architecture, it is essential to understand the Farrow Filter and its implementation details with the AIE architecture. This understanding will lay a foundation for grasping the differences and considerations involved in the migration process.
Please study the tutorial Fractional Delay Farrow Filter Targeting AIE Architecture to understand the following:
What is a Farrow Filter?
Requirements and AIE System Partitioning
AI Engine Implementation and Optimization
Now that you are familiar with the Farrow filter and its implementation on the AIE architecture, you are ready to migrate it to the AIE-ML architecture.
The design requirements are identical here as you are simply migrating the design to AIE-ML architecture:
Requirement | Value |
---|---|
Sampling rate | 1 GSPS |
I/O data type | cint16 |
Coefficients data type | int16 |
Delay input data type | int16 |
IMPORTANT: Before beginning the tutorial, make sure that you have read and followed the Vitis Software Platform Release Notes (v2024.1) for setting up the software and installing the VEK280 base platform.
Before starting this tutorial, run the following steps:
Set up your platform by running the xilinx-versal-common-v2024.1/environment-setup-cortexa72-cortexa53-xilinx-linux script as provided in the platform download. This script sets up the SYSROOT and CXX variables. If the script is not present, you must run xilinx-versal-common-v2024.1/sdk.sh.
Set up your ROOTFS to point to xilinx-versal-common-v2024.1/rootfs.ext4.
Set up your IMAGE to point to xilinx-versal-common-v2024.1/Image.
Set up your PLATFORM_REPO_PATHS environment variable based upon where you downloaded the platform.
Objectives
Migrate the Farrow filter from the AIE to the AIE-ML architecture
Optimize the design to meet the required performance
Modify the interface to GMIO
Write host code with XRT APIs
Implement the design using the Vitis tool
Run the design on the board
Migrating the Design from AIE to AIE-ML Architecture
Change the Project Path
Switch the device from AIE to AIE-ML and then compile the design to ensure it compiles without errors.
Enter the following command to navigate to the project path of the final AIE design:
cd <path-to-tutorial>/designs/farrow_final_aie
Make sure to set the PLATFORM_REPO_PATHS environment variable.
Source the Vitis Tool
Enter the following command to source the Vitis tool:
source /<TOOL_INSTALL_PATH>/Vitis/2024.1/settings.sh
Update the Makefile to switch the device from AIE to AIE-ML.
Open the Makefile and modify the device from AIE to AIE-ML as shown below:
PLATFORM_USE := xilinx_vek280_base_202410_1
Save the file.
Compile the Design for x86 Simulation
Enter the following command to compile for x86 simulation:
make x86compile
Notice the compilation error as shown below:
In file included from wrap_farrow_kernel1.cpp:2:
./../../farrow_kernel1.cpp:58:19: error: constraints not satisfied for alias template 'sliding_mul_sym_xy_ops' [with Lanes = 8, Points = 8, CoeffStep = 1, DataStepXY = 1, CoeffType = short, DataType = cint16, AccumTag = cacc48]
acc_f3 = aie::sliding_mul_sym_xy_ops<8,8,1,1,int16,cint16>::mul_antisym(f_coeffs,0,v_buff,9);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/proj/gsd/vivado/Vitis/2024.1/aietools/include/aie_api/aie.hpp:6492:14: note: because 'arch::is(arch::AIE)' evaluated to false
requires(arch::is(arch::AIE))
What does the compile error indicate?
The error message indicates that the AIE API sliding_mul_sym_xy_ops<> supports only the AIE architecture, not AIE-ML. The compiler note states the reason: 'arch::is(arch::AIE)' evaluated to false.
Why is the AIE API sliding_mul_sym_xy_ops<> not supported for AIE-ML?
This API loads only half of the tap values. For a symmetric or antisymmetric filter, it uses the pre-adder to combine each pair of samples that share a tap value before the multiplication.
Comparing the fixed-point multiplication paths of the two architectures, the AIE architecture includes a pre-adder, whereas the AIE-ML architecture does not.
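The pre-adder idea can be illustrated with a plain scalar C++ sketch. This is not AIE API code: the function names fir_full and fir_antisym_preadd are hypothetical, the samples are real integers rather than cint16, and the filter is assumed to be antisymmetric with an even number of taps. It only shows why half of the taps are enough when a pre-adder is available, and why all taps are needed when it is not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Direct-form FIR output at position n using every tap (no pre-adder).
std::int64_t fir_full(const std::vector<std::int32_t>& x,
                      const std::vector<std::int32_t>& taps,
                      std::size_t n) {
    assert(n + taps.size() <= x.size());
    std::int64_t acc = 0;
    for (std::size_t k = 0; k < taps.size(); ++k) {
        acc += static_cast<std::int64_t>(taps[k]) * x[n + k];
    }
    return acc;
}

// Same output for an antisymmetric filter (taps[k] == -taps[N-1-k], N even),
// computed from the first N/2 taps only. The two samples that share a tap
// magnitude are subtracted first (the pre-adder step), then multiplied once.
std::int64_t fir_antisym_preadd(const std::vector<std::int32_t>& x,
                                const std::vector<std::int32_t>& half_taps,
                                std::size_t n) {
    const std::size_t N = 2 * half_taps.size();
    assert(n + N <= x.size());
    std::int64_t acc = 0;
    for (std::size_t k = 0; k < half_taps.size(); ++k) {
        const std::int64_t pre =
            static_cast<std::int64_t>(x[n + k]) - x[n + N - 1 - k];
        acc += static_cast<std::int64_t>(half_taps[k]) * pre;
    }
    return acc;
}
```

In this model the pre-adder version performs half as many multiplications. Without a pre-adder, the full tap vector has to be loaded and every product computed, as fir_full does.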
How to fix this for AIE-ML?
You need to identify an AIE API that uses the full set of tap values for the computation. One such API is aie::sliding_mul_ops<Lanes, Points, CoeffStep, DataStepX, DataStepY, int16, cint16>. Adjust the parameter values according to the API details provided in the AIE APIs Special Multiplication documentation.
The following figure shows the supported parameter types (coeff x data) for the AIE and AIE-ML architectures. Here, coeff is int16 and data is cint16.
Initial Porting of Farrow Filter to AIE-ML
Modify the Kernel code using AIE API aie::sliding_mul_ops<>
The parameters for aie::sliding_mul_ops<> are Lanes, Points, CoeffStep, DataStepX, DataStepY, CoeffType, DataType, and AccumTag.
For AIE-ML:
The number of lanes is 16
Points can be 8
The accumulator is cacc64
The other parameters use the same values as for the AIE architecture:
CoeffStep is 1
DataStepX is 1
DataStepY is 1
CoeffType is int16
DataType is cint16
The resulting type is therefore aie::sliding_mul_ops<16, 8, 1, 1, 1, int16, cint16>;
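To make the roles of these template parameters concrete, here is a scalar reference model in plain C++. It is a sketch, not the AIE API: CInt16, CAcc64, and sliding_mul_model are hypothetical stand-ins, and the indexing formula in the comment reflects my reading of the AIE API documentation for sliding multiplication. The model requires every index to be in range and does not attempt to model any wrap-around addressing of the hardware.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain C++ stand-ins for the AIE types (illustrative names only).
struct CInt16 { std::int16_t re; std::int16_t im; };  // models cint16
struct CAcc64 { std::int64_t re; std::int64_t im; };  // models one cacc64 lane

// Scalar model of a sliding multiplication:
//   out[l] = sum over p in [0, Points) of
//            coeff[coeff_start + p*CoeffStep]
//            * data[data_start + l*DataStepY + p*DataStepX]
template <std::size_t Lanes, std::size_t Points,
          std::size_t CoeffStep, std::size_t DataStepX, std::size_t DataStepY>
std::array<CAcc64, Lanes> sliding_mul_model(
        const std::vector<std::int16_t>& coeff, std::size_t coeff_start,
        const std::vector<CInt16>& data, std::size_t data_start) {
    std::array<CAcc64, Lanes> out{};  // all lanes start at zero
    for (std::size_t l = 0; l < Lanes; ++l) {
        for (std::size_t p = 0; p < Points; ++p) {
            const std::size_t ci = coeff_start + p * CoeffStep;
            const std::size_t di = data_start + l * DataStepY + p * DataStepX;
            assert(ci < coeff.size() && di < data.size());
            // Real coefficient times complex sample.
            out[l].re += static_cast<std::int64_t>(coeff[ci]) * data[di].re;
            out[l].im += static_cast<std::int64_t>(coeff[ci]) * data[di].im;
        }
    }
    return out;
}
```

Under this model, instantiating sliding_mul_model<16, 8, 1, 1, 1> with coeff_start 0 and data_start 25 produces 16 outputs, each the sum of 8 products, and reads data indices 25 through 47.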
Enter the following command to navigate to the project path of the design:
cd ../farrow_port_initial
Review the kernel code in the <path-to-tutorial>/designs/farrow_port_initial/farrow_kernel1.cpp file. The necessary changes are already made. Study the code and observe the following changes:
The accumulator type has been changed to cacc64 (acc_f3, acc_f2, acc_f1, acc_f0) as per the AIE API.
The full coefficient values are loaded (f_coeffs).
The vector iterator size is updated for 16 lanes (p_sig_i, p_y3, p_y2, p_y1, p_y0), compared to eight lanes in the AIE code.
The sliding_mul API is called as:
aie::sliding_mul_ops< 16, 8, 1, 1, 1, int16, cint16>::mul(f_coeffs,0,v_buff,25);
Observe the four filter coefficient start locations (0, 8, 16, 24), passed as the second argument of aie::sliding_mul_ops<…>::mul(…).
It uses the full coefficient length.
Review the kernel code header file, <path-to-tutorial>/designs/farrow_port_initial/farrow_kernel1.h.
f_taps has the full coefficient values.
TT_ACC has been updated for cacc64.
No changes are made to the farrow_kernel2.cpp file.
After finishing the review of the kernel code, proceed to compile and then simulate the design.
Compile and Simulate the Design
Enter the following command to compile (x86compile) and simulate (x86sim) to verify the functional correctness of the design:
$ make x86compile
$ make x86sim
The first command compiles the graph code for simulation on an x86 processor; the second command runs the simulation.
To verify the results, make sure you have already invoked MATLAB in your command line and run the following command:
$ make check_sim_output_x86
This command invokes MATLAB to compare the simulator output against golden test vectors.
The console should output Max error LSB = 1.
To understand the performance of your initial implementation, you can perform AI Engine emulation using the SystemC simulator by entering the following sequence of commands:
$ make compile
$ make sim
$ make check_sim_output_aie
The first command compiles the graph code for the SystemC simulator, the second command runs the AIE simulation, and the final command invokes MATLAB to compare the simulation output with the test vectors and compute the raw throughput. The average throughput for the I/O ports is displayed at the end of the AIE simulation. After the final command completes, the console should show the following output:
Raw Throughput = 415.7 MSPS
Max error LSB = 1
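The reported raw throughput can be compared against the 1 GSPS requirement (1000 MSPS) with a small calculation. The helper names below are hypothetical and are not part of the tutorial's Makefile or MATLAB scripts; the sketch only makes the size of the performance gap explicit.

```cpp
#include <cassert>

// Factor by which the measured throughput must grow to reach the target.
inline double required_speedup(double target_msps, double measured_msps) {
    assert(measured_msps > 0.0);
    return target_msps / measured_msps;
}

// True when the measured throughput already satisfies the target.
inline bool meets_target(double target_msps, double measured_msps) {
    return measured_msps >= target_msps;
}
```

With a target of 1000 MSPS and a measured 415.7 MSPS, required_speedup returns roughly 2.4, so the initial port falls well short of the requirement and needs further optimization.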
Analyze the Reports
Enter the following command to launch the Vitis Analyzer and review the reports.
$ vitis_analyzer aiesimulator_output/default.aierun_summary
Select the Graph view.