Version: Vitis 2024.2
Recap
In Part 1a, we calculated eight outputs of a second-order section of an IIR filter in parallel using the AI Engine.
This section shows how to modify the ADF graph and the testbench for a sixth-order elliptical lowpass filter.
Julia Script
aie_iir_1b.jl
is a Julia script which can design and generate the required SIMD coefficients for an arbitrary IIR filter.
In its pristine state, the user parameters are as follows:
first_set = true; # true: 1st set; false: 2nd set
if first_set
fp = 10.0e6 # passband frequency
coeff_file_suffix = "a" # file suffix to distinguish different coefficient sets
# with the same architecture
else
fp = 20.0e6 # passband frequency
coeff_file_suffix = "b" # 2nd set of coefficients
end # if first_set
fs = 100.0e6 # sampling frequency
p = 6 # no. of poles
rp = 0.1 # passband ripple (dB)
rs = 80.0 # stopband attenuation (dB)
N = 256 # no. of samples for impulse response
show_plots = true # display plots?
write_cmatrix = true # write C matrix to files?
write_impulse = true # write impulse response?
Start Julia and run the script using the following commands:
julia> cd("{specify_path_to_aie_iir_1b.jl}")
julia> include("aie_iir_1b.jl")
This plots the filter characteristics and generates three coefficient files (C[1~3]a.h), as there are six poles in the filter.
The kernel code remains the same, but the ADF graph and testbench must be slightly modified to accommodate the additional sections.
Move the generated *.dat
files into the data
directory and the generated *.h
files (coefficient files) into the src
directory.
Adaptive Dataflow (ADF) Graph
The modified ADF graph looks like this:
graph.hpp
#ifndef __GRAPH_H__ // include guard to prevent multiple inclusion
#define __GRAPH_H__
#include <adf.h> // Adaptive DataFlow header
#include "kernel.hpp"
using namespace adf;
// dataflow graph declaration
class the_graph : public graph { // inherit all properties of the adaptive dataflow graph
private:
kernel section1;
kernel section2;
kernel section3;
public:
input_plio in; // input port for data to enter the kernel
input_port cmtx1; // input port for SIMD matrix coefficients
input_port cmtx2;
input_port cmtx3;
output_plio out; // output port for data to leave the kernel
// constructor
the_graph() {
// associate the kernel with the function to be executed
section1 = kernel::create(SecondOrderSection<1>);
section2 = kernel::create(SecondOrderSection<2>);
section3 = kernel::create(SecondOrderSection<3>);
// declare data widths and files for simulation
#ifndef RTP_SWITCH
in = input_plio::create(plio_32_bits, "data/input.dat");
#else
in = input_plio::create(plio_32_bits, "data/two_freqs.dat");
#endif // RTP_SWITCH
out = output_plio::create(plio_32_bits, "output.dat");
const unsigned num_samples = 8;
// declare buffer sizes
dimensions(section1.in[0]) = {num_samples};
dimensions(section1.out[0]) = {num_samples};
dimensions(section2.in[0]) = {num_samples};
dimensions(section2.out[0]) = {num_samples};
dimensions(section3.in[0]) = {num_samples};
dimensions(section3.out[0]) = {num_samples};
// establish connections
connect(in.out[0], section1.in[0]);
connect<parameter>(cmtx1, adf::async(section1.in[1]));
connect(section1.out[0], section2.in[0]);
connect<parameter>(cmtx2, adf::async(section2.in[1]));
connect(section2.out[0], section3.in[0]);
connect<parameter>(cmtx3, adf::async(section3.in[1]));
connect(section3.out[0], out.in[0]);
// specify which source code file contains the kernel function
source(section1) = "kernel.cpp";
source(section2) = "kernel.cpp";
source(section3) = "kernel.cpp";
// !!! temporary value: assumes this kernel dominates the AI Engine tile !!!
runtime<ratio>(section1) = 1.0;
runtime<ratio>(section2) = 1.0;
runtime<ratio>(section3) = 1.0;
} // end the_graph()
}; // end class the_graph
#endif // __GRAPH_H__
Notes:
Two additional kernels (sections) are added.
Two additional
input_port
declarations for the coefficients of the new sections are added.Two additional
kernel::create()
statements are added.The network topology are modified such that the three sections are cascaded.
Additional
source()
andruntime<ratio>()
statements are added.
Testbench Code
The testbench may look something like this:
tb.cpp
//#define RTP_SWITCH // comment out to check impulse response
#include "kernel.hpp"
#include "graph.hpp"
#include "C1a.h"
#include "C2a.h"
#include "C3a.h"
#ifdef RTP_SWITCH
#include "C1b.h"
#include "C2b.h"
#include "C3b.h"
#endif // RTP_SWITCH
using namespace std;
using namespace adf;
// specify the DFG
the_graph my_graph;
const unsigned num_pts = 256; // number of sample points in "input.dat"
#ifndef RTP_SWITCH
const unsigned num_iterations = num_pts/8; // number of iterations to run
#else
const unsigned num_iterations = num_pts/8/2; // number of iterations to run
#endif // RTP_SWITCH
// main simulation program
int main() {
my_graph.init(); // load the DFG into the AI Engine array, establish connectivity, etc.
my_graph.update(my_graph.cmtx1, C1a, 96);
my_graph.update(my_graph.cmtx2, C2a, 96);
my_graph.update(my_graph.cmtx3, C3a, 96);
#ifndef RTP_SWITCH
my_graph.run(num_iterations); // run the DFG for the specified number of iterations
#else
// run with set "a" for 1st half
my_graph.run(num_iterations); // run the DFG for the specified number of iterations
my_graph.wait(); // wait for all iterations to finish before loading new coefficients
// load set "b"
my_graph.update(my_graph.cmtx1, C1b, 96);
my_graph.update(my_graph.cmtx2, C2b, 96);
my_graph.update(my_graph.cmtx3, C3b, 96);
my_graph.run(num_iterations); // run the DFG for the specified number of iterations
#endif // RTP_SWITCH
my_graph.end(); // housekeeping
return (0);
} // end main()
Notes:
With
#define RTP_SWITCH
commented out, only one set of coefficients is loaded and we can check the impulse response as before.An additional set of coefficients is included for the other sections.
Additional
my_graph.update()
statements are added for the other sections.
Building and Running the Design
We use Emulation-SW
to verify the functionality of the design. Similar to Part1a, we can use Julia to calculate the maximum value of the absolute error between the reference and the generated impulse response (run check1.jl
). The impulse response error for this three-stage design is as follows:
Changing Coefficients During Runtime
We set first_set
to false
in aie_iir_1b.jl
as follows to generate another set of coefficients for an LPF with a passband of 20 MHz.
first_set = false; # true: 1st set; false: 2nd set
...
The frequency response with a 20MHz passband is as follows:
Move the generated *.h
(coefficient files) to src
and impresponse_b.dat
to data
.
We use two_freqs.jl
to generate an input signal (two_freqs.dat
) with two frequencies (f1 = 2 MHz, f2 = 18 MHz) to test the functionality of coefficient switching. The time and frequency domain plots of the signal are as follows:
Move the generated two_freqs.dat
to data
.
In the testbench (tb.cpp
), we uncomment #define RTP_SWITCH
to include the second set of coefficients.
Note: A wait()
statement is required to allow the specified number of iterations to complete before loading the new set of coefficients.
The output of the AI Engines is as follows (use check2.jl
):