The AI Engine compiler supports C++ kernel classes. A C++ kernel class lets each class object encapsulate the internal state of its corresponding kernel instance.
The following example sets the filter coefficients (coeffs) and the number of
samples of a FIR filter class through the constructor. This avoids storing the
internal state of a C function kernel in file-scope variables, global variables,
or function-scope static variables. When multiple instances of such a C function
kernel are mapped to the same core, those internal state variables are shared
across the instances and cause conflicts.
//fir.h
#pragma once
#include "adf.h"
#define NUM_COEFFS 12
class FIR
{
private:
    int32 coeffs[NUM_COEFFS];
    int32 tapDelayLine[NUM_COEFFS];
    uint32 numSamples;
public:
    FIR(const int32(&coefficients)[NUM_COEFFS], uint32 samples);
    void filter(adf::input_buffer<int32> &in, adf::output_buffer<int32> &out);
    static void registerKernelClass()
    {
        REGISTER_FUNCTION(FIR::filter);
    }
};
You must write the static void registerKernelClass() method in the header file.
Inside the registerKernelClass() method, call the REGISTER_FUNCTION macro. This
macro registers the class run method to be executed on the AI Engine-ML core to
perform the kernel functionality. In the preceding example, FIR::filter is
registered using this macro. Implement the kernel class constructor and run
method in a separate source file. Implementing the run method of a kernel class
is the same as writing a kernel function, as described in previous chapters.
//fir.cpp
//The implementation in this example is not optimized and is for illustration purposes only
#include "fir.h"
#include <aie_api/aie.hpp>
FIR::FIR(const int32(&coefficients)[NUM_COEFFS], uint32 samples)
{
    for (int i = 0; i < NUM_COEFFS; i++)
        coeffs[i] = coefficients[i];
    for (int i = 0; i < NUM_COEFFS; i++)
        tapDelayLine[i] = 0;
    numSamples = samples;
}
void FIR::filter(adf::input_buffer<int32> &in, adf::output_buffer<int32> &out){
    auto inIter=aie::begin(in);
    auto outIter=aie::begin(out);
    //numSamples is unsigned, so the loop index is unsigned as well
    for (uint32 i = 0; i < numSamples; i++){
        //shift the delay line by one tap and insert the new sample
        for (int j = NUM_COEFFS-1; j > 0; j--){
            tapDelayLine[j] = tapDelayLine[j - 1];
        }
        tapDelayLine[0] = *inIter++;
        //multiply-accumulate over all taps
        int32 y = 0;
        for (int j = 0; j < NUM_COEFFS; j++){
            y += coeffs[j] * tapDelayLine[j];
        }
        *outIter++=y;
    }
}
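The scalar loop in FIR::filter can be exercised on a workstation without the ADF headers. The following is a minimal host-only sketch, assuming plain std::vector input and output in place of adf::input_buffer and adf::output_buffer. The names FirModel and kNumCoeffs are hypothetical and used only for illustration; they are not part of the AI Engine tools.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors NUM_COEFFS from fir.h (hypothetical host-side constant).
constexpr std::size_t kNumCoeffs = 12;

// Hypothetical host-only model of the FIR kernel class.
// It keeps the same per-object state: coefficients and a tap delay line.
class FirModel {
public:
    explicit FirModel(const std::array<int32_t, kNumCoeffs>& coefficients)
        : coeffs_(coefficients), tapDelayLine_{} {}  // delay line starts at zero

    // Processes one block of samples, like one kernel invocation.
    std::vector<int32_t> filter(const std::vector<int32_t>& in) {
        std::vector<int32_t> out;
        out.reserve(in.size());
        for (int32_t sample : in) {
            // Shift the delay line by one tap and insert the new sample.
            for (std::size_t j = kNumCoeffs - 1; j > 0; --j) {
                tapDelayLine_[j] = tapDelayLine_[j - 1];
            }
            tapDelayLine_[0] = sample;
            // Multiply-accumulate over all taps.
            int32_t y = 0;
            for (std::size_t j = 0; j < kNumCoeffs; ++j) {
                y += coeffs_[j] * tapDelayLine_[j];
            }
            out.push_back(y);
        }
        return out;
    }

private:
    std::array<int32_t, kNumCoeffs> coeffs_;
    std::array<int32_t, kNumCoeffs> tapDelayLine_;
};
```

Feeding a unit impulse followed by zeros across two eight-sample calls returns the coefficients in order, which shows that the tap delay line is carried over between invocations of the same object.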
//graph.h
#pragma once
#include "adf.h"
#include "fir.h"
class mygraph : public adf::graph
{
public:
    adf::input_plio in1, in2;
    adf::output_plio out1, out2;
    adf::kernel k1, k2;
    mygraph(){
        in1=adf::input_plio::create("Datain1",adf::plio_32_bits,"data/input1.txt");
        in2=adf::input_plio::create("Datain2",adf::plio_32_bits,"data/input2.txt");
        out1=adf::output_plio::create("Dataout1",adf::plio_32_bits,"data/output1.txt");
        out2=adf::output_plio::create("Dataout2",adf::plio_32_bits,"data/output2.txt");
        //constructor arguments: filter coefficients, then samples per invocation
        k1 = adf::kernel::create_object<FIR>(std::vector<int>({ 180, 89, -80,
            -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504 }), 8);
        adf::runtime<adf::ratio>(k1) = 0.9;
        adf::source(k1) = "aie/fir.cpp";
        k2 = adf::kernel::create_object<FIR>(std::vector<int>({ -21, -249, 319,
            -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539 }), 8);
        adf::runtime<adf::ratio>(k2) = 0.9;
        adf::source(k2) = "aie/fir.cpp";
        adf::connect(in1.out[0], k1.in[0]);
        adf::connect(in2.out[0], k2.in[0]);
        adf::connect(k1.out[0], out1.in[0]);
        adf::connect(k2.out[0], out2.in[0]);
        adf::dimensions(k1.in[0])={8};
        adf::dimensions(k2.in[0])={8};
        adf::dimensions(k1.out[0])={8};
        adf::dimensions(k2.out[0])={8};
    }
};
For a kernel class with a non-default constructor, you specify the constructor
parameter values as arguments of kernel::create_object when creating a
representation of a kernel instance. In the previous example, two FIR filter
kernels (k1 and k2) are created using kernel::create_object<FIR>. k1 has filter
coefficients { 180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504 }
and k2 has filter coefficients { -21, -249, 319, -78, -511, 977, -610, -844, 2574,
-2754, -1066, 18539 }. Both kernels consume eight samples for each invocation.
The following code was generated by the AI Engine compiler. It instantiates the two FIR kernel objects with the proper constructor parameters.
//Work/aie/<COL_ROW>/src/<COL_ROW>.cc
...
FIR i4({180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504}, 8);
FIR i5({-21, -249, 319, -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539}, 8);
int main(void) {
    ...
    // Kernel call : i4:filter
    i4.filter(window_buf0_buf0d_i[0],window_buf2_buf2d_o[0]);
    ...
    // Kernel call : i5:filter
    i5.filter(window_buf1_buf1d_i[0],window_buf3_buf3d_o[0]);
    ...
}
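The generated code places two kernel objects, i4 and i5, in the same program, which is the situation where shared state in a C function kernel causes conflicts. The following host-only sketch illustrates the difference, assuming a trivial running-sum kernel rather than the FIR filter. The names accumulate_c and Accumulator are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// C-function style kernel: the running total is a function-scope static,
// so every caller in the same program shares a single copy of it.
int32_t accumulate_c(int32_t sample) {
    static int32_t total = 0;  // shared by all logical "instances"
    total += sample;
    return total;
}

// Kernel-class style: each object owns its own running total,
// so two objects on the same core do not interfere with each other.
class Accumulator {
public:
    int32_t run(int32_t sample) {
        total_ += sample;
        return total_;
    }

private:
    int32_t total_ = 0;  // per-object internal state
};
```

With two Accumulator objects, a call on one object leaves the total of the other unchanged. With accumulate_c, a value passed on behalf of one logical instance is added to the total seen by every other instance.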
A kernel class can have a member variable that occupies a significant
amount of memory space and might not fit into data memory. You can control the
location of such a kernel class member variable. The AI Engine compiler supports
array reference member variables, which allow the compiler to allocate or
constrain the memory space while passing the reference to the object.
//fir.h
#pragma once
#include "adf.h"
#define NUM_COEFFS 12
class FIR
{
private:
    int32 (&coeffs)[NUM_COEFFS];
    int32 tapDelayLine[NUM_COEFFS];
    uint32 numSamples;
public:
    FIR(int32(&coefficients)[NUM_COEFFS], uint32 samples);
    void filter(adf::input_buffer<int32> &in, adf::output_buffer<int32> &out);
    static void registerKernelClass()
    {
        REGISTER_FUNCTION(FIR::filter);
        REGISTER_PARAMETER(coeffs);
    }
};
//fir.cpp
#include "fir.h"
FIR::FIR(int32(&coefficients)[NUM_COEFFS], uint32 samples)
    : coeffs(coefficients)
{
    for (int i = 0; i < NUM_COEFFS; i++)
        tapDelayLine[i] = 0;
    numSamples = samples;
}
void FIR::filter(adf::input_buffer<int32> &in, adf::output_buffer<int32> &out)
{
    ...
}
The previous example shows a slightly modified version of the FIR
kernel class. Here, the member variable coeffs has the
int32 (&)[NUM_COEFFS] data type. The
constructor initializer coeffs(coefficients)
binds coeffs to an array that is
allocated externally to the class object. To let the AI Engine compiler know that the coeffs member variable can be relocated in the mapper stage of the
compilation, you must use REGISTER_PARAMETER to
register the array reference member variable inside registerKernelClass().
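The difference between an array member and an array reference member can be seen in plain C++, independent of the AI Engine tools. The following is a minimal host-only sketch. FirByValue, FirByReference, tap, storage, and kNumCoeffs are hypothetical names introduced for illustration, and the sketch omits the registration macros, which require the ADF headers.

```cpp
#include <cstddef>
#include <cstdint>

// Mirrors NUM_COEFFS from fir.h (hypothetical host-side constant).
constexpr std::size_t kNumCoeffs = 12;

// Array member: the coefficients are copied into the object itself.
class FirByValue {
public:
    explicit FirByValue(const int32_t (&coefficients)[kNumCoeffs]) {
        for (std::size_t i = 0; i < kNumCoeffs; ++i) {
            coeffs[i] = coefficients[i];
        }
    }
    int32_t tap(std::size_t i) const { return coeffs[i]; }

private:
    int32_t coeffs[kNumCoeffs];  // storage lives inside the object
};

// Array reference member: the object only refers to storage that is
// allocated elsewhere, so that storage can be placed independently.
class FirByReference {
public:
    explicit FirByReference(int32_t (&coefficients)[kNumCoeffs])
        : coeffs(coefficients) {}  // bind the reference, no copy
    int32_t tap(std::size_t i) const { return coeffs[i]; }
    const int32_t* storage() const { return coeffs; }

private:
    int32_t (&coeffs)[kNumCoeffs];  // storage lives outside the object
};
```

Because FirByReference holds no copy, the address returned by storage() is the address of the external array, and a change to that array is visible through tap(). A FirByValue object keeps the values it copied at construction time.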
Using kernel::create_object creates
a representation of a FIR kernel instance and specifies the initial value of the
constructor parameters. This is the same as in the previous example. See the following
code.
//graph.h
...
class mygraph : public adf::graph
{
    ...
    mygraph()
    {
        k1 = adf::kernel::create_object<FIR>(std::vector<int>({ 180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504 }), 8);
        ...
        k2 = adf::kernel::create_object<FIR>(std::vector<int>({ -21, -249, 319, -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539 }), 8);
        ...
    }
};
The following code was generated by the AI Engine compiler. The memory spaces for int32 i4_coeffs[12] and int32
i5_coeffs[12] are outside the kernel object instances and are passed
into the FIR objects by reference.
//Work/aie/<COL_ROW>/src/<COL_ROW>.cc
int32 i4_coeffs[12] = {180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504};
FIR i4(i4_coeffs, 8);
int32 i5_coeffs[12] = {-21, -249, 319, -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539};
FIR i5(i5_coeffs, 8);
int main(void) {
    ...
    // Kernel call : i4:filter
    i4.filter(window_buf0_buf0d_i[0],window_buf2_buf2d_o[0]);
    ...
    // Kernel call : i5:filter
    i5.filter(window_buf1_buf1d_i[0],window_buf3_buf3d_o[0]);
    ...
}
Because the memory space for an array reference member variable is
allocated by the AI Engine compiler, you can apply a
location constraint to constrain the memory location of these arrays,
as shown in the following example code. The REGISTER_PARAMETER macro allows kernel::create_object to create a parameter handle for each array
reference member variable, such as k1.param[0] and
k2.param[0], to which the location<parameter> constraint can be applied.
//graph.h
...
class mygraph : public adf::graph
{
    ...
    mygraph()
    {
        k1 = adf::kernel::create_object<FIR>(std::vector<int>({ 180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504 }), 8);
        ...
        k2 = adf::kernel::create_object<FIR>(std::vector<int>({ -21, -249, 319, -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539 }), 8);
        ...
        adf::location<adf::parameter>(k1.param[0]) = adf::address(…);
        adf::location<adf::parameter>(k2.param[0]) = adf::bank(…);
    }
};
The C++ kernel class header files and the C++ kernel function templates (see C++ Template Support) must not contain single-core specific intrinsic APIs or pragmas. This is the same programming guideline that applies to regular C function kernels. These header files are included in the graph header file and can be cross-compiled as part of the PS program, and the Arm® cross-compiler cannot understand single-core intrinsic APIs or pragmas. You must keep single-core specific programming content inside the source files.