The AI Engine compiler supports C++ kernel classes. The following example shows how to set the filter coefficients and the number of samples of a FIR filter class through a constructor. A C++ kernel class encapsulates the internal state of each kernel instance within the corresponding class object. In the fir.h listing that follows, the filter coefficients (coeffs) are specified through the constructor. This avoids the problem of using file-scope variables, global variables, or function-scope static variables to store the internal state of a C function kernel: when multiple instances of such a kernel are mapped to the same core, the internal state variables are shared across the instances and cause conflicts.
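Before turning to the FIR class itself, the conflict described above can be reproduced on a host compiler without any AI Engine headers. The following is a minimal sketch in plain C++, not AI Engine kernel code; the names running_sum_c, RunningSum, and the two helper functions are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// C-style kernel: the state lives in a function-scope static variable,
// so every caller (every logical "instance") shares one accumulator.
int32_t running_sum_c(int32_t sample)
{
    static int32_t acc = 0;  // a single copy for the whole program
    acc += sample;
    return acc;
}

// Class-style kernel: the state is a data member, so each object
// carries its own accumulator.
class RunningSum
{
    int32_t acc = 0;
public:
    int32_t run(int32_t sample)
    {
        acc += sample;
        return acc;
    }
};

// Two logical instances of the C-style kernel run back to back, as they
// would when mapped to the same core. On the first call this returns 11,
// because "instance B" inherits the state left behind by "instance A".
int32_t second_instance_result_c()
{
    running_sum_c(10);        // "instance A"
    return running_sum_c(1);  // "instance B"
}

// The same sequence with two objects: b is unaffected by a, so this returns 1.
int32_t second_instance_result_class()
{
    RunningSum a, b;
    int32_t first = a.run(10);
    assert(first == 10);
    (void)first;              // silence unused warning when NDEBUG is set
    return b.run(1);
}
```

The class version gives each instance independent state, which is the property the kernel class mechanism relies on.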
//fir.h
#pragma once
#include "adf.h"
#define NUM_COEFFS 12
class FIR
{
private:
    int32 coeffs[NUM_COEFFS];
    int32 tapDelayLine[NUM_COEFFS];
    uint32 numSamples;
public:
    FIR(const int32(&coefficients)[NUM_COEFFS], uint32 samples);
    void filter(input_window_int32* in, output_window_int32* out);
    static void registerKernelClass()
    {
        REGISTER_FUNCTION(FIR::filter);
    }
};
You must write the static void registerKernelClass() method in the header file. Inside registerKernelClass(), call the REGISTER_FUNCTION macro. This macro registers the class run method that is executed on the AI Engine core to perform the kernel functionality. In the preceding example, FIR::filter is registered using this macro. The kernel class constructor and run method should be implemented in a separate source file. The run method of a kernel class is implemented in the same way as a kernel function, as described in previous chapters.
//fir.cpp
//implementation in this example is not optimized and is for illustration purposes only
#include "fir.h"
FIR::FIR(const int32(&coefficients)[NUM_COEFFS], uint32 samples)
{
    for (int i = 0; i < NUM_COEFFS; i++)
        coeffs[i] = coefficients[i];
    for (int i = 0; i < NUM_COEFFS; i++)
        tapDelayLine[i] = 0;
    numSamples = samples;
}
void FIR::filter(input_window_int32* in, output_window_int32* out)
{
    //loop index is unsigned to match the type of numSamples
    for (uint32 i = 0; i < numSamples; i++)
    {
        //shift the delay line and insert the newest sample at index 0
        for (int j = NUM_COEFFS-1; j > 0; j--)
            tapDelayLine[j] = tapDelayLine[j - 1];
        tapDelayLine[0] = window_readincr(in);
        //multiply-accumulate over all taps
        int32 y = 0;
        for (int j = 0; j < NUM_COEFFS; j++)
        {
            y += coeffs[j] * tapDelayLine[j];
        }
        window_writeincr(out, y);
    }
}
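To check the tap-delay arithmetic of the filter loop without the AI Engine toolchain, the same logic can be modeled on a host compiler. The sketch below is a stand-in under stated assumptions, not the kernel itself: FirModel and impulse_response are hypothetical names, std::vector replaces the input and output windows, no adf.h types are used, and the filter is shortened to four taps for readability.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kNumCoeffs = 4;  // shortened from 12 for readability

// Hypothetical host-side model of FIR::filter; vectors stand in for windows.
class FirModel
{
    std::array<int32_t, kNumCoeffs> coeffs;
    std::array<int32_t, kNumCoeffs> tapDelayLine{};  // zero-initialized
public:
    explicit FirModel(const std::array<int32_t, kNumCoeffs>& c) : coeffs(c) {}

    std::vector<int32_t> filter(const std::vector<int32_t>& in)
    {
        std::vector<int32_t> out;
        out.reserve(in.size());
        for (int32_t sample : in)
        {
            // Shift the delay line by one and insert the newest sample at index 0.
            for (std::size_t j = kNumCoeffs - 1; j > 0; j--)
                tapDelayLine[j] = tapDelayLine[j - 1];
            tapDelayLine[0] = sample;

            // Multiply-accumulate over all taps.
            int32_t y = 0;
            for (std::size_t j = 0; j < kNumCoeffs; j++)
                y += coeffs[j] * tapDelayLine[j];
            out.push_back(y);
        }
        return out;
    }
};

// Feeding a unit impulse through the model reproduces the coefficients in order.
std::vector<int32_t> impulse_response()
{
    const std::array<int32_t, kNumCoeffs> c = {1, 2, 3, 4};
    FirModel fir(c);
    const std::vector<int32_t> impulse = {1, 0, 0, 0};
    std::vector<int32_t> out = fir.filter(impulse);
    assert(out.size() == impulse.size());
    return out;
}
```

Because tapDelayLine is a member, the model keeps its history between calls to filter, just as the kernel object keeps its state between invocations.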
//graph.h
#pragma once
#include "adf.h"
#include "fir.h"
using namespace adf;
class mygraph : public graph
{
public:
    input_port in1, in2;
    output_port out1, out2;
    kernel k1, k2;
    mygraph()
    {
        //see lab8.3 for narrow filter coefficients
        k1 = kernel::create_object<FIR>(std::vector<int>({ 180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504 }), 8);
        runtime<ratio>(k1) = 0.1;
        source(k1) = "src/fir.cpp";
        //see lab8.3 for wide filter coefficients
        k2 = kernel::create_object<FIR>(std::vector<int>({ -21, -249, 319, -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539 }), 8);
        runtime<ratio>(k2) = 0.1;
        source(k2) = "src/fir.cpp";
        connect<window<32>>(in1, k1.in[0]);
        connect<window<32>>(in2, k2.in[0]);
        connect<window<32>>(k1.out[0], out1);
        connect<window<32>>(k2.out[0], out2);
    }
};
For a kernel class with a non-default constructor, you can specify the constructor parameter values in the arguments of kernel::create_object when creating a representation of a kernel instance. In the previous example, two FIR filter kernels (k1 and k2) are created using kernel::create_object<FIR>. k1 has filter coefficients { 180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504 } and k2 has filter coefficients { -21, -249, 319, -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539 }. Both of them consume eight samples for each invocation.
The following code shows the AI Engine compiler generated program. The two FIR kernel objects are instantiated with the proper constructor parameters.
//Work/aie/x_y/src/x_y.cc
...
FIR i4({180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504}, 8);
FIR i5({-21, -249, 319, -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539}, 8);
int main(void) {
    ...
    // Kernel call : i4:filter
    i4.filter(get_input_window_int32(window_buf0_buf0d),get_output_window_int32(window_buf2_buf2d));
    ...
    // Kernel call : i5:filter
    i5.filter(get_input_window_int32(window_buf1_buf1d),get_output_window_int32(window_buf3_buf3d));
    ...
}
A kernel class may have a member variable that occupies a significant amount of memory and might not fit into data memory. The location of such a kernel class member variable can be controlled. The AI Engine compiler supports array reference member variables, which allow the compiler to allocate or constrain the memory space while passing the reference to the object.
//fir.h
#pragma once
#include "adf.h"
#define NUM_COEFFS 12
class FIR
{
private:
    int32 (&coeffs)[NUM_COEFFS];
    int32 tapDelayLine[NUM_COEFFS];
    uint32 numSamples;
public:
    FIR(int32(&coefficients)[NUM_COEFFS], uint32 samples);
    void filter(input_window_int32* in, output_window_int32* out);
    static void registerKernelClass()
    {
        REGISTER_FUNCTION(FIR::filter);
        REGISTER_PARAMETER(coeffs);
    }
};
//fir.cpp
#include "fir.h"
FIR::FIR(int32(&coefficients)[NUM_COEFFS], uint32 samples)
    : coeffs(coefficients)
{
    for (int i = 0; i < NUM_COEFFS; i++)
        tapDelayLine[i] = 0;
    numSamples = samples;
}
void FIR::filter(input_window_int32* in, output_window_int32* out)
{
    ...
}
The previous example shows a slightly modified version of the FIR kernel class. Here, the member variable coeffs has the data type int32 (&)[NUM_COEFFS]. The constructor initializer coeffs(coefficients) binds coeffs to an array allocated outside the class object. To let the AI Engine compiler know that the coeffs member variable may be relocated in the mapper stage of the compilation, you must use REGISTER_PARAMETER to register the array reference member variable inside the registerKernelClass method. The use of kernel::create_object to create a representation of a FIR kernel instance and to specify the initial values of the constructor parameters is the same as in the previous example. See the following code.
//graph.h
...
class mygraph : public graph
{
    ...
    mygraph()
    {
        k1 = kernel::create_object<FIR>(std::vector<int>({ 180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504 }), 8);
        ...
        k2 = kernel::create_object<FIR>(std::vector<int>({ -21, -249, 319, -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539 }), 8);
        ...
    }
};
The following code shows the corresponding AI Engine compiler generated program. The memory spaces for int32 i4_coeffs[12] and int32 i5_coeffs[12] are outside the kernel object instances and are passed into the FIR objects by reference.
//Work/aie/x_y/src/x_y.cc
int32 i4_coeffs[12] = {180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504};
FIR i4(i4_coeffs, 8);
int32 i5_coeffs[12] = {-21, -249, 319, -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539};
FIR i5(i5_coeffs, 8);
int main(void) {
    ...
    // Kernel call : i4:filter
    i4.filter(get_input_window_int32(window_buf0_buf0d),get_output_window_int32(window_buf2_buf2d));
    ...
    // Kernel call : i5:filter
    i5.filter(get_input_window_int32(window_buf1_buf1d),get_output_window_int32(window_buf3_buf3d));
    ...
}
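The array reference pattern in the generated program above is ordinary C++ and can be tried on a host compiler. The following is a minimal sketch under stated assumptions: FirRef, ext_coeffs, fir_ref, and write_then_read are hypothetical names, the array is shortened to four entries, and nothing here uses adf.h or REGISTER_PARAMETER. It only shows that the object holds a reference while the storage lives elsewhere, which is what leaves the placement of that storage open.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kNumCoeffs = 4;  // shortened from 12 for readability

// Hypothetical stand-in for the kernel class: coeffs refers to an array
// owned outside the object, so the object holds no coefficient storage.
class FirRef
{
    int32_t (&coeffs)[kNumCoeffs];
public:
    explicit FirRef(int32_t (&coefficients)[kNumCoeffs]) : coeffs(coefficients) {}

    int32_t tap(std::size_t j) const { return coeffs[j]; }
    const int32_t* storage() const { return coeffs; }
};

// Storage declared outside the object, mirroring i4_coeffs in the listing above.
int32_t ext_coeffs[kNumCoeffs] = {180, 89, -80, -391};
FirRef fir_ref(ext_coeffs);

// A write to the external array is visible through the object,
// because both names refer to the same memory.
int32_t write_then_read(std::size_t j, int32_t value)
{
    assert(j < kNumCoeffs);
    ext_coeffs[j] = value;
    return fir_ref.tap(j);
}
```

Because a reference member must be bound in the constructor initializer list and cannot be reseated, the object always refers to the array it was constructed with.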
Because the memory space for an array reference member variable is allocated by the AI Engine compiler, the location constraint can be applied to constrain the memory location of these arrays, as shown in the following example code. The REGISTER_PARAMETER macro allows kernel::create_object to create a parameter handle for an array reference member variable, such as k1.param[0] and k2.param[0], to which the location<parameter> constraint can be applied.
//graph.h
...
class mygraph : public graph
{
    ...
    mygraph()
    {
        k1 = kernel::create_object<FIR>(std::vector<int>({ 180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504 }), 8);
        ...
        k2 = kernel::create_object<FIR>(std::vector<int>({ -21, -249, 319, -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539 }), 8);
        ...
        location<parameter>(k1.param[0]) = address(…);
        location<parameter>(k2.param[0]) = bank(…);
    }
};
The C++ kernel class header files and the C++ kernel function template (see C++ Template Support) should not contain single-core specific intrinsic APIs and pragmas. This is the same programming guideline as writing regular C function kernels. This is because these header files are included in the graph header file and can be cross-compiled as part of the PS program. The Arm® cross-compiler cannot understand single-core intrinsic APIs or pragmas. Single-core specific programming content must be kept inside the source files.