The AI Engine compiler supports C++ kernel classes. The following
example shows how to set the filter coefficients and the number of samples of a FIR
filter class through a constructor. A C++ kernel class encapsulates the internal
state of each kernel instance within the corresponding class object. In the
following code, the filter coefficients (coeffs) are specified through the
constructor.
This avoids the problem of using a file-scope variable, a global variable, or a
static function-scope variable to store the internal state of a C function kernel.
When multiple instances of such a kernel are mapped to the same core, the instances
share the internal state variables, which causes conflicts.
//fir.h
#pragma once
#include "adf.h"
#define NUM_COEFFS 12
using namespace adf;
class FIR
{
private:
    int32 coeffs[NUM_COEFFS];
    int32 tapDelayLine[NUM_COEFFS];
    uint32 numSamples;
public:
    FIR(const int32(&coefficients)[NUM_COEFFS], uint32 samples);
    void filter(input_buffer<int32> &in, output_buffer<int32> &out);
    static void registerKernelClass()
    {
        REGISTER_FUNCTION(FIR::filter);
    }
};
You must write the static void registerKernelClass() method in the header file.
Inside registerKernelClass(), call the REGISTER_FUNCTION macro. This macro registers
the class run method that is executed on the AI Engine core to perform the kernel
functionality. In the preceding example, FIR::filter is registered using this macro.
Implement the kernel class constructor and run method in a separate source file.
The run method of a kernel class is implemented in the same way as a kernel
function, as described in previous chapters.
//fir.cpp
//The implementation in this example is not optimized and is for illustration purposes only
#include "fir.h"
#include <aie_api/aie.hpp>
#include <aie_api/aie_adf.hpp>
FIR::FIR(const int32(&coefficients)[NUM_COEFFS], uint32 samples)
{
    for (int i = 0; i < NUM_COEFFS; i++)
        coeffs[i] = coefficients[i];
    for (int i = 0; i < NUM_COEFFS; i++)
        tapDelayLine[i] = 0;
    numSamples = samples;
}
void FIR::filter(input_buffer<int32> &in, output_buffer<int32> &out){
    auto inIter=aie::begin(in);
    auto outIter=aie::begin(out);
    for (int i = 0; i < numSamples; i++){
        // Shift the delay line by one position and insert the newest sample
        for (int j = NUM_COEFFS-1; j > 0; j--){
            tapDelayLine[j] = tapDelayLine[j - 1];
        }
        tapDelayLine[0] = *inIter++;
        // Multiply-accumulate across all taps
        int32 y = 0;
        for (int j = 0; j < NUM_COEFFS; j++){
            y += coeffs[j] * tapDelayLine[j];
        }
        *outIter++=y;
    }
}
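As a cross-check of what the loop in FIR::filter computes, the following is a minimal host-side sketch of the same tap-delay-line FIR in standard C++. It is an illustration only: FirModel, NUM_TAPS, and the std::vector interface are hypothetical names introduced here and are not part of the AI Engine API. Only the arithmetic is meant to mirror the kernel.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model; NUM_TAPS mirrors NUM_COEFFS from fir.h.
constexpr std::size_t NUM_TAPS = 12;

class FirModel {
public:
    explicit FirModel(const std::array<std::int32_t, NUM_TAPS>& coefficients)
        : coeffs_(coefficients), tapDelayLine_{} {}

    // Processes one block of samples. The delay line is a member, so the
    // filter state carries over from one call to the next.
    std::vector<std::int32_t> filter(const std::vector<std::int32_t>& in) {
        std::vector<std::int32_t> out;
        out.reserve(in.size());
        for (std::int32_t sample : in) {
            // Shift the delay line by one and insert the newest sample at index 0.
            for (std::size_t j = NUM_TAPS - 1; j > 0; --j) {
                tapDelayLine_[j] = tapDelayLine_[j - 1];
            }
            tapDelayLine_[0] = sample;
            // Multiply-accumulate across all taps.
            std::int32_t y = 0;
            for (std::size_t j = 0; j < NUM_TAPS; ++j) {
                y += coeffs_[j] * tapDelayLine_[j];
            }
            out.push_back(y);
        }
        return out;
    }

private:
    std::array<std::int32_t, NUM_TAPS> coeffs_;
    std::array<std::int32_t, NUM_TAPS> tapDelayLine_;
};
```

Feeding a unit impulse through such a model returns the coefficients in order. Splitting the input into blocks of eight samples gives the same result as one long block, because the delay line persists between calls.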
//graph.h
#pragma once
#include "adf.h"
#include "fir.h"
using namespace adf;
class mygraph : public graph
{
public:
    input_plio in1, in2;
    output_plio out1, out2;
    kernel k1, k2;
    mygraph(){
        in1=input_plio::create("Datain1",plio_32_bits,"data/input1.txt");
        in2=input_plio::create("Datain2",plio_32_bits,"data/input2.txt");
        out1=output_plio::create("Dataout1",plio_32_bits,"data/output1.txt");
        out2=output_plio::create("Dataout2",plio_32_bits,"data/output2.txt");
        k1 = kernel::create_object<FIR>(std::vector<int>({ 180, 89, -80,
            -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504 }), 8);
        runtime<ratio>(k1) = 0.9;
        source(k1) = "aie/fir.cpp";
        k2 = kernel::create_object<FIR>(std::vector<int>({ -21, -249, 319,
            -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539 }), 8);
        runtime<ratio>(k2) = 0.9;
        source(k2) = "aie/fir.cpp";
        connect(in1.out[0], k1.in[0]);
        connect(in2.out[0], k2.in[0]);
        connect(k1.out[0], out1.in[0]);
        connect(k2.out[0], out2.in[0]);
        dimensions(k1.in[0])={8};
        dimensions(k2.in[0])={8};
        dimensions(k1.out[0])={8};
        dimensions(k2.out[0])={8};
    }
};
For a kernel class with a non-default constructor, you can specify the
constructor parameter values in the arguments of kernel::create_object when
creating a representation of a kernel instance. In the previous example, two FIR
filter kernels (k1 and k2) are created using kernel::create_object<FIR>. k1 has
filter coefficients { 180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535,
6504 } and k2 has filter coefficients { -21, -249, 319, -78, -511, 977, -610, -844,
2574, -2754, -1066, 18539 }. Both kernels consume eight samples per invocation.
The following code was generated by the AI Engine compiler. The two FIR kernel objects are instantiated with the proper constructor parameters.
//Work/aie/<COL_ROW>/src/<COL_ROW>.cc
...
FIR i4({180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504}, 8);
FIR i5({-21, -249, 319, -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539}, 8);
int main(void) {
    ...
    // Kernel call : i4:filter
    i4.filter(window_buf0_buf0d_i[0],window_buf2_buf2d_o[0]);
    ...
    // Kernel call : i5:filter
    i5.filter(window_buf1_buf1d_i[0],window_buf3_buf3d_o[0]);
    ...
}
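The generated code places two independent objects, i4 and i5, on one core, each with its own state. The following plain C++ sketch uses no AI Engine API and only illustrates why that separation matters. The names accumulate_shared, Accumulator, InterleaveResult, and interleave are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// Function-style kernel: the running sum lives in a function-scope static,
// so every logical instance that calls this function shares the same sum.
std::int32_t accumulate_shared(std::int32_t sample) {
    static std::int32_t sum = 0;
    sum += sample;
    return sum;
}

// Class-style kernel: each object owns its running sum.
class Accumulator {
public:
    std::int32_t run(std::int32_t sample) {
        sum_ += sample;
        return sum_;
    }

private:
    std::int32_t sum_ = 0;
};

struct InterleaveResult {
    std::int32_t sharedB;  // what instance B sees with the static variable
    std::int32_t ownedB;   // what instance B sees with per-object state
};

// Interleaves two logical instances, A and B, as they might run on one core.
InterleaveResult interleave() {
    accumulate_shared(10);                           // instance A
    std::int32_t sharedB = accumulate_shared(1);     // instance B, includes A's sample

    Accumulator a;
    Accumulator b;
    a.run(10);                                       // instance A
    std::int32_t ownedB = b.run(1);                  // instance B, unaffected by A

    return {sharedB, ownedB};
}
```

On the first call to interleave(), instance B observes 11 through the shared static but 1 through its own object, which is the conflict that kernel classes avoid.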
A kernel class can have a member variable that occupies a significant
amount of memory and might not fit into data memory. The location of such a
kernel class member variable can be controlled. The AI Engine compiler supports
array reference member variables, which allow the compiler to allocate or constrain
the memory space while passing the reference to the object.
//fir.h
#pragma once
#include "adf.h"
#define NUM_COEFFS 12
using namespace adf;
class FIR
{
private:
    int32 (&coeffs)[NUM_COEFFS];
    int32 tapDelayLine[NUM_COEFFS];
    uint32 numSamples;
public:
    FIR(int32(&coefficients)[NUM_COEFFS], uint32 samples);
    void filter(input_buffer<int32> &in, output_buffer<int32> &out);
    static void registerKernelClass()
    {
        REGISTER_FUNCTION(FIR::filter);
        REGISTER_PARAMETER(coeffs);
    }
};
//fir.cpp
#include "fir.h"
FIR::FIR(int32(&coefficients)[NUM_COEFFS], uint32 samples)
    : coeffs(coefficients)
{
    for (int i = 0; i < NUM_COEFFS; i++)
        tapDelayLine[i] = 0;
    numSamples = samples;
}
void FIR::filter(input_buffer<int32> &in, output_buffer<int32> &out)
{
    ...
}
The previous example shows a slightly modified version of the FIR
kernel class. Here, the member variable coeffs has the data type
int32 (&)[NUM_COEFFS]. The constructor initializer coeffs(coefficients) binds
coeffs to an array allocated externally to the class object. To let the AI Engine
compiler know that the coeffs member variable may be relocated in the mapper stage
of compilation, you must use REGISTER_PARAMETER to register the array reference
member variable inside registerKernelClass().
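The difference between an array member and an array reference member can be seen in standard C++, independent of the AI Engine tools. The sketch below is an illustration under that assumption. FirByValue, FirByReference, NUM_TAPS, and data() are hypothetical names, and nothing here models REGISTER_PARAMETER or the mapper.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t NUM_TAPS = 12;  // mirrors NUM_COEFFS from fir.h

// Array member: the coefficients are copied in and stored inside the object.
class FirByValue {
public:
    explicit FirByValue(const std::int32_t (&coefficients)[NUM_TAPS]) {
        for (std::size_t i = 0; i < NUM_TAPS; ++i) {
            coeffs_[i] = coefficients[i];
        }
    }
    const std::int32_t* data() const { return coeffs_; }

private:
    std::int32_t coeffs_[NUM_TAPS];
};

// Array reference member: the object only refers to storage that someone
// else allocated, so that storage can be placed independently of the object.
class FirByReference {
public:
    explicit FirByReference(std::int32_t (&coefficients)[NUM_TAPS])
        : coeffs_(coefficients) {}
    const std::int32_t* data() const { return coeffs_; }

private:
    std::int32_t (&coeffs_)[NUM_TAPS];
};
```

With an external array passed to both classes, FirByReference::data() points at the caller's array, while FirByValue::data() points at a private copy inside the object. This is why only the reference form leaves the array's placement open.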
The use of kernel::create_object to create a representation of a FIR kernel
instance and to specify the initial values of the constructor parameters is the
same as in the previous example. See the following code.
//graph.h
...
class mygraph : public graph
{
    ...
    mygraph()
    {
        k1 = kernel::create_object<FIR>(std::vector<int>({ 180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504 }), 8);
        ...
        k2 = kernel::create_object<FIR>(std::vector<int>({ -21, -249, 319, -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539 }), 8);
        ...
    }
};
The following code was generated by the AI Engine compiler. The
memory spaces for int32 i4_coeffs[12] and int32 i5_coeffs[12] are outside the
kernel object instances and are passed into the FIR objects by reference.
//Work/aie/<COL_ROW>/src/<COL_ROW>.cc
int32 i4_coeffs[12] = {180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504};
FIR i4(i4_coeffs, 8);
int32 i5_coeffs[12] = {-21, -249, 319, -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539};
FIR i5(i5_coeffs, 8);
int main(void) {
    ...
    // Kernel call : i4:filter
    i4.filter(window_buf0_buf0d_i[0],window_buf2_buf2d_o[0]);
    ...
    // Kernel call : i5:filter
    i5.filter(window_buf1_buf1d_i[0],window_buf3_buf3d_o[0]);
    ...
}
Because the memory space for an array reference member variable is
allocated by the AI Engine compiler, a location constraint can be applied to
constrain the memory location of these arrays, as shown in the following example
code. The REGISTER_PARAMETER macro allows kernel::create_object to create a
parameter handle for an array reference member variable, such as k1.param[0] and
k2.param[0], to which the location<parameter> constraint can be applied.
//graph.h
...
class mygraph : public graph
{
    ...
    mygraph()
    {
        k1 = kernel::create_object<FIR>(std::vector<int>({ 180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504 }), 8);
        ...
        k2 = kernel::create_object<FIR>(std::vector<int>({ -21, -249, 319, -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539 }), 8);
        ...
        location<parameter>(k1.param[0]) = address(…);
        location<parameter>(k2.param[0]) = bank(…);
    }
};
The C++ kernel class header files and the C++ kernel function templates (see C++ Template Support) should not contain single-core specific intrinsic APIs or pragmas. This is the same programming guideline that applies to regular C function kernels. The reason is that these header files are included in the graph header file and can be cross-compiled as part of the PS program, and the Arm® cross-compiler cannot understand single-core intrinsic APIs or pragmas. Keep single-core specific programming content inside the source files.