C++ Kernel Class Support - 2024.1 English

AI Engine Kernel and Graph Programming Guide (UG1079)

Document ID: UG1079
Release Date: 2024-06-05
Version: 2024.1 English

The AI Engine compiler supports C++ kernel classes. A kernel class encapsulates the internal state of each kernel instance within the corresponding class object. The following example sets the filter coefficients (coeffs) and the number of samples of a FIR filter class through the constructor. This avoids storing the internal state of a C function kernel in file-scope variables, global variables, or function-scope static variables. With those approaches, when multiple instances of the kernel are mapped to the same core, the instances share the internal state variables, which causes conflicts.
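
The shared-state conflict can be illustrated outside the AI Engine toolchain. The following is a minimal plain C++ sketch, not AI Engine code. The names running_sum_static, RunningSum, shared_state_result, and per_object_result are hypothetical, and two kernel instances on one core are only simulated by interleaving calls.

```cpp
#include <cstdint>

// C-function style: the state is a function-scope static, so every
// logical instance that calls this function shares one accumulator.
int32_t running_sum_static(int32_t sample)
{
    static int32_t acc = 0;
    acc += sample;
    return acc;
}

// Class style: each object owns its accumulator.
class RunningSum
{
    int32_t acc = 0;

public:
    int32_t run(int32_t sample)
    {
        acc += sample;
        return acc;
    }
};

// Two logical instances share the static version. The value returned
// for instance B includes the sample that instance A pushed first.
int32_t shared_state_result()
{
    running_sum_static(10);        // instance A
    return running_sum_static(1);  // instance B
}

// Two objects keep independent state, so B only sees its own sample.
int32_t per_object_result()
{
    RunningSum a, b;
    a.run(10);
    return b.run(1);
}
```

On a first call, shared_state_result() returns 11 because instance B inherits the state of instance A, whereas per_object_result() returns 1. The FIR class header below applies the same idea to the filter coefficients and the tap delay line.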

//fir.h
#pragma once
#include "adf.h"
#define NUM_COEFFS 12
using namespace adf;
class FIR
{
private:
    int32 coeffs[NUM_COEFFS];
    int32 tapDelayLine[NUM_COEFFS];
    uint32 numSamples;

public:
    FIR(const int32(&coefficients)[NUM_COEFFS], uint32 samples);
    void filter(input_buffer<int32> &in, output_buffer<int32> &out);
    static void registerKernelClass()
    {
        REGISTER_FUNCTION(FIR::filter);
    }
};

You must write the static void registerKernelClass() method in the header file and call the REGISTER_FUNCTION macro inside it. This macro registers the class run method that executes on the AI Engine core to perform the kernel functionality. In the preceding example, FIR::filter is registered using this macro. Implement the kernel class constructor and run method in a separate source file. A run method of a kernel class is implemented the same way as a kernel function described in previous chapters.

//fir.cpp
//implementation in this example is not optimized and is for illustration purposes only
#include "fir.h"
#include <aie_api/aie.hpp>
#include <aie_api/aie_adf.hpp>

FIR::FIR(const int32(&coefficients)[NUM_COEFFS], uint32 samples)
{
    for (int i = 0; i < NUM_COEFFS; i++)
        coeffs[i] = coefficients[i];

    for (int i = 0; i < NUM_COEFFS; i++)
        tapDelayLine[i] = 0;

    numSamples = samples;
}

void FIR::filter(input_buffer<int32> &in, output_buffer<int32> &out){
  auto inIter=aie::begin(in);
  auto outIter=aie::begin(out);
  for (uint32 i = 0; i < numSamples; i++){
    for (int j = NUM_COEFFS-1; j > 0; j--){
      tapDelayLine[j] = tapDelayLine[j - 1];
    }
    tapDelayLine[0] = *inIter++;
    int32 y = 0;
    for (int j = 0; j < NUM_COEFFS; j++){
      y += coeffs[j] * tapDelayLine[j];
    }
    *outIter++=y;
  }
}

//graph.h
#pragma once
#include "adf.h"
#include "fir.h"
using namespace adf;
class mygraph : public graph
{
  public:
    input_plio in1, in2;
    output_plio out1, out2;
    kernel k1, k2;
    mygraph(){
      in1=input_plio::create("Datain1",plio_32_bits,"data/input1.txt");
      in2=input_plio::create("Datain2",plio_32_bits,"data/input2.txt");
      out1=output_plio::create("Dataout1",plio_32_bits,"data/output1.txt");
      out2=output_plio::create("Dataout2",plio_32_bits,"data/output2.txt");
      k1 = kernel::create_object<FIR>(std::vector<int>({ 180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504 }), 8);
      runtime<ratio>(k1) = 0.9;
      source(k1) = "aie/fir.cpp";
      k2 = kernel::create_object<FIR>(std::vector<int>({ -21, -249, 319, -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539 }), 8);
      runtime<ratio>(k2) = 0.9;
      source(k2) = "aie/fir.cpp";

      connect(in1.out[0], k1.in[0]);
      connect(in2.out[0], k2.in[0]);
      connect(k1.out[0], out1.in[0]);
      connect(k2.out[0], out2.in[0]);

      dimensions(k1.in[0])={8};
      dimensions(k2.in[0])={8};
      dimensions(k1.out[0])={8};
      dimensions(k2.out[0])={8};
    }
};

For a kernel class with a non-default constructor, specify the constructor parameter values as arguments of kernel::create_object when creating a representation of a kernel instance. In the previous example, two FIR filter kernels (k1 and k2) are created using kernel::create_object<FIR>. k1 has filter coefficients { 180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504 } and k2 has filter coefficients { -21, -249, 319, -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539 }. Both consume eight samples per invocation.
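
To check the expected behavior of the two instances on a host machine, the filter loop can be mirrored in plain C++. The following is a minimal sketch under stated assumptions, not part of the AI Engine flow: FirModel, kNumCoeffs, kK1Coeffs, and kK2Coeffs are hypothetical names, std::vector stands in for input_buffer and output_buffer, and only the coefficient values and the sample count of eight come from the graph above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kNumCoeffs = 12;  // mirrors NUM_COEFFS

// Host-side stand-in for the FIR kernel class (hypothetical).
class FirModel
{
    std::array<int32_t, kNumCoeffs> coeffs;
    std::array<int32_t, kNumCoeffs> tapDelayLine{};  // zero-initialized
    uint32_t numSamples;

public:
    FirModel(const std::array<int32_t, kNumCoeffs>& coefficients, uint32_t samples)
        : coeffs(coefficients), numSamples(samples) {}

    // One invocation: consumes numSamples inputs and returns numSamples outputs.
    std::vector<int32_t> filter(const std::vector<int32_t>& in)
    {
        assert(in.size() >= numSamples);
        std::vector<int32_t> out;
        out.reserve(numSamples);
        for (uint32_t i = 0; i < numSamples; i++) {
            // Shift the delay line and insert the newest sample at index 0.
            for (std::size_t j = kNumCoeffs - 1; j > 0; j--)
                tapDelayLine[j] = tapDelayLine[j - 1];
            tapDelayLine[0] = in[i];
            int32_t y = 0;
            for (std::size_t j = 0; j < kNumCoeffs; j++)
                y += coeffs[j] * tapDelayLine[j];
            out.push_back(y);
        }
        return out;
    }
};

// Coefficient sets copied from the k1 and k2 instances in the graph.
const std::array<int32_t, kNumCoeffs> kK1Coeffs =
    {180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504};
const std::array<int32_t, kNumCoeffs> kK2Coeffs =
    {-21, -249, 319, -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539};
```

Feeding a unit impulse (1 followed by seven zeros) to FirModel(kK1Coeffs, 8) returns the first eight k1 coefficients. A second invocation with eight zeros returns the remaining four coefficients followed by zeros, because the tap delay line is kept inside the object between invocations. A second object built from kK2Coeffs is unaffected by the first.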

The following code was generated by the AI Engine compiler. The two FIR kernel objects are instantiated with the proper constructor parameters.

//Work/aie/<COL_ROW>/src/<COL_ROW>.cc
...
FIR i4({180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504}, 8);
FIR i5({-21, -249, 319, -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539}, 8);

int main(void) {
    ...
    // Kernel call : i4:filter
    i4.filter(window_buf0_buf0d_i[0],window_buf2_buf2d_o[0]);
    ...
    // Kernel call : i5:filter
    i5.filter(window_buf1_buf1d_i[0],window_buf3_buf3d_o[0]);
    ...
}

A kernel class may have a member variable occupying a significant amount of memory space that might not fit into data memory. The location of the kernel class member variable can be controlled. The AI Engine compiler supports array reference member variables that allow the compiler to allocate or constrain the memory space while passing the reference to the object.

//fir.h
#pragma once
#include "adf.h"
#define NUM_COEFFS 12
using namespace adf;
class FIR
{
private:
    int32 (&coeffs)[NUM_COEFFS];
    int32 tapDelayLine[NUM_COEFFS];
    uint32 numSamples;

public:
    FIR(int32(&coefficients)[NUM_COEFFS], uint32 samples);
    void filter(input_buffer<int32> &in, output_buffer<int32> &out);
    static void registerKernelClass()
    {
        REGISTER_FUNCTION(FIR::filter);
        REGISTER_PARAMETER(coeffs);
    }
};

//fir.cpp
#include "fir.h"
FIR::FIR(int32(&coefficients)[NUM_COEFFS], uint32 samples)
    : coeffs(coefficients)
{
    for (int i = 0; i < NUM_COEFFS; i++)
        tapDelayLine[i] = 0;

    numSamples = samples;
}

void FIR::filter(input_buffer<int32> &in, output_buffer<int32> &out)
{
...
}

The previous example shows a slightly modified version of the FIR kernel class. Here, the member variable coeffs has the data type int32 (&)[NUM_COEFFS]. The constructor initializer coeffs(coefficients) binds coeffs to an array allocated outside the class object. To let the AI Engine compiler know that the coeffs member variable can be relocated in the mapper stage of compilation, you must use REGISTER_PARAMETER to register the array reference member variable inside registerKernelClass().
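
The effect of an array reference member can be seen in standard C++ without the AI Engine headers. The following is a minimal sketch under stated assumptions: FirRef, kTaps, ext_coeffs, and refers_to_external_storage are hypothetical names, and the registration macros are omitted because they are specific to the AI Engine compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kTaps = 12;  // mirrors NUM_COEFFS

class FirRef
{
    // Reference to an array that lives elsewhere. The object itself
    // holds no storage for the coefficient values.
    int32_t (&coeffs)[kTaps];

public:
    explicit FirRef(int32_t (&coefficients)[kTaps]) : coeffs(coefficients) {}

    const int32_t* coeffData() const { return coeffs; }

    int32_t coeff(std::size_t i) const
    {
        assert(i < kTaps);
        return coeffs[i];
    }
};

// Storage allocated outside the object. Whoever owns this array
// decides where it is placed in memory.
int32_t ext_coeffs[kTaps] =
    {180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504};

// True when the object reads its coefficients from the external array.
bool refers_to_external_storage()
{
    FirRef fir(ext_coeffs);
    return fir.coeffData() == ext_coeffs;
}
```

Because the object only refers to the array, whoever allocates the array controls its placement, and any change to the external array is visible through the object.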

The use of kernel::create_object to create a representation of a FIR kernel instance and to specify the initial value of the constructor parameters is the same as in the previous example. See the following code.

//graph.h
...
class mygraph : public graph
{
...
    mygraph()
    {
        k1 = kernel::create_object<FIR>(std::vector<int>({ 180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504 }), 8);
        ...
        k2 = kernel::create_object<FIR>(std::vector<int>({ -21, -249, 319, -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539 }), 8);
        ...
    }
};

The following code was generated by the AI Engine compiler. The memory spaces for int32 i4_coeffs[12] and int32 i5_coeffs[12] are outside the kernel object instances and are passed into the FIR objects by reference.

//Work/aie/<COL_ROW>/src/<COL_ROW>.cc
int32 i4_coeffs[12] = {180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504};
FIR i4(i4_coeffs, 8);
int32 i5_coeffs[12] = {-21, -249, 319, -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539};
FIR i5(i5_coeffs, 8);

int main(void) {
    ...
    // Kernel call : i4:filter
    i4.filter(window_buf0_buf0d_i[0],window_buf2_buf2d_o[0]);
    ...
    // Kernel call : i5:filter
    i5.filter(window_buf1_buf1d_i[0],window_buf3_buf3d_o[0]);
    ...
}

Because the memory space for an array reference member variable is allocated by the AI Engine compiler, the location constraint can be applied to constrain the memory location of these arrays, as shown in the following example code. The REGISTER_PARAMETER macro allows kernel::create_object to create a parameter handle for an array reference member variable, like k1.param[0] and k2.param[0], and the location<parameter> constraint can be applied.

//graph.h
...
class mygraph : public graph
{
...
    mygraph()
    {
        k1 = kernel::create_object<FIR>(std::vector<int>({ 180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504 }), 8);
        ...
        k2 = kernel::create_object<FIR>(std::vector<int>({ -21, -249, 319, -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539 }), 8);
        ...

        location<parameter>(k1.param[0]) = address(…);
        location<parameter>(k2.param[0]) = bank(…);
    }
};

The C++ kernel class header files and the C++ kernel function templates (see C++ Template Support) should not contain single-core specific intrinsic APIs and pragmas. The same guideline applies to regular C function kernels. These header files are included in the graph header file and can be cross-compiled as part of the PS program, and the Arm® cross-compiler cannot understand single-core intrinsic APIs or pragmas. Keep single-core specific programming content inside the source files.