Linear Approximation

AI Engine-ML Kernel and Graph Programming Guide (UG1603), 2024.2 English, Release Date 2024-11-28

aie::linear_approx performs linear approximation by fetching slope/offset pairs in parallel from an aie::lut and estimating the output with its compute method.

The slope and offset values are stored in the aie::lut memory as follows:
constexpr unsigned size = 8;
const int16 lut_ab[size*2*2] = {
  slope0, offset0, slope1, offset1, slope2, offset2, slope3, offset3, 
  slope0, offset0, slope1, offset1, slope2, offset2, slope3, offset3, //note 128b duplication     
  slope4, offset4, slope5, offset5, slope6, offset6, slope7, offset7, 
  slope4, offset4, slope5, offset5, slope6, offset6, slope7, offset7
};
const int16 lut_cd[size*2*2] = {
  slope0, offset0, slope1, offset1, slope2, offset2, slope3, offset3, 
  slope0, offset0, slope1, offset1, slope2, offset2, slope3, offset3, 
  slope4, offset4, slope5, offset5, slope6, offset6, slope7, offset7, 
  slope4, offset4, slope5, offset5, slope6, offset6, slope7, offset7
};
aie::lut<4, int16, int16> lookup_table(size, lut_ab, lut_cd);
Besides the aie::lut data, aie::linear_approx takes the following constructor parameters (a construction sketch follows the list):
  • step_bits: Can be zero or larger, depending on the data types. If it is greater than zero, the lower step_bits bits of the input are multiplied by the slope for the estimation, and the higher bits are used with bias for indexing.
  • bias: Added to the higher part of the input to form the LUT index.
  • shift_offset: Optional left shift applied to the offset before it is added.
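As an illustration, a linear approximation object for the lookup_table defined above can be constructed as shown below. The parameter values (step_bits = 2, bias = 0, shift_offset = 0) are example choices; the same pattern appears in the kernel example later in this section.

//example construction using the lookup_table defined above
//arguments: LUT, step_bits=2, bias=0, shift_offset=0 (illustrative values)
aie::linear_approx<int16, aie::lut<4, int16, int16>> lin_ap(lookup_table, 2, 0, 0);
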
The logical steps of the computation for an integer-based linear approximation are (see the scalar sketch after the list):
  • index = (input >> step_bits) + bias
  • slope/offset pair read from the LUT based on index
  • output = slope * input[step_bits-1:0] + (offset << shift_offset)
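For illustration only, these steps can be modeled with scalar reference code. The function name, the parameter values (step_bits = 2, bias = 0, shift_offset = 0), and the de-duplicated slope/offset arrays below are assumptions for this sketch, not part of the AIE API:

// Scalar reference model of the integer path (illustration only, not the AIE API).
// Assumes step_bits=2, bias=0, shift_offset=0, and slope[]/offset[] holding one
// entry per LUT index (without the 128-bit duplication used in device memory).
#include <cstdint>

int16_t linear_approx_ref(int16_t input, const int16_t* slope, const int16_t* offset)
{
    const unsigned step_bits    = 2;
    const int      bias         = 0;   // bias must keep index inside the table
    const unsigned shift_offset = 0;

    int     index = (input >> step_bits) + bias;      // higher bits select the LUT entry
    int     frac  = input & ((1 << step_bits) - 1);   // lower bits: input[step_bits-1:0]
    int32_t acc   = slope[index] * frac + (offset[index] << shift_offset);
    return (int16_t)acc;   // the API returns an accumulator; to_vector() applies the output shift
}
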
The steps for a floating-point based linear approximation are (see the sketch after the list):
  • index = (int(floor(input)) >> step_bits) + bias
  • slope/offset pair read from the LUT based on index
  • output = slope * input + offset
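A corresponding scalar sketch for the floating-point path, under the same assumptions (placeholder function name, step_bits = 0, bias = 0, de-duplicated tables), could look like:

// Scalar reference model of the floating-point path (illustration only, not the AIE API).
// Assumes step_bits=0, bias=0, and slope[]/offset[] holding one entry per LUT index.
#include <cmath>

float linear_approx_ref_fp(float input, const float* slope, const float* offset)
{
    const unsigned step_bits = 0;
    const int      bias      = 0;   // bias must keep index inside the table

    int index = ((int)std::floor(input) >> step_bits) + bias;  // integer part selects the LUT entry
    return slope[index] * input + offset[index];               // slope applies to the full input value
}
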
The following figure shows how linear approximation is performed with aie::linear_approx:
Figure 1. Linear Approximation with aie::linear_approx

An example kernel is shown below:

#include <adf.h>
#include "aie_api/aie.hpp"
#include "aie_api/aie_adf.hpp"

using namespace adf;

const int size=1024;
int16 lnr_lutab[size*2*2]={
  #include "data/LUT_SLOPE.h"
};
int16 lnr_lutcd[size*2*2]={
  #include "data/LUT_SLOPE.h"  //same contents as lnr_lutab to allow parallel access
};

__attribute__((noinline)) void linear_approx(input_buffer<int16>& __restrict index, output_buffer<int16>& __restrict out){

  const aie::lut<4, int16, int16> my_lut(size, lnr_lutab, lnr_lutcd);

  //calling linear_approx with my_lut, step_bits=3, bias=0, shift_offset=0
  aie::linear_approx<int16, aie::lut<4, int16, int16>> linear_ap(my_lut, 3, 0, 0);

  auto it=aie::begin_vector<16>(index);
  auto ot=aie::begin_vector<16>(out);
  for(int i=0;i<size/16;i++){
    aie::vector<int16,16> vin=*it++;
    *ot++ = linear_ap.compute(vin).to_vector<int16>(0);
  }
}

For the supported data types and step_bits requirements, see aie::linear_approx in the AI Engine API User Guide (UG1529).

To achieve full parallelism, the two LUT arrays must be placed in different memory banks by constraining them in the graph, as sketched below. For details, see Global Graph-Scoped Tables.
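The following is a sketch only, assuming the tables are declared as graph-scoped parameters (with a kernel variant that takes them as additional arguments); the graph class, header and file names, tile coordinates, and bank IDs are placeholders, and the supported flow is described in Global Graph-Scoped Tables.

// Hypothetical graph that declares the two tables as graph-scoped parameters and
// pins them to different memory banks of one tile. Names, coordinates, and bank
// IDs are placeholders; see Global Graph-Scoped Tables for the supported flow.
#include <adf.h>
#include "kernels.h"    // assumed declaration of the linear_approx kernel function
#include "lut_data.h"   // assumed header defining lnr_lutab[] and lnr_lutcd[]

using namespace adf;

class linear_approx_graph : public graph {
    kernel k;
public:
    parameter lut_ab, lut_cd;
    input_plio  in;
    output_plio out;

    linear_approx_graph() {
        k      = kernel::create(linear_approx);   // kernel variant taking the tables as arguments
        lut_ab = parameter::array(lnr_lutab);
        lut_cd = parameter::array(lnr_lutcd);

        in  = input_plio::create(plio_32_bits, "data/input.txt");
        out = output_plio::create(plio_32_bits, "data/output.txt");

        connect<>(in.out[0], k.in[0]);
        connect<>(k.out[0], out.in[0]);
        connect<>(lut_ab, k);
        connect<>(lut_cd, k);

        source(k) = "linear_approx.cc";
        runtime<ratio>(k) = 0.9;

        // Different banks of the same tile let both tables be fetched in parallel
        // (tile (25,0) and banks 0/2 are example values only).
        location<parameter>(lut_ab) = bank(25, 0, 0);
        location<parameter>(lut_cd) = bank(25, 0, 2);
    }
};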