aie::linear_approx supports linear approximation by fetching slope and offset data from an aie::lut in parallel and estimating the output with the compute method. The slope and offset values are stored in memory for aie::lut this way:
constexpr unsigned size = 8;
const int16 lut_ab[size*2*2] = {
slope0, offset0, slope1, offset1, slope2, offset2, slope3, offset3,
slope0, offset0, slope1, offset1, slope2, offset2, slope3, offset3, //note 128b duplication
slope4, offset4, slope5, offset5, slope6, offset6, slope7, offset7,
slope4, offset4, slope5, offset5, slope6, offset6, slope7, offset7
};
const int16 lut_cd[size*2*2] = {
slope0, offset0, slope1, offset1, slope2, offset2, slope3, offset3,
slope0, offset0, slope1, offset1, slope2, offset2, slope3, offset3,
slope4, offset4, slope5, offset5, slope6, offset6, slope7, offset7,
slope4, offset4, slope5, offset5, slope6, offset6, slope7, offset7
};
aie::lut<4, int16, int16> lookup_table(size, lut_ab, lut_cd);
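The duplicated layout above can be generated rather than typed by hand. The following is a minimal host-side sketch in standard C++, not part of the AI Engine API: the helper name make_duplicated_lut is hypothetical, and the pattern (groups of four slope/offset pairs, each group written twice) is inferred from the int16 example above, so it may not hold for other data types or parallel-access settings.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration only (not an AI Engine API call).
// Interleaves slope/offset pairs and writes every group of four pairs
// (4 pairs * 2 values * 16 bits = 128 bits) twice, mirroring the int16
// layout shown above.
// Assumes slopes.size() == offsets.size() and that size is a multiple of 4.
std::vector<int16_t> make_duplicated_lut(const std::vector<int16_t>& slopes,
                                         const std::vector<int16_t>& offsets) {
    constexpr std::size_t pairs_per_group = 4;
    std::vector<int16_t> lut;
    lut.reserve(slopes.size() * 2 * 2);
    for (std::size_t g = 0; g < slopes.size(); g += pairs_per_group) {
        for (int copy = 0; copy < 2; ++copy) {  // 128-bit duplication
            for (std::size_t k = g; k < g + pairs_per_group; ++k) {
                lut.push_back(slopes[k]);
                lut.push_back(offsets[k]);
            }
        }
    }
    return lut;
}
```

With eight slope/offset pairs, the result has size*2*2 = 32 entries, matching the array length used for lut_ab and lut_cd.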
aie::linear_approx has the following parameters besides the data aie::lut:
- step_bits: Can be zero or larger, depending on the data types. If it is larger than zero, the lower step_bits bits of the input are used with the slope to do the estimation. The higher part is used with bias to do the indexing.
- bias: Added to the higher part of the input to do the indexing.
- shift_offset: Optional scaling factor applied to the offset.
The steps for an integer-based linear approximation are:
- index = (input >> step_bits) + bias
- The slope/offset pair is read from the LUT based on index.
- output = slope * input[step_bits-1:0] + (offset << shift_offset)
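The three steps above can be modeled for a single sample in plain C++. This is only a scalar reference sketch under stated assumptions: linear_approx_ref is a hypothetical name, the table is a simple non-duplicated array of interleaved {slope, offset} pairs, the input is non-negative, and the accumulator width and vector behavior of aie::linear_approx are not modeled.

```cpp
#include <cstdint>
#include <vector>

// Scalar reference model of the integer steps (illustration only).
// lut holds interleaved {slope, offset} pairs without duplication.
// Assumes input >= 0 and that the computed index lies inside the table.
int32_t linear_approx_ref(int16_t input, const std::vector<int16_t>& lut,
                          unsigned step_bits, int bias, unsigned shift_offset) {
    // Step 1: the upper bits of the input, plus bias, select the table entry.
    const int index = (input >> step_bits) + bias;
    // Step 2: read the slope/offset pair for that entry.
    const int32_t slope = lut[2 * index];
    const int32_t offset = lut[2 * index + 1];
    // Step 3: the lower step_bits bits of the input are multiplied by the slope.
    const int32_t low = input & ((1 << step_bits) - 1);
    // Multiplying by 2^shift_offset stands in for (offset << shift_offset)
    // and stays well defined for negative offsets.
    return slope * low + offset * (int32_t{1} << shift_offset);
}
```

For example, with step_bits = 3 an input of 10 selects entry 1 (10 >> 3) and uses 2 (the low three bits of 10) as the multiplier for the slope.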
The steps for a floating-point-based linear approximation are:
- index = (int(floor(input)) >> step_bits) + bias
- The slope/offset pair is read from the LUT based on index.
- output = slope * input + offset
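The floating-point steps above differ from the integer case in that the full input, not only its lower bits, is multiplied by the slope. A minimal scalar sketch in plain C++ follows; linear_approx_ref_f is a hypothetical name, the table is a non-duplicated array of interleaved {slope, offset} pairs, and the rounding and vector behavior of the hardware are not modeled.

```cpp
#include <cmath>
#include <vector>

// Scalar reference model of the floating-point steps (illustration only).
// lut holds interleaved {slope, offset} pairs without duplication.
// Assumes the computed index lies inside the table.
float linear_approx_ref_f(float input, const std::vector<float>& lut,
                          unsigned step_bits, int bias) {
    // Step 1: floor the input, drop step_bits low bits, then add bias.
    const int index = (static_cast<int>(std::floor(input)) >> step_bits) + bias;
    // Step 2: read the slope/offset pair for that entry.
    const float slope = lut[2 * index];
    const float offset = lut[2 * index + 1];
    // Step 3: the whole input value is used in the multiply-add.
    return slope * input + offset;
}
```

Here bias can shift a negative input range onto non-negative table indices, for example bias = 1 maps inputs in [-1, 0) to entry 0 when step_bits = 0.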
Figure 1. Linear Approximation with aie::linear_approx
An example of kernel code:
#include <adf.h>
#include "aie_api/aie.hpp"
#include "aie_api/aie_adf.hpp"
using namespace adf;

const int size = 1024;
// LUT contents are provided by an included data header.
int16 lnr_lutab[size*2*2] = {
  #include "data/LUT_SLOPE.h"
};
int16 lnr_lutcd[size*2*2] = {
  #include "data/LUT_SLOPE.h"
};

__attribute__((noinline)) void linear_approx(input_buffer<int16>& __restrict index, output_buffer<int16>& __restrict out) {
  const aie::lut<4, int16, int16> my_lut(size, lnr_lutab, lnr_lutcd);
  // Call linear_approx with my_lut, step_bits=3, bias=0, shift_offset=0.
  aie::linear_approx<int16, aie::lut<4, int16, int16>> linear_ap(my_lut, 3, 0, 0);
  auto it = aie::begin_vector<16>(index);
  auto ot = aie::begin_vector<16>(out);
  // Process the buffer 16 samples at a time.
  for (int i = 0; i < size/16; i++) {
    aie::vector<int16, 16> vin = *it++;
    // Convert the result of compute() to int16 with a shift of 0.
    *ot++ = linear_ap.compute(vin).to_vector<int16>(0);
  }
}
For the supported data types and step_bits requirements, see aie::linear_approx in the AI Engine API User Guide (UG1529).
To achieve full parallelism, the LUTs must be placed in different banks by constraining the LUTs in the graph. For details, see Global Graph-Scoped Tables.