Lookup Tables and Linear Approximation - 2025.2 English - UG1603

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID: UG1603
Release Date: 2025-11-26
Version: 2025.2 English
The AI Engine API provides high-level abstractions for the parallel access supported by the AIE-ML and AIE-ML v2 architectures:
  • aie::parallel_lookup
  • aie::linear_approx
These operations require the data to be in a specific layout, wrapped in the type aie::lut. For AIE-ML, each 128-bit group of values in a LUT is stored twice in a row in memory, and the LUT itself has two copies. In total, the same values must be present four times in memory to allow for the four parallel accesses of aie::lut. For example:
constexpr unsigned size = 8;
const int32 lut_ab[size*2] = {
  value0, value1, value2, value3, 
  value0, value1, value2, value3, //note 128b duplication 
  value4, value5, value6, value7, 
  value4, value5, value6, value7
};
const int32 lut_cd[size*2] = {
  value0, value1, value2, value3, 
  value0, value1, value2, value3, 
  value4, value5, value6, value7, 
  value4, value5, value6, value7
};
aie::lut<4, int32> lookup_table(size, lut_ab, lut_cd);//ParallelAccesses=4
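The duplicated layout shown above can also be generated from a plain table instead of being typed by hand. The following is a minimal sketch in standard C++, intended for offline or host-side table generation; `duplicate_chunks` and its `repeat_bits` parameter are hypothetical names and are not part of the AI Engine API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the AI Engine API).
// Splits `src` into groups of `repeat_bits` bits and writes each group twice
// in a row, giving the layout used for lut_ab and lut_cd in the example.
// Assumption: repeat_bits is 128 for AIE-ML, as stated in the text.
template <typename T>
std::vector<T> duplicate_chunks(const std::vector<T>& src, std::size_t repeat_bits)
{
    const std::size_t chunk = repeat_bits / (8 * sizeof(T)); // values per group
    assert(chunk > 0 && src.size() % chunk == 0);

    std::vector<T> out(src.size() * 2);
    for (std::size_t i = 0; i < src.size(); ++i) {
        const std::size_t group = i / chunk; // which group the value belongs to
        const std::size_t lane  = i % chunk; // position inside the group
        out[group * 2 * chunk + lane]         = src[i]; // first copy
        out[group * 2 * chunk + chunk + lane] = src[i]; // second copy
    }
    return out;
}
```

For eight int32 values and `repeat_bits = 128`, the result has 16 entries ordered as values 0 to 3, values 0 to 3, values 4 to 7, values 4 to 7. The same contents would then be used for both copies of the LUT.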
For AIE-ML v2, each 256-bit group of values in a LUT is stored twice in a row in memory, and the LUT itself has two copies. In total, the same values must be present eight times in memory to allow for the eight parallel accesses of aie::lut. For example:
constexpr unsigned size = 8;
const int32 lut_ab[size*2] = {
  value0, value1, value2, value3,
  value4, value5, value6, value7,  // note 256b duplication
  value0, value1, value2, value3,
  value4, value5, value6, value7
};
const int32 lut_cd[size*2] = {
  value0, value1, value2, value3,
  value4, value5, value6, value7,
  value0, value1, value2, value3,
  value4, value5, value6, value7
};
aie::lut<8, int32> lookup_table(size, lut_ab, lut_cd); // ParallelAccesses=8
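Because the two architectures use different repetition widths, a table laid out for one does not match the other. The following is a minimal sketch of a layout check in standard C++; `has_duplicated_layout` and `repeat_bits` are hypothetical names, not part of the AI Engine API, and the check covers only the in-memory repetition, not the values themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical checker (not part of the AI Engine API).
// Returns true when `data` consists of consecutive pairs of identical
// groups, each group being `repeat_bits` bits wide.
// Assumption: repeat_bits is 128 for AIE-ML and 256 for AIE-ML v2.
template <typename T>
bool has_duplicated_layout(const T* data, std::size_t len, std::size_t repeat_bits)
{
    assert(data != nullptr);
    const std::size_t chunk = repeat_bits / (8 * sizeof(T)); // values per group
    if (chunk == 0 || len % (2 * chunk) != 0) {
        return false; // length is not a whole number of duplicated groups
    }
    for (std::size_t base = 0; base < len; base += 2 * chunk) {
        for (std::size_t lane = 0; lane < chunk; ++lane) {
            if (data[base + lane] != data[base + chunk + lane]) {
                return false; // second copy differs from the first
            }
        }
    }
    return true;
}
```

With int32 data, the AIE-ML v2 array from the example passes the check with `repeat_bits = 256` and fails it with `repeat_bits = 128`, so a check of this kind could catch a table that was carried over between architectures without being regenerated.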