The lookup table contains 2^TP_COARSE_BITS locations, where each location stores a slope and offset value representing the linear approximation for the corresponding domain segment. For integer types, there will be 2^TP_FINE_BITS interpolation points between each entry of the lookup table. For example, if TP_COARSE_BITS = 4 and TP_FINE_BITS = 4, there will be 16 locations in the lookup table, with 16 interpolation points between each location.
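The sizing in the example above can be checked with a short sketch. It assumes, as the description implies but does not state outright, that the upper TP_COARSE_BITS bits of an integer input select the table location and the lower TP_FINE_BITS bits select the interpolation point. The helper name `split_input` is hypothetical and is not part of the library.

```python
def split_input(x, coarse_bits, fine_bits):
    """Split an unsigned domain value into (coarse index, fine part).

    ASSUMPTION: the coarse index is the upper bits of the input and the
    fine part is the lower bits. Domain-mode biasing is not modeled here.
    """
    if not 0 <= x < (1 << (coarse_bits + fine_bits)):
        raise ValueError("x is outside the representable domain")
    index = x >> fine_bits                # selects the LUT location
    fine = x & ((1 << fine_bits) - 1)     # selects the interpolation point
    return index, fine


TP_COARSE_BITS = 4
TP_FINE_BITS = 4

num_locations = 1 << TP_COARSE_BITS        # 16 LUT locations
points_per_location = 1 << TP_FINE_BITS    # 16 interpolation points each

index, fine = split_input(0xA7, TP_COARSE_BITS, TP_FINE_BITS)
print(num_locations, points_per_location, index, fine)
```

With these settings, the input 0xA7 maps to location 10 and interpolation point 7.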
LUT Data Types and Scaling:
Slope and offset values use the same data type as TT_DATA, except when TT_DATA is bfloat16, where values are stored as float for precision. This storage type is called TT_LUT in the API reference.
For integer types, values are scaled to match the bit allocation of TP_COARSE_BITS and TP_FINE_BITS, with precision limited by the TT_LUT data type width. See func_approx_fns in API Reference Overview for scaling and biasing for specific domain modes.
Memory Requirements:
Each LUT location stores one slope-offset pair, requiring 2 * sizeof(TT_LUT) bytes. The graph creates internal duplicates for performance, with AIE-ML/AIE-MLv2 requiring additional duplication for parallel access when using int16 or bfloat16 data types.
LUT Generation:
Utility functions are provided to create LUTs for common functions (see API Reference Overview):
getSqrt, getInvSqrt, getLog, getExp, getInv. These functions handle scaling automatically and populate LUTs in the required slope-offset format.
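For integer data types (point-slope form), where index is taken from the coarse bits of the input and the slope is multiplied by the TP_FINE_BITS fine bits:
output = offset[index] + slope[index] * input[fine_bits-1:0]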
For floating-point data types (slope-intercept form):
output = offset[index] + slope[index] * input
For example, slope-offset values for integer types (point-slope):
slope[i] = y[i + 1] - y[i]
offset[i] = y[i]
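As a minimal sketch of the point-slope form for integer types, the following builds slope and offset lists from sampled function values and evaluates one input. The function names are hypothetical. The right shift by `fine_bits` applied to the product is an assumption made here so that the fine bits act as a fraction of one segment; the library's actual scaling is described in func_approx_fns and may differ.

```python
def build_int_lut(y):
    """Build point-slope entries from 2^coarse_bits + 1 sampled values y."""
    slopes = [y[i + 1] - y[i] for i in range(len(y) - 1)]
    offsets = list(y[:-1])
    return slopes, offsets


def evaluate_int(slopes, offsets, x, fine_bits):
    """Evaluate the piecewise-linear approximation at integer input x."""
    index = x >> fine_bits
    fine = x & ((1 << fine_bits) - 1)
    # ASSUMPTION: the product is scaled back by fine_bits so that `fine`
    # behaves as a fraction of the segment. Not confirmed by the source.
    return offsets[index] + ((slopes[index] * fine) >> fine_bits)


# Five samples define four segments (coarse_bits = 2), with fine_bits = 4.
y_samples = [0, 16, 64, 144, 256]
slopes, offsets = build_int_lut(y_samples)
result = evaluate_int(slopes, offsets, 24, 4)  # index 1, fine 8: halfway
print(slopes, offsets, result)
```

Input 24 falls halfway through segment 1, so the result lies halfway between the samples 16 and 64.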
Slope-offset values for floating-point types (slope-intercept):
slope[i] = (y[i + 1] - y[i]) / (x[i + 1] - x[i])
offset[i] = y[i] - slope[i] * x[i]
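The slope-intercept form for floating-point types can be illustrated with a small sketch that approximates a square root over [1, 2). The uniform segment spacing, the way the index is derived from the input, and the function names are assumptions for illustration only. The library's utility functions such as getSqrt should be preferred in practice.

```python
import math


def build_float_lut(f, x_min, x_max, coarse_bits):
    """Build slope-intercept entries for f over [x_min, x_max)."""
    n = 1 << coarse_bits
    step = (x_max - x_min) / n
    slopes, offsets = [], []
    for i in range(n):
        x0 = x_min + i * step
        x1 = x_min + (i + 1) * step
        y0, y1 = f(x0), f(x1)
        slope = (y1 - y0) / (x1 - x0)
        slopes.append(slope)
        offsets.append(y0 - slope * x0)   # intercept of the segment line
    return slopes, offsets


def evaluate_float(slopes, offsets, x, x_min, x_max):
    """Evaluate offset[index] + slope[index] * x.

    ASSUMPTION: segments are uniform, so the index is the segment
    number that contains x.
    """
    n = len(slopes)
    index = min(int((x - x_min) / (x_max - x_min) * n), n - 1)
    return offsets[index] + slopes[index] * x


f_slopes, f_offsets = build_float_lut(math.sqrt, 1.0, 2.0, 4)
approx = evaluate_float(f_slopes, f_offsets, 1.53, 1.0, 2.0)
print(approx, math.sqrt(1.53))
```

Unlike the integer form, the slope here multiplies the full input value, which is why the offset stores the line's intercept rather than the sample value y[i].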
The provided lookup table must be populated with slope and offset values interleaved, in the following order:
slope[0], offset[0], slope[1], offset[1], ... slope[2^TP_COARSE_BITS - 1], offset[2^TP_COARSE_BITS - 1]
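The interleaved ordering above can be produced from separate slope and offset lists as follows. This is a minimal sketch and the helper name `interleave_lut` is hypothetical.

```python
def interleave_lut(slopes, offsets):
    """Return [slope[0], offset[0], slope[1], offset[1], ...]."""
    if len(slopes) != len(offsets):
        raise ValueError("slopes and offsets must have the same length")
    lut = []
    for slope, offset in zip(slopes, offsets):
        lut.extend((slope, offset))
    return lut


lut = interleave_lut([16, 48, 80, 112], [0, 16, 64, 144])
print(lut)
```

The resulting list has 2 * 2^TP_COARSE_BITS elements, with slopes at even positions and offsets at odd positions.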
A single lookup table requires sizeof(TT_LUT) * 2 * 2^TP_COARSE_BITS bytes of memory. For performance reasons, the func_approx graph creates a duplicate of the lookup table. Configurations for AIE-ML or AIE-MLv2 devices with a data type of int16 or bfloat16 use the AI Engine API for improved parallel lookups, which requires an additional duplication within each lookup table. The graph performs this duplication internally, but it must be accounted for when calculating the memory required for the lookup tables. Users must provide the lookup table, without any duplication, as a constructor argument to the graph.
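A rough memory estimate can be sketched as below. The element sizes follow the TT_LUT storage type (bfloat16 values are stored as float). The total assumes that each duplication doubles the footprint, which the text suggests but does not quantify, so the totals should be treated as an estimate. The function names and the `aie_ml` flag are hypothetical.

```python
# Bytes per stored LUT element (TT_LUT); bfloat16 data is stored as float.
LUT_ELEMENT_BYTES = {"int16": 2, "float": 4, "bfloat16": 4}


def user_lut_bytes(tt_data, coarse_bits):
    """Size of the table the user provides, without any duplication."""
    return LUT_ELEMENT_BYTES[tt_data] * 2 * (1 << coarse_bits)


def estimated_graph_bytes(tt_data, coarse_bits, aie_ml):
    """Estimated memory after the graph's internal duplication.

    ASSUMPTION: the performance duplicate doubles the size, and the
    parallel-access duplication on AIE-ML/AIE-MLv2 for int16 or
    bfloat16 doubles it again.
    """
    total = user_lut_bytes(tt_data, coarse_bits) * 2
    if aie_ml and tt_data in ("int16", "bfloat16"):
        total *= 2
    return total


print(user_lut_bytes("int16", 4))                 # user-provided table
print(estimated_graph_bytes("bfloat16", 4, True))
```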