When TP_USE_LUT_RELOAD = 1, lookup tables can be modified at runtime via RTP ports instead of being fixed at compile time.
RTP Port Configuration:
- Two RTP ports are created for parallel memory access
- Both ports must be updated with identical lookup table data
Runtime Update Methods:
LUT values can be updated at runtime in two ways:
- Use the provided graph update_rtp function. This performs all required duplication and broadcast.
- Drive data to the graph RTP ports directly. In this case, the data must be duplicated depending on the device, as described below, and must be broadcast to both ports.
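The broadcast step of the direct method can be pictured with a small host-side sketch. MockRtpPort and broadcast_lut are hypothetical stand-ins introduced only for illustration; they are not the ADF port type or the library API, and the buffer is assumed to be already laid out as the device requires.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an RTP port: it only records the last buffer written.
// This is NOT the ADF port type or API.
struct MockRtpPort {
    std::vector<std::int16_t> last_written;
    void write(const std::vector<std::int16_t>& data) { last_written = data; }
};

// Send the same, already device-formatted LUT buffer to both ports.
inline void broadcast_lut(const std::vector<std::int16_t>& lut,
                          MockRtpPort& port0, MockRtpPort& port1) {
    assert(!lut.empty());  // assumption: an empty update is a caller error
    port0.write(lut);
    port1.write(lut);
}
```

The point of the sketch is only that both ports receive one and the same buffer; in a real design the writes would go through the graph's RTP update mechanism.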
Graph Instantiation:
    // Static LUT (TP_USE_LUT_RELOAD = 0): the LUT is passed to the constructor
    std::array<TT_LUT, kLutValues> staticLUT = { /* values */ };
    func_approx_graph<TT_DATA, TP_COARSE_BITS, ...> graph(staticLUT);

    // Runtime LUT (TP_USE_LUT_RELOAD = 1): no LUT argument
    // (no parentheses, so this declares an object rather than a function)
    func_approx_graph<TT_DATA, TP_COARSE_BITS, ..., 1> graph;
Runtime Updates:
    std::array<TT_LUT, kLutValues> newLUT = { /* new values */ };
    graph.update_rtp(topGraph, newLUT, graph.rtpLut);
Memory Duplication Requirements:
For AIE-ML and AIE-MLv2 devices with int16 or bfloat16 data types, hardware parallel access requires memory duplication:
    // Memory layout for AIE-ML (128-bit alignment)
    // Each group of 8 values (4 slope-offset pairs) is duplicated
    lut_data = {s0,o0, s1,o1, s2,o2, s3,o3,      // First 128-bit block
                s0,o0, s1,o1, s2,o2, s3,o3,      // Duplicated block
                s4,o4, s5,o5, s6,o6, s7,o7,      // Second 128-bit block
                s4,o4, s5,o5, s6,o6, s7,o7,      // Duplicated block
                ...}

    // Memory layout for AIE-MLv2 (256-bit alignment)
    // Each group of 16 values (8 slope-offset pairs) is duplicated
    lut_data = {s0,o0, s1,o1, ..., s7,o7,        // First 256-bit block
                s0,o0, s1,o1, ..., s7,o7,        // Duplicated block
                s8,o8, s9,o9, ..., s15,o15,      // Second 256-bit block
                s8,o8, s9,o9, ..., s15,o15,      // Duplicated block
                ...}
Automatic Duplication: The update_rtp method handles all duplication automatically.
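When the RTP ports are driven directly instead, the duplication has to be done by the caller. The following is a minimal host-side sketch of the block duplication described above, not the library's implementation. The helper names values_per_block and duplicate_lut_blocks are hypothetical, std::int16_t stands in for any 16-bit LUT type, and the LUT length is assumed to be a whole number of blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of LUT values that fit in one aligned block
// (128 bits on AIE-ML, 256 bits on AIE-MLv2, per the layouts above).
template <typename T>
constexpr std::size_t values_per_block(std::size_t block_bits) {
    return block_bits / (8 * sizeof(T));
}

// Hypothetical helper: emit every block of `block` values twice, back to back.
// Assumption: lut.size() is a multiple of `block`.
template <typename T>
std::vector<T> duplicate_lut_blocks(const std::vector<T>& lut, std::size_t block) {
    assert(block > 0 && lut.size() % block == 0);
    std::vector<T> out;
    out.reserve(2 * lut.size());
    for (std::size_t start = 0; start < lut.size(); start += block) {
        const auto first = lut.begin() + static_cast<std::ptrdiff_t>(start);
        const auto last = first + static_cast<std::ptrdiff_t>(block);
        out.insert(out.end(), first, last);  // original block
        out.insert(out.end(), first, last);  // duplicated block
    }
    return out;
}
```

For a 16-bit type, values_per_block gives 8 values for a 128-bit block and 16 values for a 256-bit block, and the duplicated buffer is twice the length of the original LUT.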