The AI Engine API provides high-level
abstractions for the parallel lookup table (LUT) access supported by AI Engine-ML architectures:
- aie::parallel_lookup
- aie::linear_approx
These operations require the table data to be stored in a specific layout and
wrapped in the type aie::lut. Within each LUT array, every 128-bit group of
values is stored twice in succession, and the LUT itself is supplied as two
copies. In total, each value must be present four times in memory to allow for
the four parallel accesses of aie::lut. For example:
constexpr unsigned size = 8;
const int32 lut_ab[size*2] = {
    value0, value1, value2, value3,
    value0, value1, value2, value3, // note 128b duplication
    value4, value5, value6, value7,
    value4, value5, value6, value7
};
const int32 lut_cd[size*2] = {
    value0, value1, value2, value3,
    value0, value1, value2, value3,
    value4, value5, value6, value7,
    value4, value5, value6, value7
};
aie::lut<4, int32> lookup_table(size, lut_ab, lut_cd); // ParallelAccesses=4
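
Writing the duplicated rows by hand is error-prone for larger tables. The 128-bit repetition described above can be sketched as a small host-side routine that expands a plain table into that layout. This is only an illustration: duplicate_128b_groups is a hypothetical helper name, it is not part of the AI Engine API, and it assumes the element size divides 128 bits evenly and the table fills whole 128-bit groups.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side helper, not part of the AI Engine API.
// Expands a plain table so that every 128-bit group of values is
// immediately followed by an identical copy of itself.
template <typename T>
std::vector<T> duplicate_128b_groups(const std::vector<T>& src) {
    static_assert(16 % sizeof(T) == 0, "element size must divide 128 bits");
    const std::size_t group = 16 / sizeof(T);  // elements per 128 bits
    assert(src.size() % group == 0 && "table must fill whole 128-bit groups");

    std::vector<T> out;
    out.reserve(2 * src.size());
    for (std::size_t base = 0; base < src.size(); base += group) {
        // Emit the same group twice, back to back.
        for (int copy = 0; copy < 2; ++copy) {
            out.insert(out.end(), src.begin() + base, src.begin() + base + group);
        }
    }
    return out;
}
```

For an eight-entry 32-bit table, the result holds 16 entries in the same order as lut_ab in the example. Since lut_cd has identical contents, the same expanded sequence could be used to generate the initializers for both arrays.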