The AI Engine API provides high-level abstractions of the parallel access supported by the AIE-ML and AIE-ML v2 architectures:
- aie::parallel_lookup
- aie::linear_approx
These operations require the data to be stored in a specific layout, using the type
aie::lut. For AIE-ML, the data in each LUT is repeated at 128-bit granularity in
memory, and the LUT has two copies (lut_ab and lut_cd). In total, the same values
must be present four times in memory to allow the four parallel accesses of
aie::lut. For example:
constexpr unsigned size = 8;
const int32 lut_ab[size*2] = {
value0, value1, value2, value3,
value0, value1, value2, value3, //note 128b duplication
value4, value5, value6, value7,
value4, value5, value6, value7
};
const int32 lut_cd[size*2] = {
value0, value1, value2, value3,
value0, value1, value2, value3,
value4, value5, value6, value7,
value4, value5, value6, value7
};
aie::lut<4, int32> lookup_table(size, lut_ab, lut_cd);//ParallelAccesses=4
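To make the layout rule concrete, the following host-side sketch builds a table in this repeated form from a plain list of values. make_repeated_lut is a hypothetical helper, not part of the AI Engine API; it assumes 32-bit entries and a table length that is a multiple of the repetition block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side helper (NOT part of the AI Engine API).
// Splits `base` into blocks of `repeat_bits` bits (assumes 32-bit entries)
// and writes each block twice in succession, e.g. 128 for AIE-ML.
std::vector<int32_t> make_repeated_lut(const std::vector<int32_t>& base,
                                       std::size_t repeat_bits) {
    const std::size_t chunk = repeat_bits / (8 * sizeof(int32_t)); // entries per block
    assert(chunk > 0 && base.size() % chunk == 0);

    std::vector<int32_t> out;
    out.reserve(base.size() * 2);
    for (std::size_t start = 0; start < base.size(); start += chunk) {
        auto first = base.begin() + static_cast<std::ptrdiff_t>(start);
        auto last = first + static_cast<std::ptrdiff_t>(chunk);
        // Emit the block once, then repeat it immediately after.
        out.insert(out.end(), first, last);
        out.insert(out.end(), first, last);
    }
    return out;
}
```

Called with the eight values value0 through value7 and a repetition width of 128, this would reproduce the contents of lut_ab above; in that example lut_cd holds the same contents, so the result could fill both copies.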
For AIE-ML v2, the data in each LUT is repeated at 256-bit granularity in memory,
and the LUT has two copies. In total, the same values must be present eight times
in memory to allow the eight parallel accesses of aie::lut. For example:
constexpr unsigned size = 8;
const int32 lut_ab[size*2] = {
value0, value1, value2, value3,
value4, value5, value6, value7, // note 256b duplication
value0, value1, value2, value3,
value4, value5, value6, value7
};
const int32 lut_cd[size*2] = {
value0, value1, value2, value3,
value4, value5, value6, value7,
value0, value1, value2, value3,
value4, value5, value6, value7
};
aie::lut<8, int32> lookup_table(size, lut_ab, lut_cd); // ParallelAccesses=8
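Because the AIE-ML and AIE-ML v2 layouts differ only in the repetition width, a small host-side check can catch a table written for the wrong architecture. has_repetition is a hypothetical helper, not an AI Engine API function; it assumes 32-bit entries and tests whether every block of the given width is immediately followed by an identical block, which is the pattern in both examples above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical host-side check (NOT part of the AI Engine API).
// Returns true if each block of `repeat_bits` bits in `lut` (32-bit entries
// assumed) is immediately followed by an identical block:
// 128 matches the AIE-ML layout, 256 matches the AIE-ML v2 layout.
bool has_repetition(const int32_t* lut, std::size_t n, std::size_t repeat_bits) {
    const std::size_t chunk = repeat_bits / (8 * sizeof(int32_t)); // entries per block
    if (chunk == 0 || n % (2 * chunk) != 0) {
        return false; // table length cannot hold whole block pairs
    }
    for (std::size_t start = 0; start < n; start += 2 * chunk) {
        for (std::size_t i = 0; i < chunk; ++i) {
            if (lut[start + i] != lut[start + chunk + i]) {
                return false; // block and its repeat differ
            }
        }
    }
    return true;
}
```

With distinct values, a table in the AIE-ML layout passes the check at 128 bits but fails it at 256 bits, and the AIE-ML v2 layout behaves the other way round.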