aie::parallel_lookup supports parallel fetches of data from an aie::lut through its fetch method. For the supported index and value data types, see the aie::parallel_lookup section in the AI Engine API User Guide (UG1529).
The following kernel code is an example:
const int size=1024;
// Two copies of the same LUT data, so that each copy can be placed in a different bank.
alignas(aie::vector_decl_align) int16 lutab[size*2]={
#include "data/LUT.h"
};
alignas(aie::vector_decl_align) int16 lutcd[size*2]={
#include "data/LUT.h"
};

__attribute__((noinline)) void parallel_lookup(input_buffer<uint8>& __restrict index,
                                               output_buffer<int16>& __restrict out){
    // LUT object backed by the two copies above.
    const aie::lut<4, int16> my_lut(size,lutab,lutcd);
    aie::parallel_lookup<uint8, aie::lut<4, int16>> lookup(my_lut, 0);
    auto it=aie::begin_vector<32>(index);
    auto ot=aie::begin_vector<32>(out);
    // Each iteration reads 32 indexes and writes the 32 values fetched for them.
    for(int i=0;i<size/32;i++){
        aie::vector<uint8,32> vin=*it++;
        *ot++ = lookup.fetch(vin);
    }
}
To achieve full parallelism, the LUTs must be placed in different banks. To do so, constrain the LUTs in the graph. For details, see Global Graph-Scoped Tables.
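When checking the output of a kernel like the example above, a scalar reference model on the host can be useful. The following is a minimal sketch in plain C++, not AI Engine code. It assumes that, with 0 passed as the second constructor argument, fetch returns the logical table entry selected by each index, that is, out[i] = table[index[i]]. The function name reference_lookup is hypothetical and not part of the AI Engine API, and the sketch does not model how the values are laid out in lutab and lutcd.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical host-side helper (not part of the AI Engine API).
// Scalar model of a table lookup: out[i] = table[index[i]].
// Assumption: 'table' holds the logical LUT entries, not the storage
// layout used by the kernel arrays.
std::vector<int16_t> reference_lookup(const std::vector<int16_t>& table,
                                      const std::vector<uint8_t>& index) {
    std::vector<int16_t> out;
    out.reserve(index.size());
    for (uint8_t idx : index) {
        // at() throws std::out_of_range when idx is beyond the table size.
        out.push_back(table.at(idx));
    }
    return out;
}
```

Comparing the kernel output against reference_lookup element by element can help separate indexing mistakes from LUT data mistakes, provided the assumption about fetch holds for the configuration in use.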