The AI Engine-ML can perform several vector load or store operations per cycle. However, load and store operations can execute in parallel only if they target different memory banks. In general, the compiler tries to schedule multiple memory accesses in the same cycle when possible, but there are exceptions. Memory accesses through the same pointer are scheduled in different cycles. If the compiler schedules accesses through multiple variables or pointers in the same cycle, memory bank conflicts can occur.
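As an illustration of the bank constraint, the following is a minimal sketch in standard C++ (it is not AI Engine API code). It assumes a simplified layout of equally sized, contiguous banks; the constants kBankBytes and kNumBanks and the helpers bank_of and bank_conflict are hypothetical names chosen for this sketch and do not describe the actual device memory map.

```cpp
#include <cstdint>

namespace sketch {

// Hypothetical layout, for illustration only: equally sized, contiguous banks.
constexpr std::uint32_t kBankBytes = 8 * 1024;  // assumed bank size in bytes
constexpr std::uint32_t kNumBanks  = 8;         // assumed number of banks

// Bank that a byte offset into data memory falls in under the assumed layout.
constexpr std::uint32_t bank_of(std::uint32_t offset) {
    return (offset / kBankBytes) % kNumBanks;
}

// Two accesses issued in the same cycle conflict when they hit the same bank.
constexpr bool bank_conflict(std::uint32_t offset_a, std::uint32_t offset_b) {
    return bank_of(offset_a) == bank_of(offset_b);
}

}  // namespace sketch
```

Under these assumptions, two buffers placed at offsets 0x0000 and 0x0100 fall in the same bank and cannot be read in one cycle without a conflict, while buffers at 0x0000 and 0x2000 fall in different banks.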
To avoid concurrent access to a memory with multiple variables or pointers, most memory access functions in the AI Engine API accept an enum value from aie_dm_resource that can be used to bind individual accesses to a virtual resource, as shown in the following example.
enum class aie_dm_resource {
    none,
    a,
    b,
    c,
    d,
    stack
};
The following example shows how to annotate memory accesses to allow or prevent accesses to memory in the same cycle.
int __aie_dm_resource_a *A;
int *B;
aie::vector<int,8> v1 = aie::load_v<8>(A);
/* Following access can be scheduled on the same cycle as the access to A since B is not annotated. */
aie::vector<int,8> v2 = aie::load_v<8>(B);
/* Following specific access to B is annotated with the same virtual resource as A, so they cannot be scheduled on the same cycle. */
aie::vector<int,8> v3 = aie::load_v<8, aie_dm_resource::a>(B);
/* vector iterator of B, annotated with the same virtual resource as A, so they cannot be scheduled on the same cycle. */
auto it = aie::begin_vector<8, aie_dm_resource::a>(B);
aie::vector<int,8> v4 = *(++it);
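The rule described in the comments above can be modeled in a few lines of standard C++. This is only a sketch of the annotation rule, assuming that an un-annotated access (none) places no restriction and that two accesses bound to the same virtual resource are serialized. The enum dm_resource and the helper may_share_cycle are hypothetical names for this sketch and are not part of the AI Engine API. The real scheduler also takes other constraints into account, which this model ignores.

```cpp
namespace sketch {

// Mirrors the aie_dm_resource enumerators; redeclared locally so that the
// sketch compiles without the AI Engine toolchain.
enum class dm_resource { none, a, b, c, d, stack };

// Hypothetical helper: true when two accesses may be scheduled in the same
// cycle as far as virtual-resource annotations are concerned.
constexpr bool may_share_cycle(dm_resource x, dm_resource y) {
    // An un-annotated access places no restriction; equal tags are serialized.
    return x == dm_resource::none || y == dm_resource::none || x != y;
}

}  // namespace sketch
```

In terms of the example above, the load from A (resource a) and the plain load from B (none) may share a cycle, whereas the load from A and the load from B annotated with aie_dm_resource::a may not.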
The compiler also provides the following aie_dm_resource annotations for associating types with different virtual resources. Accesses through types that are associated with the same virtual resource are not scheduled in the same cycle.
__aie_dm_resource_a
__aie_dm_resource_b
__aie_dm_resource_c
__aie_dm_resource_d
__aie_dm_resource_stack
For example, the following code annotates two arrays with the same virtual resource, __aie_dm_resource_a. This guides the compiler not to access the arrays in the same cycle. The code shows two ways to load vectors: with aie::load_v and with an iterator.
aie::vector<int32,8> va[32];
aie::vector<int32,8> vb[32];
//annotate array va and array vb to the same __aie_dm_resource_a
int32 __aie_dm_resource_a* __restrict p_va = (int32 __aie_dm_resource_a*)va;
int32 __aie_dm_resource_a* __restrict p_vb = (int32 __aie_dm_resource_a*)vb;
//declare iterator on array vb
auto it_b=aie::begin_vector<8>(p_vb);
//access va via pointer p_va and vb via iterator it_b
aie::vector<int32,8> vc;
vc=aie::load_v<8>(p_va)+*it_b;
//increment pointer to va and iterator to vb
p_va+=8;
++it_b;
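The following code annotates two input buffers with different virtual resources, __aie_dm_resource_a and __aie_dm_resource_b.
void kernel_top(input_buffer<int32> & __restrict data1, input_buffer<int32>& __restrict data2, ...){
    //annotate data1 with resource a and data2 with resource b
    int32 __aie_dm_resource_a* __restrict w_data1 = (int32 __aie_dm_resource_a* __restrict)data1.data();
    int32 __aie_dm_resource_b* __restrict w_data2 = (int32 __aie_dm_resource_b* __restrict)data2.data();
    //declare vector iterators on both buffers
    auto pv=aie::begin_vector<8>(w_data1);
    auto pv2=aie::begin_vector<8>(w_data2);
    //load one vector from each buffer and advance the iterators
    auto va=*pv++;
    auto vb=*pv2++;
    ...
}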
The following code annotates an array and a buffer with the same virtual resource, __aie_dm_resource_a. This guides the compiler not to access them in the same cycle.
alignas(aie::vector_decl_align) static int32 coeff[256]={...};
void func(input_buffer<int32> & __restrict wa, ......){
    aie::vector<int32,8> v_coeff=aie::load_v<8>((int32 __aie_dm_resource_a *)coeff);
    int32 __aie_dm_resource_a* __restrict p_wa = (int32 __aie_dm_resource_a*)wa.data();
    auto waIter=aie::begin_vector<8>(p_wa);
    aie::vector<int32,8> va;
    va=*waIter;
    ......
}