Load and Store with Virtual Resource Annotations - 2024.2 English - UG1603

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID: UG1603
Release Date: 2024-11-28
Version: 2024.2 English

The AI Engine-ML can perform several vector load or store operations per cycle. For these operations to execute in parallel, however, they must target different memory banks. The compiler generally tries to schedule as many memory accesses as possible in the same cycle, with some exceptions: memory accesses through the same pointer are scheduled in different cycles. When the compiler schedules accesses through different variables or pointers in the same cycle, memory bank conflicts can occur if those accesses target the same bank.

Note: Location constraints may be required to ensure that variables are placed in the expected memory bank.
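
The bank rule above can be pictured with a small host-side model. This is only a sketch in standard C++, not AI Engine code. The bank count, the bank size, and the address-to-bank mapping are hypothetical values chosen for illustration and are not taken from the hardware documentation. The names kNumBanks, kBankBytes, bank_of, and has_bank_conflict are likewise invented for this example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical memory geometry, chosen only for illustration.
// These are assumptions, not documented AI Engine-ML parameters.
constexpr std::uint32_t kNumBanks = 4;
constexpr std::uint32_t kBankBytes = 0x4000;

// Assumed mapping: consecutive kBankBytes-sized regions belong to consecutive banks.
std::uint32_t bank_of(std::uint32_t byte_address) {
    return (byte_address / kBankBytes) % kNumBanks;
}

// Returns true when at least two of the accesses issued in one cycle
// fall into the same bank under the assumed mapping.
bool has_bank_conflict(const std::vector<std::uint32_t>& addresses) {
    std::vector<bool> used(kNumBanks, false);
    for (std::uint32_t address : addresses) {
        const std::uint32_t bank = bank_of(address);
        if (used[bank]) {
            return true;
        }
        used[bank] = true;
    }
    return false;
}
```

In this model, two loads at addresses 0x0000 and 0x4000 land in different banks and could proceed together, while loads at 0x0000 and 0x0100 share a bank and would conflict. This is the situation that location constraints and virtual resource annotations are meant to help avoid.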

To prevent accesses through different variables or pointers from being scheduled in the same cycle, most memory access functions in the AI Engine API accept a value of the aie_dm_resource enum, which binds an individual access to a virtual resource. The enum is defined as follows.

enum class aie_dm_resource {
  none,
  a,
  b,
  c,
  d,
  stack
};

The following example shows how to annotate memory accesses to allow or prevent scheduling them in the same cycle.

int __aie_dm_resource_a *A;
int *B;
aie::vector<int,8> v1 = aie::load_v<8>(A);

/* The following access can be scheduled in the same cycle as the access to A because B is not annotated. */
aie::vector<int,8> v2 = aie::load_v<8>(B);

/* This specific access to B is annotated with the same virtual resource as A, so the two accesses cannot be scheduled in the same cycle. */
aie::vector<int,8> v3 = aie::load_v<8, aie_dm_resource::a>(B);

/* Vector iterator over B, annotated with the same virtual resource as A, so accesses through it cannot be scheduled in the same cycle as accesses to A. */
auto it = aie::begin_vector<8, aie_dm_resource::a>(B);
aie::vector<int,8> v4 = *(++it);
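
The scheduling rules described so far can be sketched as a toy host-side model in standard C++. It is an illustration under stated assumptions, not the compiler's actual algorithm. Here dm_resource is a local stand-in for aie_dm_resource so that the sketch compiles without the AI Engine API headers. The names access, must_separate, and schedule, as well as the slots_per_cycle parameter, are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Local stand-in for the aie_dm_resource enum (assumption: same enumerators).
enum class dm_resource { none, a, b, c, d, stack };

// Hypothetical description of one memory access.
struct access {
    int pointer_id;        // hypothetical identifier of the pointer used
    dm_resource resource;  // dm_resource::none means "not annotated"
};

// Two accesses must not share a cycle when they use the same pointer or
// are bound to the same (non-none) virtual resource.
bool must_separate(const access& x, const access& y) {
    if (x.pointer_id == y.pointer_id) {
        return true;
    }
    return x.resource != dm_resource::none && x.resource == y.resource;
}

// Greedy placement: each access goes into the earliest cycle that still has a
// free slot and holds no access it must be separated from.
// Returns one cycle index per access, in input order.
std::vector<int> schedule(const std::vector<access>& accesses,
                          std::size_t slots_per_cycle) {
    assert(slots_per_cycle > 0);  // with zero slots nothing could ever be placed
    std::vector<std::vector<access>> cycles;
    std::vector<int> placed;
    for (const access& acc : accesses) {
        std::size_t c = 0;
        for (;; ++c) {
            if (c == cycles.size()) {
                cycles.emplace_back();  // open a new, empty cycle
            }
            bool ok = cycles[c].size() < slots_per_cycle;
            for (const access& other : cycles[c]) {
                if (must_separate(acc, other)) {
                    ok = false;
                }
            }
            if (ok) {
                break;
            }
        }
        cycles[c].push_back(acc);
        placed.push_back(static_cast<int>(c));
    }
    return placed;
}
```

With two slots per cycle, an access bound to resource a and an unannotated access through a different pointer end up in the same cycle in this model, whereas two accesses bound to resource a are pushed into consecutive cycles. This mirrors the v1/v2 and v1/v3 cases in the example above.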

The compiler also provides the following aie_dm_resource annotations, which associate a type with a virtual resource. Accesses through types associated with the same virtual resource are not scheduled in the same cycle.

__aie_dm_resource_a
__aie_dm_resource_b
__aie_dm_resource_c
__aie_dm_resource_d
__aie_dm_resource_stack

For example, the following code annotates two arrays with the same __aie_dm_resource_a, which directs the compiler not to access the arrays in the same cycle. It also shows two ways to load vectors: with aie::load_v and with iterators.

aie::vector<int32,8> va[32];
aie::vector<int32,8> vb[32];

//annotate array va and array vb to the same __aie_dm_resource_a
int32 __aie_dm_resource_a* __restrict p_va = (int32 __aie_dm_resource_a*)va;
int32 __aie_dm_resource_a* __restrict p_vb = (int32 __aie_dm_resource_a*)vb;

//declare iterator on array vb
auto it_b=aie::begin_vector<8>(p_vb);

//access va via pointer p_va and vb via iterator it_b
aie::vector<int32,8> vc;
vc=aie::load_v<8>(p_va)+*it_b;

//increment pointer to va and iterator to vb
p_va+=8;
++it_b;

Avoid adding resource annotations to the kernel function signature. Instead, declare annotated pointers inside the kernel body, as in the following example:

void kernel_top(input_buffer<int32> & __restrict data1, input_buffer<int32>& __restrict data2, ...){
  int32 __aie_dm_resource_a* __restrict w_data1 = (int32 __aie_dm_resource_a* __restrict)data1.data();
  int32 __aie_dm_resource_b* __restrict w_data2 = (int32 __aie_dm_resource_b* __restrict)data2.data();

  auto pv=aie::begin_vector<8>(w_data1);
  auto pv2=aie::begin_vector<8>(w_data2);
  auto va=*pv++;
  auto vb=*pv2++;
  ...
}

The following code annotates an array and a buffer with the same __aie_dm_resource_a, which directs the compiler not to access them in the same cycle.

alignas(aie::vector_decl_align) static int32 coeff[256]={...};

void func(input_buffer<int32> & __restrict wa, ......){
  aie::vector<int32,8> v_coeff=aie::load_v<8>((int32 __aie_dm_resource_a *)coeff);
  int32 __aie_dm_resource_a* __restrict p_wa = (int32 __aie_dm_resource_a*)wa.data();

  auto waIter=aie::begin_vector<8>(p_wa);
  aie::vector<int32,8> va;
  va=*waIter;
  ......
}