Load and Store with Virtual Resource Annotations

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID: UG1603
Release Date: 2025-11-26
Version: 2025.2 English

The AI Engine can perform several vector load or store operations per cycle. However, load and store operations execute in parallel only if they target different memory banks. In general, the compiler tries to schedule as many memory accesses as possible in the same cycle, with some exceptions: memory accesses through the same pointer are always scheduled on different cycles. When the compiler schedules accesses through different variables or pointers in the same cycle, memory bank conflicts can occur if those accesses target the same bank.

Note: Location constraints might be required to ensure that variables are placed in the expected memory bank.

To prevent concurrent accesses to the same memory through multiple variables or pointers, most memory access functions in the AI Engine API accept an aie_dm_resource enum value that binds an individual access to a virtual resource. The enum is defined as follows.

enum class aie_dm_resource {
  none,
  a,
  b,
  c,
  d,
  stack
};

The following example shows how to annotate memory accesses to allow or prevent scheduling them in the same cycle.

int __aie_dm_resource_a *A;
int *B;
aie::vector<int,8> v1 = aie::load_v<8>(A);

/* The following access can be scheduled in the same cycle as the access to A because B is not annotated. */
aie::vector<int,8> v2 = aie::load_v<8>(B);

/* This specific access to B is annotated with the same virtual resource as A, so the two accesses cannot be scheduled in the same cycle. */
aie::vector<int,8> v3 = aie::load_v<8, aie_dm_resource::a>(B);

/* Vector iterator over B, annotated with the same virtual resource as A, so accesses through it cannot be scheduled in the same cycle as accesses to A. */
auto it = aie::begin_vector<8, aie_dm_resource::a>(B);
aie::vector<int,8> v4 = *(++it);
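
The rule illustrated by the example above can be restated as a small host-side model. This is a minimal sketch in standard C++, not AI Engine API code: the dm_resource enum only mirrors aie_dm_resource, and may_share_cycle is a hypothetical helper. It assumes that two accesses are forced onto different cycles only when both are bound to the same virtual resource, and that an unannotated access adds no constraint.

```cpp
// Host-side model only; none of these names belong to the AI Engine API.
enum class dm_resource { none, a, b, c, d, stack };

// Assumption: only two accesses bound to the same virtual resource are
// kept on different cycles; an unannotated access adds no constraint.
constexpr bool may_share_cycle(dm_resource x, dm_resource y) {
    return x == dm_resource::none || y == dm_resource::none || x != y;
}

// load_v<8>(A), with A annotated a, versus load_v<8>(B), with B unannotated.
static_assert(may_share_cycle(dm_resource::a, dm_resource::none),
              "v1 and v2 may be paired");
// load_v<8>(A) versus load_v<8, aie_dm_resource::a>(B).
static_assert(!may_share_cycle(dm_resource::a, dm_resource::a),
              "v1 and v3 are kept apart");
```

In this model a true result means only that the annotations do not forbid pairing the accesses. The compiler can still separate them for other reasons, and a bank conflict can still occur if both accesses land in the same memory bank.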

The compiler also provides the following type annotations, one for each virtual resource. Accesses through types that are associated with the same virtual resource are not scheduled in the same cycle.

__aie_dm_resource_a
__aie_dm_resource_b
__aie_dm_resource_c
__aie_dm_resource_d
__aie_dm_resource_stack

In addition to these annotations, AI Engine-ML v2 also supports combined virtual resources:

__aie_dm_resource_ab
__aie_dm_resource_ac
__aie_dm_resource_ad
__aie_dm_resource_bc
__aie_dm_resource_bd
__aie_dm_resource_cd

For example, the following code annotates two arrays with the same __aie_dm_resource_a, which tells the compiler not to access the arrays in the same cycle. It shows two ways to load vectors: with aie::load_v and with iterators.

aie::vector<int32,8> va[32];
aie::vector<int32,8> vb[32];

//annotate array va and array vb with the same __aie_dm_resource_a
int32 __aie_dm_resource_a* __restrict p_va = (int32 __aie_dm_resource_a*)va;
int32 __aie_dm_resource_a* __restrict p_vb = (int32 __aie_dm_resource_a*)vb;

//declare iterator on array vb
auto it_b=aie::begin_vector<8>(p_vb);

//access va via pointer p_va and vb via iterator it_b
aie::vector<int32,8> vc;
vc=aie::load_v<8>(p_va)+*it_b;

//increment pointer to va and iterator to vb
p_va+=8;
++it_b;

Avoid adding resource annotations to the kernel function signature. Instead, declare annotated pointers inside the kernel body, as in the following example:

void kernel_top(input_buffer<int32> & __restrict data1, input_buffer<int32>& __restrict data2, ...){
  int32 __aie_dm_resource_a* __restrict w_data1 = (int32 __aie_dm_resource_a* __restrict)data1.data();
  int32 __aie_dm_resource_b* __restrict w_data2 = (int32 __aie_dm_resource_b* __restrict)data2.data();

  auto pv=aie::begin_vector<8>(w_data1);
  auto pv2=aie::begin_vector<8>(w_data2);
  auto va=*pv++;
  auto vb=*pv2++;
  ...
}

The following code annotates an array and a buffer with the same __aie_dm_resource_a, which tells the compiler not to access them in the same cycle.

alignas(aie::vector_decl_align) static int32 coeff[256]={...};

void func(input_buffer<int32> & __restrict wa, ......){
  aie::vector<int32,8> v_coeff=aie::load_v<8>((int32 __aie_dm_resource_a *)coeff);
  int32 __aie_dm_resource_a* __restrict p_wa = (int32 __aie_dm_resource_a*)wa.data();

  auto waIter=aie::begin_vector<8>(p_wa);
  aie::vector<int32,8> va;
  va=*waIter;
  ......
}

Combining Virtual Resources

Virtual resources can be combined, which allows more advanced pointer relationships and more efficient memory access. Because buffers can span multiple memory banks, achieving optimal performance requires awareness of potential access conflicts and of which accesses need separate clock cycles.

For example, a buffer annotated with __aie_dm_resource_a can be aliased with:

  • Non-annotated buffers
  • Buffers annotated with __aie_dm_resource_ab
  • Buffers annotated with __aie_dm_resource_ac
  • Buffers annotated with __aie_dm_resource_ad

This flexibility allows for efficient management of memory resources, particularly when dealing with buffers spanning multiple memory banks.
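
One way to reason about the aliasing list above is to treat each basic virtual resource as one bit and each combined resource as the OR of its two parts. The following is a minimal host-side sketch in standard C++, not AI Engine API code: the res enum and the may_alias helper are hypothetical names, and the rule that two annotated buffers may alias only when they share a basic resource is an assumption inferred from the list, not a statement of the compiler's exact behavior. The model also treats two buffers with identical annotations as potentially aliasing.

```cpp
// Host-side model only; these names are illustrative, not AI Engine API.
// Each basic virtual resource is one bit; a combined resource is the OR
// of its two parts (for example, ab = a | b).
enum class res : unsigned {
    none = 0,
    a = 1, b = 2, c = 4, d = 8,
    ab = a | b, ac = a | c, ad = a | d,
    bc = b | c, bd = b | d, cd = c | d
};

constexpr unsigned bits(res r) { return static_cast<unsigned>(r); }

// Assumption: an unannotated buffer may alias anything; two annotated
// buffers may alias only if they have at least one basic resource in common.
constexpr bool may_alias(res x, res y) {
    return x == res::none || y == res::none || (bits(x) & bits(y)) != 0u;
}

// The four cases listed for a buffer annotated with __aie_dm_resource_a.
static_assert(may_alias(res::a, res::none), "a with unannotated");
static_assert(may_alias(res::a, res::ab), "a with ab");
static_assert(may_alias(res::a, res::ac), "a with ac");
static_assert(may_alias(res::a, res::ad), "a with ad");
// A combination that does not contain a is treated as non-aliasing here.
static_assert(!may_alias(res::a, res::bc), "a with bc");
```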

Example: Efficient Access of Buffers Spanning Multiple Memory Banks

Consider a set of three buffers:

  • Buffer 1: Located on memory bank 0, annotated with __aie_dm_resource_a
  • Buffer 2: Spans memory banks 0 and 1, annotated with __aie_dm_resource_ab
  • Buffer 3: Located on memory bank 1, annotated with __aie_dm_resource_b

This configuration allows for:

  • Simultaneous access of Buffers 1 and 3 within the same clock cycle, because they are located in different memory banks and are bound to different virtual resources.
  • Aliasing of Buffer 2 with both Buffers 1 and 3, enabling access through either alias.
  • Independent access of Buffer 2 in a separate clock cycle to avoid memory conflicts, even though it aliases with the other buffers.

This approach offers a way to optimize memory access and performance while maintaining flexibility in pointer relationships.
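
The three-buffer example can be checked with a toy model that places each access in the earliest cycle whose already-placed accesses use none of the same virtual resources. This is a minimal host-side sketch in standard C++ (C++14 or later), not AI Engine API code and not the compiler's scheduler. All names are hypothetical, and the model ignores the number of load and store units and every other scheduling constraint.

```cpp
// Host-side toy model only; not AI Engine API code.
constexpr unsigned kResA  = 1u;             // stands for __aie_dm_resource_a
constexpr unsigned kResB  = 2u;             // stands for __aie_dm_resource_b
constexpr unsigned kResAB = kResA | kResB;  // stands for __aie_dm_resource_ab

struct Schedule3 { int cycle[3]; };

// Place each of three accesses in the earliest cycle whose already-placed
// accesses share no virtual resource with it.
constexpr Schedule3 schedule3(unsigned m0, unsigned m1, unsigned m2) {
    const unsigned masks[3] = {m0, m1, m2};
    unsigned used[3] = {0u, 0u, 0u};  // resources occupied in each cycle
    Schedule3 s = {{0, 0, 0}};
    for (int i = 0; i < 3; ++i) {
        int c = 0;
        // Cycle i is still empty when access i is placed, so c never exceeds i.
        while ((used[c] & masks[i]) != 0u) ++c;
        used[c] |= masks[i];
        s.cycle[i] = c;
    }
    return s;
}

// Buffer 1 (a), Buffer 2 (ab), Buffer 3 (b), in that order.
constexpr Schedule3 plan = schedule3(kResA, kResAB, kResB);
static_assert(plan.cycle[0] == plan.cycle[2], "Buffers 1 and 3 share a cycle");
static_assert(plan.cycle[1] != plan.cycle[0], "Buffer 2 gets its own cycle");
```

Under these assumptions, the combined annotation on Buffer 2 is what keeps its access apart from both Buffer 1 and Buffer 3, while the distinct annotations on Buffers 1 and 3 leave them free to be paired.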