The AI Engine can perform several vector load or store operations per cycle. However, load and store operations can execute in parallel only if they target different memory banks. In general, the compiler tries to schedule multiple memory accesses in the same cycle when possible, with some exceptions. Memory accesses through the same pointer are scheduled on different cycles. If the compiler schedules accesses through multiple variables or pointers in the same cycle, memory bank conflicts can occur.
To avoid concurrent access to a memory through multiple variables or pointers, most memory access functions in the AI Engine API accept an enum value of type aie_dm_resource that binds an individual access to a virtual resource. The enum is defined as follows.
enum class aie_dm_resource {
    none,
    a,
    b,
    c,
    d,
    stack
};
The following example shows how to annotate memory accesses to allow or prevent accesses to memory in the same cycle.
int __aie_dm_resource_a *A;
int *B;
aie::vector<int,8> v1 = aie::load_v<8>(A);
/* Following access can be scheduled on the same cycle as the access to A since B is not annotated. */
aie::vector<int,8> v2 = aie::load_v<8>(B);
/* Following specific access to B is annotated with the same virtual resource as A, so they cannot be scheduled on the same cycle. */
aie::vector<int,8> v3 = aie::load_v<8, aie_dm_resource::a>(B);
/* vector iterator of B, annotated with the same virtual resource as A, so they cannot be scheduled on the same cycle. */
auto it = aie::begin_vector<8, aie_dm_resource::a>(B);
aie::vector<int,8> v4 = *(++it);
The compiler also provides the following type annotations, one for each virtual resource. Accesses through types that are associated with the same virtual resource are not scheduled in the same cycle.
__aie_dm_resource_a
__aie_dm_resource_b
__aie_dm_resource_c
__aie_dm_resource_d
__aie_dm_resource_stack
In addition to these virtual resource annotations, AI Engine-ML v2 also supports combined virtual resources:
__aie_dm_resource_ab
__aie_dm_resource_ac
__aie_dm_resource_ad
__aie_dm_resource_bc
__aie_dm_resource_bd
__aie_dm_resource_cd
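The scheduling rule described so far can be sketched as a small host-side model. This is not part of the AI Engine API: the VRes enum, its bit encoding, and can_share_cycle are hypothetical names introduced only for illustration. The sketch assumes that a combined annotation such as __aie_dm_resource_ab behaves like the union of its parts, so it excludes same-cycle scheduling with anything annotated a or b.

```cpp
#include <cstdint>

// Toy model only (assumption, not the AI Engine API): each virtual resource
// is one bit, and a combined annotation is the union of its parts.
enum class VRes : std::uint8_t {
    none  = 0,
    a     = 1u << 0,
    b     = 1u << 1,
    c     = 1u << 2,
    d     = 1u << 3,
    stack = 1u << 4,
    ab = a | b, ac = a | c, ad = a | d,
    bc = b | c, bd = b | d, cd = c | d
};

constexpr std::uint8_t mask(VRes r) { return static_cast<std::uint8_t>(r); }

// In this model, two accesses may be placed in the same cycle only if their
// annotations have no virtual resource in common.
constexpr bool can_share_cycle(VRes x, VRes y) {
    return (mask(x) & mask(y)) == 0;
}

static_assert(!can_share_cycle(VRes::a, VRes::a),   "same resource: separate cycles");
static_assert(can_share_cycle(VRes::a, VRes::b),    "different resources: may share");
static_assert(can_share_cycle(VRes::a, VRes::none), "unannotated access is unconstrained");
static_assert(!can_share_cycle(VRes::a, VRes::ab),  "ab overlaps a");
static_assert(can_share_cycle(VRes::ab, VRes::cd),  "disjoint combinations may share");
```

The model captures only the virtual resource constraint. It says nothing about physical bank placement, which still decides whether two accesses that are allowed to share a cycle actually conflict.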
For example, the following code annotates two arrays with the same __aie_dm_resource_a, which guides the compiler not to access the arrays in the same cycle. It shows two ways to load vectors: with aie::load_v and with iterators.
aie::vector<int32,8> va[32];
aie::vector<int32,8> vb[32];
//annotate array va and array vb to the same __aie_dm_resource_a
int32 __aie_dm_resource_a* __restrict p_va = (int32 __aie_dm_resource_a*)va;
int32 __aie_dm_resource_a* __restrict p_vb = (int32 __aie_dm_resource_a*)vb;
//declare iterator on array vb
auto it_b=aie::begin_vector<8>(p_vb);
//access va via pointer p_va and vb via iterator it_b
aie::vector<int32,8> vc;
vc=aie::load_v<8>(p_va)+*it_b;
//increment pointer to va and iterator to vb
p_va+=8;
++it_b;
Avoid adding resource annotations in the kernel function signature. Instead, declare annotated pointers inside the kernel, as in the following example:
void kernel_top(input_buffer<int32> & __restrict data1, input_buffer<int32>& __restrict data2, ...){
    int32 __aie_dm_resource_a* __restrict w_data1 = (int32 __aie_dm_resource_a* __restrict)data1.data();
    int32 __aie_dm_resource_b* __restrict w_data2 = (int32 __aie_dm_resource_b* __restrict)data2.data();
    auto pv=aie::begin_vector<8>(w_data1);
    auto pv2=aie::begin_vector<8>(w_data2);
    auto va=*pv++;
    auto vb=*pv2++;
    ...
}
The following code annotates an array and a buffer with the same __aie_dm_resource_a, which guides the compiler not to access them in the same cycle.
alignas(aie::vector_decl_align) static int32 coeff[256]={...};
void func(input_buffer<int32> & __restrict wa, ......){
    aie::vector<int32,8> v_coeff=aie::load_v<8>((int32 __aie_dm_resource_a *)coeff);
    int32 __aie_dm_resource_a* __restrict p_wa = (int32 __aie_dm_resource_a*)wa.data();
    auto waIter=aie::begin_vector<8>(p_wa);
    aie::vector<int32,8> va;
    va=*waIter;
    ......
}
Combining Virtual Resources
Virtual resources can be combined, which allows more advanced pointer relationships and more efficient memory access. Because buffers can span multiple memory banks, optimal performance requires awareness of potential access conflicts and of which accesses need separate clock cycles.
In this scenario, a buffer annotated with __aie_dm_resource_a can
be aliased with:
- Non-annotated buffers
- Buffers annotated with __aie_dm_resource_ab
- Buffers annotated with __aie_dm_resource_ac
- Buffers annotated with __aie_dm_resource_ad
This flexibility allows for efficient management of memory resources, particularly when dealing with buffers spanning multiple memory banks.
Example: Efficient Access of Buffers Spanning Multiple Memory Banks
Consider a set of three buffers:
- Buffer 1
  - Located on memory bank 0, annotated with __aie_dm_resource_a
- Buffer 2
  - Spans memory banks 0 and 1, annotated with __aie_dm_resource_ab
- Buffer 3
  - Located on memory bank 1, annotated with __aie_dm_resource_b
This configuration allows for:
- Simultaneous access of Buffers 1 and 3 within the same clock cycle, because they reside in different memory banks and are bound to different virtual resources.
- Aliasing of Buffer 2 with both Buffers 1 and 3, enabling access through either alias.
- Independent access of Buffer 2 in a separate clock cycle to avoid memory conflicts, even though it aliases with the other buffers.
This approach offers a way to optimize memory access and performance while maintaining flexibility in pointer relationships.
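The three-buffer example can be traced with a toy scheduler. This is a minimal sketch under stated assumptions, not a description of how the AI Engine compiler schedules code: the bit encoding (RES_A, RES_B, RES_AB) and the assign_cycles function are hypothetical, the placement is a simple greedy first fit, and data dependencies and program order are ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoding: one bit per virtual resource, combined = union.
constexpr std::uint8_t RES_A  = 1u << 0;
constexpr std::uint8_t RES_B  = 1u << 1;
constexpr std::uint8_t RES_AB = RES_A | RES_B;

// Greedy first fit: place each access in the earliest cycle in which none of
// its virtual resources is already claimed. Returns the cycle index chosen
// for each access, in input order.
std::vector<std::size_t> assign_cycles(const std::vector<std::uint8_t>& accesses) {
    std::vector<std::uint8_t> claimed;   // claimed[c] = resources used in cycle c
    std::vector<std::size_t> cycle_of;
    cycle_of.reserve(accesses.size());
    for (std::uint8_t m : accesses) {
        std::size_t c = 0;
        while (c < claimed.size() && (claimed[c] & m) != 0) {
            ++c;                          // overlap: try the next cycle
        }
        if (c == claimed.size()) {
            claimed.push_back(0);         // open a new cycle
        }
        claimed[c] = static_cast<std::uint8_t>(claimed[c] | m);
        cycle_of.push_back(c);
    }
    return cycle_of;
}
```

Calling assign_cycles({RES_A, RES_AB, RES_B}) for Buffers 1, 2, and 3 yields cycles 0, 1, and 0 in this model: Buffers 1 and 3 share a cycle, while Buffer 2 is pushed to a separate cycle because its combined annotation overlaps both.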