Accumulator Registers - 2025.2 English - UG1603

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID: UG1603
Release Date: 2025-11-26
Version: 2025.2 English

The AI Engine accumulation registers are 256 bits wide and can be viewed as eight vector lanes of 32 bits each or four lanes of 64 bits each. The following table lists the accumulator registers available in the AIE-ML and AIE-ML v2 architectures and shows how smaller registers are combined to form larger registers.

Table 1. Accumulator Registers
256-bit   512-bit   1024-bit
amll0     bml0      cm0
amlh0
amhl0     bmh0
amhh0
...       ...       ...
...
...       ...
...
amll8     bml8      cm8
amlh8
amhl8     bmh8
amhh8

The 256-bit accumulator registers are prefixed with am. Two of them can be combined to form a 512-bit register prefixed with bm, and two bm registers can be aliased to form a 1024-bit register prefixed with cm.

The AI Engine-ML v2 accumulation registers are 512 bits wide and can be viewed as sixteen vector lanes of 32 bits each or eight lanes of 64 bits each. The following table shows the set of AI Engine-ML v2 accumulator registers and how smaller registers are combined to form larger registers.

Table 2. AIE-ML v2 Accumulator Registers
512-bit   1024-bit   2048-bit
bmll0     cml0       dm0
bmlh0
bmhl0     cmh0
bmhh0
...       ...        ...
...
...       ...
...
bmll7     cml7       dm7
bmlh7
bmhl7     cmh7
bmhh7

The 512-bit accumulator registers are prefixed with bm. Two of them are aliased to form a 1024-bit register prefixed with cm, and two cm registers can be aliased to form a 2048-bit register prefixed with dm.
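The naming pattern in the two tables can be sketched as a small host-side helper. This is only an illustration of the aliasing scheme, assuming the pattern visible in the last rows of each table (ll, lh, hl, hh suffixes for the smallest registers and l, h for the middle ones). The functions leaf_registers() and half_registers() are hypothetical names and are not part of the AIE API.

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical helper, not part of the AIE API.
// Returns the four smallest registers aliased by the largest register with
// the given index. Pass 'a' for Table 1 (am -> bm -> cm) and 'b' for
// Table 2 (bm -> cm -> dm).
std::array<std::string, 4> leaf_registers(char leaf_prefix, int index) {
    assert(index >= 0);
    const std::string n = std::to_string(index);
    const std::string p(1, leaf_prefix);
    return {{p + "mll" + n, p + "mlh" + n, p + "mhl" + n, p + "mhh" + n}};
}

// Hypothetical helper, not part of the AIE API.
// Returns the two middle-sized registers (low half, high half) aliased by the
// largest register with the given index. The middle prefix is the letter
// after the leaf prefix ('a' -> 'b', 'b' -> 'c').
std::array<std::string, 2> half_registers(char leaf_prefix, int index) {
    assert(index >= 0);
    const std::string n = std::to_string(index);
    const std::string p(1, static_cast<char>(leaf_prefix + 1));
    return {{p + "ml" + n, p + "mh" + n}};
}
```

For example, leaf_registers('a', 0) lists the four 256-bit registers that make up cm0, and half_registers('b', 7) lists the two 1024-bit registers that make up dm7.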

Operations

The shift-round-saturate operation can be done by moving a value from an accumulator register to a vector register with any required shifting and rounding.

aie::accum<acc64,8> acc;

//shift right 10 bits, from accumulator register to vector register
aie::vector<int32,8> res=acc.to_vector<int32>(10);
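The arithmetic applied to each lane can be illustrated with a scalar host-side model. This is a sketch, not the AIE API: srs_model() is a hypothetical name, and the rounding (round half up) and saturation (clamp to the int32 range) shown here are assumptions. On the device, the result depends on the rounding and saturation modes that are configured.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical host-side model of shift-round-saturate for one 64-bit lane.
// Assumes round-half-up and saturation to int32; the device behavior depends
// on the configured rounding and saturation modes.
std::int32_t srs_model(std::int64_t acc, unsigned shift) {
    assert(shift < 63); // keeps the rounding constant representable
    std::int64_t rounded = acc;
    if (shift > 0) {
        // Add half of the discarded range, then shift right.
        // Assumes acc is far enough from INT64_MAX that the addition cannot
        // overflow, and that >> on a negative value is an arithmetic shift
        // (guaranteed from C++20, the usual behavior on earlier compilers).
        rounded = (acc + (std::int64_t{1} << (shift - 1))) >> shift;
    }
    const std::int64_t lo = std::numeric_limits<std::int32_t>::min();
    const std::int64_t hi = std::numeric_limits<std::int32_t>::max();
    if (rounded < lo) rounded = lo; // saturate on underflow
    if (rounded > hi) rounded = hi; // saturate on overflow
    return static_cast<std::int32_t>(rounded);
}
```

With a shift of 10, an accumulator value of 10240 becomes 10, and a value that still exceeds the int32 range after shifting is clamped to the largest int32 value.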

The upshift operation is used to move a value from a vector register to an accumulator register.

aie::vector<int32,8> v;
aie::accum<acc64,8> acc;
acc.from_vector(v, /*shift=*/10); //shift left 10 bits, from vector register to accumulator register
aie::print(acc,/*start a new line=*/true,/*prefix*/"acc value=");
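The effect of the upshift on each lane can be modelled on the host as a widening followed by a left shift. This is an illustrative sketch only: upshift_model() is a hypothetical name, not an AIE API function, and std::array stands in for the vector and accumulator registers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side model of the upshift for eight lanes.
// Each int32 lane is widened to 64 bits and scaled by 2^shift.
std::array<std::int64_t, 8> upshift_model(const std::array<std::int32_t, 8>& v,
                                          unsigned shift) {
    assert(shift < 32); // every result then fits in a 64-bit lane
    std::array<std::int64_t, 8> acc{};
    for (std::size_t i = 0; i < v.size(); ++i) {
        // Multiply instead of using << so that negative inputs are well
        // defined on compilers older than C++20.
        acc[i] = static_cast<std::int64_t>(v[i]) * (std::int64_t{1} << shift);
    }
    return acc;
}
```

With a shift of 10, a lane holding 1 becomes 1024 and a lane holding -8 becomes -8192.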

Besides the from_vector() and to_vector() functions, the aie::accum class has the following member functions, similar to those of aie::vector.

insert()
Updates the contents of a region of the accumulator using the values in the given native subaccumulator and returns a reference to the updated accumulator.
grow()
Returns a copy of the current accumulator in a larger accumulator. The current contents are copied into the returned accumulator and the remaining parts are undefined. The function parameter indicates the location where the current accumulator is placed within the output accumulator.
extract()
Returns a subaccumulator with the contents of a region of the accumulator.
cast_to()
Reinterprets the current accumulator as an accumulator of the given type. The number of elements is automatically computed by the function.

alignas(aie::vector_decl_align) int32 data[8] = {1,2,3,4,5,6,7,8};
aie::vector<int32,8> v=aie::load_v<8>(data);
aie::accum<acc64,8> acc; 
acc.from_vector(v, /*shift=*/0); //shift left 0 bits

aie::accum<acc64,16> acc2=acc.grow<16>();
aie::print(acc2,/*start a new line=*/true,/*prefix*/"acc2 value=");
//Output (the upper eight lanes are undefined after grow(); they are shown as zero here): acc2 value=0x000001 0x000002 0x000003 0x000004 0x000005 0x000006 0x000007 0x000008 0x000000 0x000000 0x000000 0x000000 0x000000 0x000000 0x000000 0x000000

acc2.insert(1,acc);
aie::print(acc2,true,"acc2 value=");
//Output: acc2 value=0x000001 0x000002 0x000003 0x000004 0x000005 0x000006 0x000007 0x000008 0x000001 0x000002 0x000003 0x000004 0x000005 0x000006 0x000007 0x000008

aie::accum<cacc48,4> cacc1=acc2.extract<8>(0).cast_to<cacc48>();//extract lower part, and cast to cacc48
aie::print(cacc1,true,"cacc1 value=");
//Output: cacc1 value=(0x000000000001, 0x000000000002) (0x000000000003, 0x000000000004) (0x000000000005, 0x000000000006) (0x000000000007, 0x000000000008)
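The region indexing used by grow(), insert(), and extract() in the example above can be modelled on the host with std::array. This is a sketch under stated assumptions, not the AIE API: grow_model(), insert_model(), and extract_model() are hypothetical names, the index counts regions the size of the smaller accumulator, and the model zero-fills the lanes that grow() leaves undefined on the device.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side model of grow(): copies the N lanes of a into region
// idx of a Big-lane result. The other lanes are zeroed here for the sake of
// the model; on the device they are undefined.
template <std::size_t Big, std::size_t N>
std::array<std::int64_t, Big> grow_model(const std::array<std::int64_t, N>& a,
                                         std::size_t idx = 0) {
    static_assert(Big % N == 0, "Big must be a multiple of N");
    assert((idx + 1) * N <= Big);
    std::array<std::int64_t, Big> out{};
    std::copy(a.begin(), a.end(), out.begin() + idx * N);
    return out;
}

// Hypothetical host-side model of insert(): overwrites region idx of big with
// the lanes of sub and returns a reference to the updated array.
template <std::size_t Big, std::size_t N>
std::array<std::int64_t, Big>& insert_model(std::array<std::int64_t, Big>& big,
                                            std::size_t idx,
                                            const std::array<std::int64_t, N>& sub) {
    static_assert(Big % N == 0, "Big must be a multiple of N");
    assert((idx + 1) * N <= Big);
    std::copy(sub.begin(), sub.end(), big.begin() + idx * N);
    return big;
}

// Hypothetical host-side model of extract(): returns the N lanes of region idx.
template <std::size_t N, std::size_t Big>
std::array<std::int64_t, N> extract_model(const std::array<std::int64_t, Big>& big,
                                          std::size_t idx) {
    static_assert(Big % N == 0, "Big must be a multiple of N");
    assert((idx + 1) * N <= Big);
    std::array<std::int64_t, N> out{};
    std::copy(big.begin() + idx * N, big.begin() + (idx + 1) * N, out.begin());
    return out;
}
```

Following the same steps as the example above, grow_model<16>() places the eight input lanes in region 0, insert_model() with index 1 fills lanes 8 to 15, and extract_model<8>() with index 0 returns the original eight lanes.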