Accumulator Registers - 2024.2 English

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID: UG1603
Release Date: 2024-11-28
Version: 2024.2 English

The AIE-ML accumulator registers are 256 bits wide and can be viewed as eight vector lanes of 32 bits each or four lanes of 64 bits each. The following table shows the set of AIE-ML accumulator registers and how smaller registers are combined to form larger registers.

Table 1. Accumulator Registers
256-bit    512-bit    1024-bit
amll0      bml0       cm0
amlh0
amhl0      bmh0
amhh0
...        ...        ...
amll8      bml8       cm8
amlh8
amhl8      bmh8
amhh8

The 256-bit accumulator registers are prefixed with the letters am. Two am registers are aliased to form a 512-bit register prefixed with bm, and two bm registers are aliased to form a 1024-bit register prefixed with cm.
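
The aliasing described above can be pictured with a small model in plain C++. This is only a sketch and does not use the AIE API: the CmModel type and the am_offset() and bm_offset() helpers are hypothetical names, and the ordering of the 256-bit quarters (amll, amlh, amhl, amhh from the least significant end) is an assumption made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Plain C++ model (not the AIE API): one 1024-bit cm register stored as
// sixteen 64-bit words. The am and bm names are modeled as offsets into it.
struct CmModel {
    std::array<std::uint64_t, 16> words{};  // 16 x 64 bits = 1024 bits
};

constexpr std::size_t kWordsPerAm = 256 / 64;  // 4 words per 256-bit register
constexpr std::size_t kWordsPerBm = 512 / 64;  // 8 words per 512-bit register

// ASSUMPTION: quarters 0..3 correspond to amll, amlh, amhl, amhh,
// counted from the least significant end of the cm register.
constexpr std::size_t am_offset(std::size_t quarter) { return quarter * kWordsPerAm; }

// ASSUMPTION: half 0 is bml (amll + amlh) and half 1 is bmh (amhl + amhh).
constexpr std::size_t bm_offset(std::size_t half) { return half * kWordsPerBm; }

// A write through the "amhl" view is visible through the "bmh" view,
// because both names refer to the same storage.
inline void aliasing_demo() {
    CmModel cm;
    cm.words[am_offset(2)] = 42;           // first word of amhl
    assert(cm.words[bm_offset(1)] == 42);  // same word seen as start of bmh
}
```

In this model the names are views of one storage area, which is why using a bm or cm register occupies the am registers it is built from.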

Operations

The shift-round-saturate operation moves a value from an accumulator register to a vector register, applying any required right shift, rounding, and saturation on the way.

aie::accum<acc64,8> acc;

//shift right 10 bits, from accumulator register to vector register
aie::vector<int32,8> res=acc.to_vector<int32>(10);
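
What shift-round-saturate does to a single lane can be sketched in scalar C++. This is a model, not the AIE API: srs_model() is a hypothetical helper, and it assumes round-half-up rounding with saturation enabled, whereas the behavior on the device depends on the rounding and saturation modes that are configured.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Scalar model of shift-round-saturate for one lane (not the AIE API).
// ASSUMPTIONS: rounding is "round half up", saturation is enabled,
// 0 <= shift < 63, and acc is small enough that adding the rounding
// constant does not overflow 64 bits.
inline std::int32_t srs_model(std::int64_t acc, unsigned shift) {
    std::int64_t rounded = acc;
    if (shift > 0) {
        // Add half of the discarded range, then shift right.
        // The right shift of a negative value is arithmetic on mainstream
        // compilers and is guaranteed to be so from C++20 onward.
        rounded = (acc + (std::int64_t{1} << (shift - 1))) >> shift;
    }
    // Saturate to the int32 range of the destination lane.
    const std::int64_t lo = std::numeric_limits<std::int32_t>::min();
    const std::int64_t hi = std::numeric_limits<std::int32_t>::max();
    if (rounded < lo) return static_cast<std::int32_t>(lo);
    if (rounded > hi) return static_cast<std::int32_t>(hi);
    return static_cast<std::int32_t>(rounded);
}

// Hypothetical checks for a shift of 10 bits.
inline void srs_demo() {
    assert(srs_model(1535, 10) == 1);  // 1.499... rounds down
    assert(srs_model(1536, 10) == 2);  // exactly 1.5 rounds up
    assert(srs_model(std::int64_t{1} << 50, 10) ==
           std::numeric_limits<std::int32_t>::max());  // saturates
}
```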

The upshift operation is used to move a value from a vector register to an accumulator register.

aie::vector<int32,8> v;
aie::accum<acc64,8> acc;
acc.from_vector(v, /*shift=*/10); //shift left 10 bits, from vector register to accumulator register
aie::print(acc,/*start a new line=*/true,/*prefix*/"acc value=");
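
For a single lane, the upshift amounts to widening the value and multiplying it by a power of two. The following scalar sketch is a model, not the AIE API; ups_model() is a hypothetical helper and the shift is assumed to be smaller than 32.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of the upshift for one lane (not the AIE API).
// ASSUMPTION: 0 <= shift < 32, so the result always fits in 64 bits.
inline std::int64_t ups_model(std::int32_t v, unsigned shift) {
    // Multiply by 2^shift rather than left-shifting a possibly negative value.
    return static_cast<std::int64_t>(v) * (std::int64_t{1} << shift);
}

// Hypothetical checks: an upshift by 10 scales by 1024, and dividing
// by 1024 again restores the original value.
inline void ups_demo() {
    assert(ups_model(3, 10) == 3072);
    assert(ups_model(-3, 10) == -3072);
    assert(ups_model(-3, 10) / 1024 == -3);
}
```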

Besides the from_vector() and to_vector() functions, the aie::accum class has the following member functions, which are similar to those of aie::vector.

insert()
Updates the contents of a region of the accumulator using the values in the given native subaccumulator and returns a reference to the updated accumulator.
grow()
Returns a copy of the current accumulator in a larger accumulator. The current accumulator is copied into one part of the larger accumulator, and the other parts are undefined. The function parameter indicates the location within the output accumulator where the current accumulator is copied.
extract()
Returns a subaccumulator with the contents of a region of the accumulator.
cast_to()
Reinterprets the current accumulator as an accumulator of the given type. The number of elements is automatically computed by the function.

alignas(aie::vector_decl_align) int32 data[8]={1,2,3,4,5,6,7,8}; //load_v requires an aligned pointer
aie::vector<int32,8> v=aie::load_v<8>(data);
aie::accum<acc64,8> acc; 
acc.from_vector(v, /*shift=*/0); //shift left 0 bits

aie::accum<acc64,16> acc2=acc.grow<16>();
aie::print(acc2,/*start a new line=*/true,/*prefix*/"acc2 value=");
//Output: acc2 value=0x000001 0x000002 0x000003 0x000004 0x000005 0x000006 0x000007 0x000008 0x000000 0x000000 0x000000 0x000000 0x000000 0x000000 0x000000 0x000000
//Note: the upper eight lanes are undefined after grow(); the zeros shown are not guaranteed

acc2.insert(1,acc);
aie::print(acc2,true,"acc2 value=");
//Output: acc2 value=0x000001 0x000002 0x000003 0x000004 0x000005 0x000006 0x000007 0x000008 0x000001 0x000002 0x000003 0x000004 0x000005 0x000006 0x000007 0x000008

aie::accum<cacc48,4> cacc1=acc2.extract<8>(0).cast_to<cacc48>();//extract lower part, and cast to cacc48
aie::print(cacc1,true,"cacc1 value=");
//Output: cacc1 value=(0x000000000001, 0x000000000002) (0x000000000003, 0x000000000004) (0x000000000005, 0x000000000006) (0x000000000007, 0x000000000008)
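
The lane movement performed by grow(), insert(), and extract() in the example above can be reproduced with a plain C++ model. This is a sketch that does not use the AIE API: AccModel and the grow_model(), insert_model(), and extract_model() helpers are hypothetical names, and the model zero-fills the lanes that grow() leaves undefined so that the result is deterministic.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Plain C++ model of an accumulator with N 64-bit lanes (not the AIE API).
template <std::size_t N>
using AccModel = std::array<std::int64_t, N>;

// Models grow(): copies `a` into part `idx` of a larger accumulator.
// ASSUMPTION: the remaining lanes are zero-filled here; in the real
// function they are undefined.
template <std::size_t Big, std::size_t N>
AccModel<Big> grow_model(const AccModel<N>& a, std::size_t idx = 0) {
    static_assert(Big >= N && Big % N == 0, "Big must be a multiple of N");
    AccModel<Big> out{};
    for (std::size_t i = 0; i < N; ++i) out[idx * N + i] = a[i];
    return out;
}

// Models insert(): overwrites part `idx` of `dst` with `sub` and
// returns a reference to the updated accumulator.
template <std::size_t Big, std::size_t N>
AccModel<Big>& insert_model(AccModel<Big>& dst, std::size_t idx, const AccModel<N>& sub) {
    static_assert(Big >= N && Big % N == 0, "Big must be a multiple of N");
    for (std::size_t i = 0; i < N; ++i) dst[idx * N + i] = sub[i];
    return dst;
}

// Models extract(): returns part `idx` of `src` as an N-lane accumulator.
template <std::size_t N, std::size_t Big>
AccModel<N> extract_model(const AccModel<Big>& src, std::size_t idx) {
    static_assert(Big >= N && Big % N == 0, "Big must be a multiple of N");
    AccModel<N> out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = src[idx * N + i];
    return out;
}

// Mirrors the sequence in the example: grow to 16 lanes, insert the
// original values into the upper half, then extract the lower half.
inline void lanes_demo() {
    AccModel<8> acc{1, 2, 3, 4, 5, 6, 7, 8};
    AccModel<16> acc2 = grow_model<16>(acc);
    insert_model(acc2, 1, acc);
    assert(acc2[8] == 1 && acc2[15] == 8);
    assert(extract_model<8>(acc2, 0) == acc);
}
```

The model leaves out cast_to(), which reinterprets the same bits as a different accumulator type rather than moving lanes.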