Accumulator Registers - 2024.1 English

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID: UG1603
Release Date: 2024-06-06
Version: 2024.1 English

The accumulator registers are 256 bits wide and can be viewed as eight vector lanes of 32 bits each or four lanes of 64 bits each. The following table shows the set of accumulator registers and how smaller registers are combined to form larger registers.

Table 1. Accumulator Registers
256-bit   512-bit   1024-bit
amll0     bml0      cm0
amlh0
amhl0     bmh0
amhh0
...       ...       ...
...
...       ...
...
amll8     bml8      cm8
amlh8
amhl8     bmh8
amhh8

The 256-bit accumulator registers are prefixed with the letters am. Two am registers are aliased to form a 512-bit register prefixed with bm, and two bm registers are aliased to form a 1024-bit register prefixed with cm. For example, amll0 and amlh0 form bml0, and bml0 and bmh0 form cm0.
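The width arithmetic behind these register classes can be checked with a small standard C++ sketch. It is an illustration only, not part of the AIE API: the helper names total_bits and register_prefix are hypothetical, and the sketch assumes that an accumulator with N lanes of B bits occupies N*B bits of register storage.

```cpp
#include <cassert>
#include <string>

// Total storage in bits for `lanes` lanes of `lane_bits` bits each.
// Assumption: lanes are packed with no padding between them.
constexpr int total_bits(int lanes, int lane_bits) {
    return lanes * lane_bits;
}

// Register prefix for a given width, following the table above:
// am = 256 bits, bm = 512 bits, cm = 1024 bits.
// Returns an empty string for widths the table does not list.
inline std::string register_prefix(int bits) {
    assert(bits > 0);
    switch (bits) {
        case 256:  return "am";
        case 512:  return "bm";
        case 1024: return "cm";
        default:   return "";
    }
}
```

Under that assumption, eight 64-bit lanes need 512 bits, so register_prefix(total_bits(8, 64)) returns "bm", and sixteen 64-bit lanes need 1024 bits, which maps to "cm".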

The shift-round-saturate operation moves a value from an accumulator register to a vector register, applying any required right shift, rounding, and saturation on the way.

aie::accum<acc64,8> acc;
aie::vector<int32,8> res=acc.to_vector<int32>(10);//shift right 10 bits, from accumulator register to vector register
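The per-lane arithmetic behind a shift-round-saturate step can be sketched in standard C++. This is a scalar model, not the AIE API: the function srs_model is hypothetical, and its rounding rule (round to nearest, ties toward positive infinity) and its saturation to the int32 range are assumptions chosen for illustration. The rounding and saturation that the hardware actually applies depend on how the kernel configures them.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical scalar model of shift-round-saturate for a single lane.
// Assumptions: round to nearest with ties toward +infinity, then clamp
// the result to the int32 range.
inline std::int32_t srs_model(std::int64_t acc, int shift) {
    assert(shift >= 0 && shift <= 62);
    const std::int64_t d = std::int64_t{1} << shift;  // divisor 2^shift

    // Floor division without relying on signed right-shift behavior.
    std::int64_t q = acc / d;    // truncates toward zero
    std::int64_t rem = acc % d;  // same sign as acc
    if (rem < 0) {               // adjust so that rem lies in [0, d)
        --q;
        rem += d;
    }

    // Round: bump the quotient when the discarded part is at least one half.
    if (shift > 0 && rem >= d / 2) {
        ++q;
    }

    // Saturate to the 32-bit signed range.
    const std::int64_t hi = std::numeric_limits<std::int32_t>::max();
    const std::int64_t lo = std::numeric_limits<std::int32_t>::min();
    if (q > hi) q = hi;
    if (q < lo) q = lo;
    return static_cast<std::int32_t>(q);
}
```

With a shift of 10, as in the to_vector<int32>(10) call above, this model maps 1024 to 1 and 1536 to 2, and clamps any value too large for int32 to the int32 maximum.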

The upshift operation moves a value from a vector register to an accumulator register, applying a left shift on the way.

aie::vector<int32,8> v;
aie::accum<acc64,8> acc;
acc.from_vector(v, /*shift=*/10); //shift left 10 bits, from vector register to accumulator register
aie::print(acc,/*start a new line=*/true,/*prefix*/"acc value=");
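For a single lane, the upshift amounts to widening the value and multiplying it by a power of two. The standard C++ sketch below models that; ups_model is a hypothetical helper, not an AIE API function, and it assumes a 32-bit input lane widened to a 64-bit accumulator lane.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of an upshift for a single lane.
// Assumption: a 32-bit lane is sign-extended to 64 bits and then
// shifted left by `shift` bits.
inline std::int64_t ups_model(std::int32_t v, int shift) {
    assert(shift >= 0 && shift <= 31);
    // Multiply instead of shifting so negative inputs stay well defined
    // in every C++ standard version.
    return static_cast<std::int64_t>(v) * (std::int64_t{1} << shift);
}
```

With a shift of 10, as in the from_vector(v, 10) call above, the model maps 3 to 3072 and -1 to -1024. A shift of 0 leaves the value unchanged apart from the widening.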

Besides the from_vector() and to_vector() functions, the aie::accum class has the following member functions, which are similar to those of aie::vector.

insert()
Updates the contents of a region of the accumulator using the values in the given native subaccumulator and returns a reference to the updated accumulator.
grow()
Returns a copy of the current accumulator in a larger accumulator. The remaining parts of the larger accumulator are undefined. The function parameter indicates the location within the output accumulator to which the current accumulator is copied.
extract()
Returns a subaccumulator with the contents of a region of the accumulator.
cast_to()
Reinterprets the current accumulator as an accumulator of the given type. The number of elements is automatically computed by the function.

The following example uses these member functions:
int32 data[8]={1,2,3,4,5,6,7,8};
aie::vector<int32,8> v=aie::load_v<8>(data);
aie::accum<acc64,8> acc; 
acc.from_vector(v, /*shift=*/0); //shift left 0 bits

aie::accum<acc64,16> acc2=acc.grow<16>(); //the upper half of acc2 is undefined until it is written
aie::print(acc2,/*start a new line=*/true,/*prefix*/"acc2 value=");
//Output: acc2 value=0x000001 0x000002 0x000003 0x000004 0x000005 0x000006 0x000007 0x000008 0x000000 0x000000 0x000000 0x000000 0x000000 0x000000 0x000000 0x000000

acc2.insert(1,acc);
aie::print(acc2,true,"acc2 value=");
//Output: acc2 value=0x000001 0x000002 0x000003 0x000004 0x000005 0x000006 0x000007 0x000008 0x000001 0x000002 0x000003 0x000004 0x000005 0x000006 0x000007 0x000008

aie::accum<cacc48,4> cacc1=acc2.extract<8>(0).cast_to<cacc48>();//extract lower part, and cast to cacc48
aie::print(cacc1,true,"cacc1 value=");
//Output: cacc1 value=(0x000000000001, 0x000000000002) (0x000000000003, 0x000000000004) (0x000000000005, 0x000000000006) (0x000000000007, 0x000000000008)
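The lane movement performed by grow(), insert(), extract(), and cast_to() in the example above can be modeled in standard C++ with std::array. This is a sketch for reasoning about indices, not the AIE API. The names AccModel, grow_model, insert_model, extract_model, and as_complex_model are hypothetical. The model zero-fills the lanes that grow() leaves undefined, and it assumes that a cast to a complex type pairs adjacent lanes as (real, imaginary), as the printed cacc1 value suggests.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical model of an accumulator: N lanes of 64-bit values.
template <std::size_t N>
using AccModel = std::array<std::int64_t, N>;

// Models grow<M>(idx): copies `src` into slot `idx` of a larger accumulator.
// The real operation leaves the other lanes undefined; this model zero-fills.
template <std::size_t M, std::size_t N>
AccModel<M> grow_model(const AccModel<N>& src, std::size_t idx = 0) {
    static_assert(M >= N && M % N == 0, "output must be a multiple of input");
    assert(idx < M / N);
    AccModel<M> out{};
    for (std::size_t i = 0; i < N; ++i) out[idx * N + i] = src[i];
    return out;
}

// Models insert(idx, sub): overwrites slot `idx` of `dst` with `sub`
// and returns a reference to the updated accumulator.
template <std::size_t M, std::size_t N>
AccModel<M>& insert_model(AccModel<M>& dst, std::size_t idx,
                          const AccModel<N>& sub) {
    static_assert(M >= N && M % N == 0, "sub must evenly divide dst");
    assert(idx < M / N);
    for (std::size_t i = 0; i < N; ++i) dst[idx * N + i] = sub[i];
    return dst;
}

// Models extract<N>(idx): returns slot `idx` of `src` as an N-lane value.
template <std::size_t N, std::size_t M>
AccModel<N> extract_model(const AccModel<M>& src, std::size_t idx) {
    static_assert(M >= N && M % N == 0, "N must evenly divide source size");
    assert(idx < M / N);
    AccModel<N> out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = src[idx * N + i];
    return out;
}

// Models a cast to a complex accumulator: adjacent lanes become
// (real, imaginary) pairs, so the element count halves.
template <std::size_t N>
std::array<std::pair<std::int64_t, std::int64_t>, N / 2>
as_complex_model(const AccModel<N>& src) {
    static_assert(N % 2 == 0, "need an even number of lanes");
    std::array<std::pair<std::int64_t, std::int64_t>, N / 2> out{};
    for (std::size_t i = 0; i < N / 2; ++i) {
        out[i] = std::make_pair(src[2 * i], src[2 * i + 1]);
    }
    return out;
}
```

Starting from the lanes 1 through 8, grow_model<16>() places them in lanes 0 to 7, insert_model() with index 1 fills lanes 8 to 15, and extract_model<8>() with index 0 returns the original eight lanes. These are the same index rules the example relies on.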