The AI Engine-ML accumulation registers are 256 bits wide and can be viewed as eight vector lanes of 32 bits each or four lanes of 64 bits each. The following table lists the accumulator registers available in the AIE-ML architecture and shows how smaller registers are combined to form larger registers.
| 256-bit | 512-bit | 1024-bit |
|---|---|---|
| amll0 | bml0 | cm0 |
| amlh0 | | |
| amhl0 | bmh0 | |
| amhh0 | | |
| ... | ... | ... |
| ... | | |
| ... | ... | |
| ... | | |
| amll8 | bml8 | cm8 |
| amlh8 | | |
| amhl8 | bmh8 | |
| amhh8 | | |
The 256-bit accumulator registers are denoted by the prefix "am," while two such registers can be combined to form a 512-bit register with the prefix "bm." Further, two 512-bit registers can be aliased to create a 1024-bit register, denoted by the prefix "cm."
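The naming pattern in the table can be sketched in plain C++. This is only an illustration of the aliasing hierarchy, not an AMD API: the helper names `am_parts_of_cm` and `bm_parts_of_cm` are hypothetical, and the low-to-high ordering of the parts is an assumption based on the `l`/`h` suffixes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of any AMD tool or API): names the two
// 512-bit registers that alias onto the 1024-bit register cm<index>.
std::vector<std::string> bm_parts_of_cm(int index) {
    assert(index >= 0 && index <= 8);  // the table lists cm0 through cm8
    const std::string n = std::to_string(index);
    return {"bml" + n, "bmh" + n};
}

// Hypothetical helper: names the four 256-bit registers that alias onto
// cm<index>. bml<n> covers {amll<n>, amlh<n>}; bmh<n> covers {amhl<n>, amhh<n>}.
std::vector<std::string> am_parts_of_cm(int index) {
    assert(index >= 0 && index <= 8);
    const std::string n = std::to_string(index);
    return {"amll" + n, "amlh" + n, "amhl" + n, "amhh" + n};
}
```

For instance, `am_parts_of_cm(0)` yields the four 256-bit names that share storage with `cm0`, which is why writing to one of them changes the corresponding part of the larger register.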
The AI Engine-ML v2 accumulation registers are 512 bits wide and can be viewed as sixteen vector lanes of 32 bits each or eight lanes of 64 bits each. The following table shows the set of AI Engine-ML v2 accumulator registers and how smaller registers are combined to form larger registers.
| 512-bit | 1024-bit | 2048-bit |
|---|---|---|
| bmll0 | cml0 | dm0 |
| bmlh0 | | |
| bmhl0 | cmh0 | |
| bmhh0 | | |
| ... | ... | ... |
| ... | | |
| ... | ... | |
| ... | | |
| bmll7 | cml7 | dm7 |
| bmlh7 | | |
| bmhl7 | cmh7 | |
| bmhh7 | | |
The 512-bit accumulator registers are prefixed with bm. Two of them are aliased to form a 1024-bit register prefixed with cm, and two cm registers can be aliased to form a 2048-bit register prefixed with dm.
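The lane counts quoted for both architectures follow from dividing the register width by the lane width. A minimal sketch of that arithmetic in plain C++ (the function name `lanes` is hypothetical and only illustrates the calculation):

```cpp
#include <cassert>

// Number of lanes of `lane_bits` each that fit in a register of `register_bits`.
constexpr int lanes(int register_bits, int lane_bits) {
    assert(lane_bits > 0 && register_bits % lane_bits == 0);
    return register_bits / lane_bits;
}

// AIE-ML: 256-bit accumulator registers.
static_assert(lanes(256, 32) == 8, "eight 32-bit lanes");
static_assert(lanes(256, 64) == 4, "four 64-bit lanes");

// AIE-ML v2: 512-bit accumulator registers.
static_assert(lanes(512, 32) == 16, "sixteen 32-bit lanes");
static_assert(lanes(512, 64) == 8, "eight 64-bit lanes");
```

The same division applies to the aliased 1024-bit and 2048-bit registers.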
Operations
The shift-round-saturate operation can be done by moving a value from an accumulator register to a vector register with any required shifting and rounding.
aie::accum<acc64,8> acc;
//shift right 10 bits, from accumulator register to vector register
aie::vector<int32,8> res=acc.to_vector<int32>(10);
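To make the arithmetic concrete, the following plain C++ sketch models shift-round-saturate for a single 64-bit accumulator lane moved to a 32-bit result. It is not the AIE API: `srs_model` is a hypothetical name, and the sketch assumes round-half-up rounding with saturation enabled, whereas the rounding and saturation behavior on the device depends on the configured modes.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Scalar model of shift-round-saturate for one lane (hypothetical helper).
// Assumptions: round half up, saturating to the int32 range, and an input
// small enough that adding the rounding constant cannot overflow int64.
std::int32_t srs_model(std::int64_t acc, int shift) {
    assert(shift >= 0 && shift < 62);
    std::int64_t rounded = acc;
    if (shift > 0) {
        // Add half of the discarded range, then shift right. The shift of a
        // negative value is arithmetic on mainstream compilers (and by
        // definition since C++20).
        rounded = (acc + (std::int64_t{1} << (shift - 1))) >> shift;
    }
    // Saturate: clamp to the representable range of the 32-bit result.
    constexpr std::int64_t lo = std::numeric_limits<std::int32_t>::min();
    constexpr std::int64_t hi = std::numeric_limits<std::int32_t>::max();
    if (rounded < lo) return std::numeric_limits<std::int32_t>::min();
    if (rounded > hi) return std::numeric_limits<std::int32_t>::max();
    return static_cast<std::int32_t>(rounded);
}
```

Under these assumptions, an accumulator value of 1536 shifted right by 10 bits represents 1.5 and rounds to 2, while a value too large for 32 bits after shifting is clamped to the int32 maximum.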
The upshift operation is used to move a value from a vector register to an accumulator register.
aie::vector<int32,8> v;
aie::accum<acc64,8> acc;
acc.from_vector(v, /*shift=*/10); //shift left 10 bits, from vector register to accumulator register
aie::print(acc,/*start a new line=*/true,/*prefix*/"acc value=");
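The effect of the upshift on one lane can be modeled in plain C++. This sketch is not the AIE API; `upshift_model` is a hypothetical name, and it assumes a 32-bit input lane widened to a 64-bit accumulator lane.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of the upshift for one lane (hypothetical helper): widen the
// 32-bit value to 64 bits, then scale it by 2^shift. Multiplication is used
// instead of << so that negative inputs are well defined in every C++ version.
std::int64_t upshift_model(std::int32_t v, int shift) {
    assert(shift >= 0 && shift < 32);  // keeps the result within int64
    return static_cast<std::int64_t>(v) * (std::int64_t{1} << shift);
}
```

With a shift of 10, an input of 3 becomes 3072 in the accumulator. Shifting the accumulator back down by the same amount recovers the original value, which is why the upshift and shift-round-saturate amounts are usually chosen as a pair.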
Besides the from_vector() and to_vector() functions, the aie::accum class has the following member functions, which are similar to those of aie::vector.
- insert() - Updates the contents of a region of the accumulator using the values in the given native subaccumulator and returns a reference to the updated accumulator.
- grow() - Returns a copy of the current accumulator in a larger accumulator. The contents of the current accumulator are copied into the larger accumulator and the remaining parts are undefined. The function parameter indicates where the current accumulator is placed within the output accumulator.
- extract() - Returns a subaccumulator with the contents of a region of the accumulator.
- cast_to() - Reinterprets the current accumulator as an accumulator of the given type. The number of elements is automatically computed by the function.
alignas(aie::vector_decl_align) int32 data[8] = {1,2,3,4,5,6,7,8};
aie::vector<int32,8> v=aie::load_v<8>(data);
aie::accum<acc64,8> acc;
acc.from_vector(v, /*shift=*/0); //shift left 0 bits
aie::accum<acc64,16> acc2=acc.grow<16>();
aie::print(acc2,/*start a new line=*/true,/*prefix*/"acc2 value=");
//Output: acc2 value=0x000001 0x000002 0x000003 0x000004 0x000005 0x000006 0x000007 0x000008 0x000000 0x000000 0x000000 0x000000 0x000000 0x000000 0x000000 0x000000
acc2.insert(1,acc);
aie::print(acc2,true,"acc2 value=");
//Output: acc2 value=0x000001 0x000002 0x000003 0x000004 0x000005 0x000006 0x000007 0x000008 0x000001 0x000002 0x000003 0x000004 0x000005 0x000006 0x000007 0x000008
aie::accum<cacc48,4> cacc1=acc2.extract<8>(0).cast_to<cacc48>();//extract lower part, and cast to cacc48
aie::print(cacc1,true,"cacc1 value=");
//Output: cacc1 value=(0x000000000001, 0x000000000002) (0x000000000003, 0x000000000004) (0x000000000005, 0x000000000006) (0x000000000007, 0x000000000008)
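The region indexing used by grow(), insert(), and extract() in the example above can be modeled with `std::array` in plain C++. This is a sketch of the indexing only, not the AIE API: the `*_model` names are hypothetical, and the unused lanes are zero-filled here for determinism, whereas the real grow() leaves them undefined.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Model of grow<Big>(idx): copy `src` into region `idx` of a larger array.
// Assumption: other lanes are zeroed here; on the device they are undefined.
template <std::size_t Big, std::size_t Small>
std::array<std::int64_t, Big> grow_model(const std::array<std::int64_t, Small>& src,
                                         std::size_t idx = 0) {
    static_assert(Big > Small && Big % Small == 0, "Big must be a multiple of Small");
    assert(idx < Big / Small);
    std::array<std::int64_t, Big> out{};
    for (std::size_t i = 0; i < Small; ++i) out[idx * Small + i] = src[i];
    return out;
}

// Model of insert(idx, sub): overwrite region `idx`, counted in units of the
// sub-array size, and return a reference to the updated array.
template <std::size_t Big, std::size_t Small>
std::array<std::int64_t, Big>& insert_model(std::array<std::int64_t, Big>& dst,
                                            std::size_t idx,
                                            const std::array<std::int64_t, Small>& sub) {
    static_assert(Big % Small == 0, "Big must be a multiple of Small");
    assert(idx < Big / Small);
    for (std::size_t i = 0; i < Small; ++i) dst[idx * Small + i] = sub[i];
    return dst;
}

// Model of extract<Small>(idx): return region `idx` as a smaller array.
template <std::size_t Small, std::size_t Big>
std::array<std::int64_t, Small> extract_model(const std::array<std::int64_t, Big>& src,
                                              std::size_t idx) {
    static_assert(Big % Small == 0, "Big must be a multiple of Small");
    assert(idx < Big / Small);
    std::array<std::int64_t, Small> out{};
    for (std::size_t i = 0; i < Small; ++i) out[i] = src[idx * Small + i];
    return out;
}
```

In this model, `insert_model(big, 1, a)` writes an 8-element array into lanes 8 through 15 of a 16-element array, mirroring the `acc2.insert(1,acc)` call above, and `extract_model<8>(big, 0)` returns lanes 0 through 7.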