The AI Engine has several types of registers, some of which are used by more than one functional unit. This section describes each register type.
Scalar Registers
Scalar registers comprise general-purpose, modifier, pointer, and configuration registers. See the following table for register descriptions.
Table 1. Scalar Registers
Syntax | Number of bits | Description |
r0..r15 | 32 bits | General-purpose registers |
m0..m7 | 20 bits | Modifier registers |
p0..p7 | 20 bits | Pointer registers |
cl0..cl7 | 32 bits | Configuration registers |
ch0..ch7 | 32 bits | Configuration registers |
c0..c7 | 64 bits | Configuration registers |
Special Registers
Table 2. Special Registers
Syntax | Number of bits | Description |
cb0..cb7 | 20 bits | Circular buffer start address |
cs0..cs7 | 20 bits | Circular buffer size |
wcs0..wcs3 | 40 bits | Wide circular buffer size |
s0..s7 | 8 bits | Shift control |
sp | 20 bits | Stack pointer |
lr | 20 bits | Link register |
pc | 20 bits | Program counter |
fc | 20 bits | Fetch counter |
mc0..mc1 | 32 bits | Status register |
md0..md1 | 32 bits | Mode control register |
ls | 20 bits | Loop start |
le | 20 bits | Loop end |
lc | 32 bits | Loop count |
lci | 32 bits | Loop count (PCU) |
S | 8 bits | Shift control |
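The table lists a start address and a size for each circular buffer but does not describe the wrap-around rule. The following is a minimal sketch of generic modulo addressing, not a description of the AI Engine hardware behavior. The function name `advance_circular`, its parameters, and the pairing of a pointer with a start address and a size are assumptions made for illustration only; the 20-bit address width is taken from the table.

```python
ADDR_BITS = 20                      # width of the cb/cs/p registers in the tables
ADDR_MASK = (1 << ADDR_BITS) - 1

def advance_circular(pointer, step, start, size):
    """Advance `pointer` by `step`, wrapping inside [start, start + size).

    Hypothetical helper: generic circular addressing, assumed for illustration.
    """
    offset = (pointer - start + step) % size   # Python's % is non-negative for size > 0
    return (start + offset) & ADDR_MASK

# A 0x40-byte buffer starting at 0x1000.
forward = advance_circular(0x1030, 0x20, start=0x1000, size=0x40)
backward = advance_circular(0x1000, -0x10, start=0x1000, size=0x40)
print(hex(forward), hex(backward))
```

Stepping forward past the end of the buffer lands back near the start, and stepping backward from the start lands near the end, so the pointer never leaves the buffer.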
Vector Registers
Vector registers are wide registers that support SIMD instructions. The underlying hardware registers are 128 bits wide and are prefixed with the letter V. Two V registers can be grouped to form a 256-bit register prefixed with W. The WR, WC, and WD registers are grouped in pairs to form the 512-bit registers XA, XB, XC, and XD. XA and XB together form the 1024-bit YA register. For every register except YD, the order runs from LSB at the top of the table to MSB at the bottom. For YD, the LSBs come from XD and the MSBs come from XB, that is:
YD = VDL0::VDH0::VDL1::VDH1::VRL2::VRH2::VRL3::VRH3
Table 3. Vector Registers
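128-bit | 256-bit | 512-bit | 1024-bit (ya) | 1024-bit (yd) |
vrl0 | wr0 | xa | ya | N/A |
vrh0 | wr0 | xa | ya | N/A |
vrl1 | wr1 | xa | ya | N/A |
vrh1 | wr1 | xa | ya | N/A |
vrl2 | wr2 | xb | ya | yd (MSBs) |
vrh2 | wr2 | xb | ya | yd (MSBs) |
vrl3 | wr3 | xb | ya | yd (MSBs) |
vrh3 | wr3 | xb | ya | yd (MSBs) |
vcl0 | wc0 | xc | N/A | N/A |
vch0 | wc0 | xc | N/A | N/A |
vcl1 | wc1 | xc | N/A | N/A |
vch1 | wc1 | xc | N/A | N/A |
vdl0 | wd0 | xd | N/A | yd (LSBs) |
vdh0 | wd0 | xd | N/A | yd (LSBs) |
vdl1 | wd1 | xd | N/A | yd (LSBs) |
vdh1 | wd1 | xd | N/A | yd (LSBs) |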
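The grouping of V registers into W, X, and Y registers can be sketched in software. The following is an illustrative model only: each 128-bit register is a Python integer holding an arbitrary tag, and the helper names `concat`, `split`, and `w` are assumptions introduced here, not part of any AI Engine tool or API. It shows that building YD from XD (LSBs) and XB (MSBs) reproduces the order given for YD.

```python
V_BITS = 128  # width of the underlying V registers

def concat(parts, width):
    """Concatenate fields LSB-first: parts[0] lands in the lowest bits."""
    value = 0
    for index, part in enumerate(parts):
        value |= (part & ((1 << width) - 1)) << (index * width)
    return value

def split(value, count, width=V_BITS):
    """Split a wide value into `count` fields, lowest field first."""
    mask = (1 << width) - 1
    return [(value >> (index * width)) & mask for index in range(count)]

# Register names in table order (top = LSB); the tag values are arbitrary.
names = [f"v{bank}{half}{i}"
         for bank, pairs in (("r", 4), ("c", 2), ("d", 2))
         for i in range(pairs)
         for half in ("l", "h")]
regs = {name: tag for tag, name in enumerate(names, start=1)}

def w(bank, i):
    """256-bit W register: the L register in the LSBs, the H register in the MSBs."""
    return concat([regs[f"v{bank}l{i}"], regs[f"v{bank}h{i}"]], V_BITS)

xa = concat([w("r", 0), w("r", 1)], 2 * V_BITS)
xb = concat([w("r", 2), w("r", 3)], 2 * V_BITS)
xc = concat([w("c", 0), w("c", 1)], 2 * V_BITS)
xd = concat([w("d", 0), w("d", 1)], 2 * V_BITS)

ya = concat([xa, xb], 4 * V_BITS)  # LSBs from xa, MSBs from xb
yd = concat([xd, xb], 4 * V_BITS)  # LSBs from xd, MSBs from xb

ya_order = [names[tag - 1] for tag in split(ya, 8)]
yd_order = [names[tag - 1] for tag in split(yd, 8)]
print(yd_order)
```

Because XB appears in both YA and YD, the two 1024-bit views overlap in their upper 512 bits in this model, which matches the table: rows vrl2 through vrh3 belong to both ya and yd.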
Accumulator Registers
Accumulator registers store the results of the vector data path. They are 384 bits wide and can be viewed as 8 vector lanes of 48 bits each. This allows 32-bit multiplication results to be accumulated without overflow: the 16 guard bits allow up to 2^16 accumulations. The accumulator registers are prefixed with the letters AM. Two of them are aliased to form a 768-bit register that is prefixed with BM.
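The guard-bit arithmetic can be checked with a short calculation. The sketch below assumes that each product is a signed two's-complement 32-bit value and that each lane is a signed 48-bit accumulator; the signedness and the helper `fits_signed` are assumptions for illustration, not statements about the hardware.

```python
LANES = 8
LANE_BITS = 48
PRODUCT_BITS = 32
GUARD_BITS = LANE_BITS - PRODUCT_BITS      # 16 guard bits per lane

def fits_signed(value, bits):
    """True if `value` is representable as a signed two's-complement integer."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1

max_accumulations = 1 << GUARD_BITS        # 2^16
most_positive = (1 << (PRODUCT_BITS - 1)) - 1
most_negative = -(1 << (PRODUCT_BITS - 1))

# Worst cases: every accumulated product has the largest possible magnitude.
worst_positive_sum = most_positive * max_accumulations
worst_negative_sum = most_negative * max_accumulations

print(LANES * LANE_BITS)                                # register width in bits
print(fits_signed(worst_positive_sum, LANE_BITS))       # still within one lane
print(fits_signed(worst_negative_sum, LANE_BITS))       # still within one lane
print(fits_signed(worst_negative_sum + most_negative, LANE_BITS))  # one more overflows
```

Under these assumptions, 2^16 worst-case products still fit in a 48-bit lane, and one further worst-case product no longer does.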
Note: There are two modes of operation. In the first mode, the multiplication
results are post-added into 8 accumulators using 16 post additions before the
accumulation. In the second mode, the multiplication results are post-added into 16
accumulators using 8 post additions before the accumulation.
Table 4. Accumulator Registers
384-bit | 768-bit |
aml0 | bm0 |
amh0 | bm0 |
aml1 | bm1 |
amh1 | bm1 |
aml2 | bm2 |
amh2 | bm2 |
aml3 | bm3 |
amh3 | bm3 |