Register Files

Versal Adaptive SoC AI Engine Architecture Manual (AM009)

Version: 1.3 English

The AI Engine has several types of registers. Some of the registers are used in different functional units. This section describes the various types of registers.

Scalar Registers

Scalar registers include configuration registers. See the following table for register descriptions.

Table 1. Scalar Registers
Syntax      Number of bits   Description
r0..r15     32 bits          General-purpose registers
m0..m7      20 bits          Modifier registers
p0..p7      20 bits          Pointer registers
cl0..cl7    32 bits          Configuration registers
c0..c7      64 bits

Special Registers

Table 2. Special Registers
Syntax        Number of bits   Description
cb0..cb7      20 bits          Circular buffer start address
cs0..cs7      20 bits          Circular buffer size
wcs0..wcs3    40 bits          Wide circular buffer size
s0..s7        8 bits           Shift control
sp            20 bits          Stack pointer
lr            20 bits          Link register
pc            20 bits          Program counter
fc            20 bits          Fetch counter
mc0..mc1      32 bits          Status register
md0..md1      32 bits          Mode control register
ls            20 bits          Loop start
le            20 bits          Loop end
lc            32 bits          Loop count
lci           32 bits          Loop count (PCU)
S             8 bits           Shift control
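
The table lists a start address (cb) and a size (cs) for each circular buffer but does not spell out the address update rule. As a rough illustration only, the sketch below applies the conventional modulo wrap to a 20-bit pointer. The function name, the step argument, and the choice of address unit are assumptions made for this example and are not taken from the manual.

```python
ADDR_BITS = 20                      # pointer and cb/cs width from the tables
ADDR_MASK = (1 << ADDR_BITS) - 1


def circular_advance(pointer, step, start, size):
    """Advance `pointer` by `step`, wrapping inside [start, start + size).

    Hypothetical helper: `start` plays the role of a cb register and
    `size` the role of a cs register. The real hardware rule may differ.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    # Python's % returns a non-negative result for a positive modulus,
    # so negative steps wrap backwards correctly.
    offset = (pointer - start + step) % size
    return (start + offset) & ADDR_MASK


# A pointer 4 units before the end of a 0x40-unit buffer, advanced by 8,
# wraps to 4 units past the start: 0x1004.
wrapped = circular_advance(0x103C, 8, start=0x1000, size=0x40)
```

Masking the result to 20 bits keeps the model consistent with the register width in the table, even if a caller passes a start address near the top of the range.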

Vector Registers

Vector registers are wide registers that support SIMD instructions. The underlying hardware registers are 128 bits wide and are prefixed with the letter V. Two V registers can be grouped to form a 256-bit register prefixed with W. The WR, WC, and WD registers are grouped in pairs to form the 512-bit registers XA, XB, XC, and XD. XA and XB together form the 1024-bit YA register. For all registers except YD, the order runs from the LSB at the top of the table to the MSB at the bottom. For YD, the LSBs come from XD and the MSBs come from XB, that is:

Table 3. Vector Registers
128-bit   256-bit   512-bit   1024-bit (YA)   1024-bit (YD)
vrl0      wr0       xa        ya              N/A
vrl1      wr1       xa        ya              N/A
vrl2      wr2       xb        ya              yd (MSBs)
vrl3      wr3       xb        ya              yd (MSBs)
vcl0      wc0       xc        N/A             N/A
vcl1      wc1       xc        N/A             N/A
vdl0      wd0       xd        N/A             yd (LSBs)
vdl1      wd1       xd        N/A             yd (LSBs)
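
The grouping in Table 3 can be modeled with plain integers to make the bit ordering concrete. The following is a minimal sketch that starts at the 256-bit W level and builds the X and Y views from it. The helper names `concat` and `build_views` are hypothetical, and the model describes only the aliasing, not how the hardware stores the data.

```python
W_BITS = 256   # width of a W register
X_BITS = 512   # width of an X register


def concat(low_to_high, part_bits):
    """Pack integers into one value, first element in the LSB position."""
    value = 0
    for index, part in enumerate(low_to_high):
        if part >> part_bits:  # also rejects negative values
            raise ValueError("part does not fit in %d bits" % part_bits)
        value |= part << (index * part_bits)
    return value


def build_views(wr, wc, wd):
    """Return the X and Y views for W register contents.

    wr holds wr0..wr3, wc holds wc0..wc1, wd holds wd0..wd1.
    """
    xa = concat(wr[0:2], W_BITS)   # wr0 is the LSB half of xa
    xb = concat(wr[2:4], W_BITS)
    xc = concat(wc, W_BITS)
    xd = concat(wd, W_BITS)
    return {
        "xa": xa, "xb": xb, "xc": xc, "xd": xd,
        "ya": concat([xa, xb], X_BITS),  # xa low, xb high
        "yd": concat([xd, xb], X_BITS),  # xd low, xb high (the YD exception)
    }


views = build_views(wr=[1, 2, 3, 4], wc=[5, 6], wd=[7, 8])
```

In this model, `views["ya"]` and `views["yd"]` share the same upper 512 bits (xb), which is the overlap the table expresses with the "yd (MSBs)" entries.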

Accumulator Registers

Accumulator registers store the results of the vector data path. Each is 384 bits wide and can be viewed as 8 vector lanes of 48 bits each. This width lets 32-bit multiplication results be accumulated without overflow: the 16 guard bits allow up to 2^16 accumulations. The accumulator registers are prefixed with the letters AM. Two of them are aliased to form a 768-bit register that is prefixed with BM.
Note: There are two modes of operation. In the first mode, the multiplication results are post-added into 8 accumulators using 16 post additions before the accumulation. In the second mode, the multiplication results are post-added into 16 accumulators using 8 post additions before the accumulation.

Table 4. Accumulator Registers
384-bit   768-bit
aml0      bm0
aml1      bm1
aml2      bm2
aml3      bm3
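
The guard-bit claim can be checked with a few lines of arithmetic. The sketch below assumes signed two's-complement lanes and signed 32-bit multiplication results. That interpretation and the helper names are assumptions made for the example.

```python
REGISTER_BITS = 384
LANE_BITS = 48
PRODUCT_BITS = 32

LANES = REGISTER_BITS // LANE_BITS          # 8 lanes per AM register
GUARD_BITS = LANE_BITS - PRODUCT_BITS       # 16 guard bits per lane


def fits_signed(value, bits):
    """True if `value` is representable as a signed `bits`-wide integer."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def worst_case_sum(count):
    """Most negative sum of `count` signed 32-bit results."""
    return count * -(1 << (PRODUCT_BITS - 1))


max_accumulations = 1 << GUARD_BITS         # 2^16

# 2^16 worst-case results still fit in a 48-bit lane; one more does not.
fits_at_limit = fits_signed(worst_case_sum(max_accumulations), LANE_BITS)
fits_past_limit = fits_signed(worst_case_sum(max_accumulations + 1), LANE_BITS)
```

Under these assumptions `fits_at_limit` is true and `fits_past_limit` is false, which matches the stated bound of 2^16 accumulations per lane.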