The elements of AMD programmable logic that enable digital signal processing are the DSP48E2 slices in UltraScale FPGAs and UltraScale+ devices and the DSP58 slices in Versal devices. These elements can be configured to perform over 30 unique operations. For DSP48E2 information, see the DSP48E2 Operation Modes section in the UltraScale Architecture DSP Slice User Guide (UG579). For the DSP58, see the DSP58 Operation Modes section in the Versal Adaptive SoC DSP Engine Architecture Manual (AM004). The function of a DSP tile can change from one clock cycle to the next.
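The following Python sketch makes the per-cycle behavior concrete. It models a slice that picks a new operation on every clock cycle. The operation names (MUL, MACC, MUL_ADD, MUL_SUB), the `run` helper, and the 48-bit wrap-around register are hypothetical simplifications. They are not the slice's real control encoding, which is set by inputs such as OPMODE, ALUMODE, and INMODE.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical, simplified operation table (illustrative names only).
Op = Callable[[int, int, int, int], int]  # (a, b, c, previous P) -> new P
OPS: Dict[str, Op] = {
    "MUL":     lambda a, b, c, p: a * b,      # P = A*B
    "MACC":    lambda a, b, c, p: p + a * b,  # P = P + A*B (accumulate)
    "MUL_ADD": lambda a, b, c, p: a * b + c,  # P = A*B + C
    "MUL_SUB": lambda a, b, c, p: c - a * b,  # P = C - A*B
}


def to_signed(value: int, width: int) -> int:
    """Wrap an integer into a two's-complement register of `width` bits."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def run(schedule: List[Tuple[str, int, int, int]], width: int = 48) -> List[int]:
    """Apply one (op, a, b, c) entry per clock cycle; return P after each cycle."""
    p = 0
    trace = []
    for op, a, b, c in schedule:
        # A different operation may be selected on every cycle.
        p = to_signed(OPS[op](a, b, c, p), width)
        trace.append(p)
    return trace


print(run([("MUL", 3, 4, 0), ("MACC", 2, 5, 0), ("MUL_ADD", 1, 1, 7), ("MUL_SUB", 2, 3, 10)]))
# [12, 22, 8, 4]
```

The schedule switches operation on every entry, and the accumulating step (MACC) reuses the P value from the previous cycle.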
DSP48E2
The following figure shows the DSP48E2 block diagram.
Each DSP tile consists of two DSP48E2 slices and a dedicated interconnect (see Figure 1). DSP tiles stack vertically to form a DSP48E2 column. A DSP tile is as tall as five configurable logic blocks (CLBs), which matches the height of one 36 K block RAM (which can be split into two 18 K block RAMs). Each DSP48E2 slice aligns horizontally with an 18 K block RAM, providing optimal connectivity between the two resources.
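The tile geometry above reduces to simple integer arithmetic. The sketch below assumes an unbroken column with slices numbered from 0 at the bottom and two slices per tile. It is a hypothetical indexing scheme, not Vivado site naming, and it ignores any interruptions in a real column. Under those assumptions it maps a slice to its tile, its first CLB row, and its aligned 18 K block RAM.

```python
from typing import NamedTuple

SLICES_PER_TILE = 2   # two DSP48E2 slices per DSP tile
CLBS_PER_TILE = 5     # a DSP tile is as tall as five CLBs
RAMB18_PER_TILE = 2   # one 36 K block RAM = two 18 K block RAMs


class SlicePosition(NamedTuple):
    tile: int           # DSP tile index within the column (0 = bottom)
    slot: int           # 0 = lower slice, 1 = upper slice (assumed convention)
    first_clb_row: int  # first CLB row covered by the tile
    ramb18: int         # index of the horizontally aligned 18 K block RAM


def locate(slice_index: int) -> SlicePosition:
    """Map a slice's index in an unbroken DSP48E2 column to its tile and neighbors."""
    tile, slot = divmod(slice_index, SLICES_PER_TILE)
    return SlicePosition(
        tile=tile,
        slot=slot,
        first_clb_row=tile * CLBS_PER_TILE,
        ramb18=tile * RAMB18_PER_TILE + slot,  # one 18 K block RAM per slice
    )


print(locate(5))  # SlicePosition(tile=2, slot=1, first_clb_row=10, ramb18=5)
```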
The DSP48E2 and DSP58 slices are built so that adjacent slices can cascade data in a pipelined fashion through dedicated high-speed routing. There are five dedicated interconnect connections (also known as cascades) between DSP tiles. Because a cascade forces the adjacent DSP to be used, it reduces potential routing-length issues and helps maintain the published DSP AC switching performance of the device (for example, in a finite impulse response (FIR) filter).
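The cascade pattern can be modeled as a transposed-form FIR filter, sketched below as a behavioral model in Python. Each slice multiplies the input sample by its coefficient and adds the partial sum arriving from its neighbor. That partial sum plays the role of the PCOUT to PCIN cascade. The function name and the choice of transposed form are assumptions made for brevity. A systolic arrangement, which also cascades the input samples through ACOUT/BCOUT, avoids broadcasting the input but takes longer to model. Timing and bit widths are not modeled.

```python
from typing import List


def transposed_fir(coeffs: List[int], samples: List[int]) -> List[int]:
    """Behavioral model of a transposed-form FIR built from cascaded MAC slices.

    p[k] models the P register of the slice holding coeffs[k]. Each slice
    computes coeffs[k] * x + PCIN, where PCIN is the previous-cycle P value
    of its cascade neighbor (slice k + 1).
    """
    n = len(coeffs)
    p = [0] * n
    out = []
    for x in samples:  # x is broadcast to every slice on each clock cycle
        # All P registers update on the same clock edge, so read the old values.
        p = [coeffs[k] * x + (p[k + 1] if k + 1 < n else 0) for k in range(n)]
        out.append(p[0])  # the last slice in the cascade drives the output
    return out


print(transposed_fir([1, 2, 3], [1, 0, 0, 0]))  # impulse response: [1, 2, 3, 0]
```

Only neighboring slices exchange partial sums, which is why the cascade keeps routes short as the filter grows.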
DSP58
The DSP58, available in Versal devices and later generations of AMD devices, is a strict superset of the DSP48E2 cell. See the following block diagram.
DSP58 design additions to the DSP48E2 architecture:
- 27 × 24 multiplier
  - The B operand is increased from 18 bits to 24 bits.
- 58-bit logic unit
  - The C operand is increased from 48 bits to 58 bits.
- 116-bit wide XOR function (increased from 96 bits)
  - The wide XOR is selectable as XOR12, XOR22 (new), XOR24, XOR34 (new), XOR58 (new), or XOR116 (new).
- DSPFP32 Mode
  - A single-precision floating-point multiplier and adder that produce both a floating-point product and a sum.
  - Multiplier: the input can be either FP32 or FP16, and the output is always FP32.
  - Adder: the input and output are both FP32 only.
- DSPCPLX Mode
  - Two back-to-back DSP58s in the same tile can be used together to implement an 18 × 18 complex multiply-accumulate.
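The widened operands change the bit-width arithmetic. The sketch below assumes the full 48-bit (DSP48E2) and 58-bit (DSP58) P widths. It computes the width of a full signed product and the guard bits left in the accumulator, and it models the arithmetic an 18 × 18 complex multiply-accumulate performs. The helper names are hypothetical, and the 58-bit default accumulator width in `complex_mac` is an illustrative assumption, not a documented DSPCPLX width.

```python
from typing import Iterable, Tuple


def signed_product_bits(a_bits: int, b_bits: int) -> int:
    """Bits needed to hold every product of two signed (two's-complement) operands."""
    return a_bits + b_bits


def guard_bits(acc_bits: int, a_bits: int, b_bits: int) -> int:
    """Accumulator bits left above a full-width signed product."""
    return acc_bits - signed_product_bits(a_bits, b_bits)


Complex = Tuple[int, int]  # (real, imaginary)


def complex_mac(pairs: Iterable[Tuple[Complex, Complex]], acc_bits: int = 58) -> Complex:
    """Accumulate sum(x * y) for complex x, y with 18-bit signed parts.

    acc_bits = 58 is an assumption for illustration only.
    """
    lo, hi = -(1 << 17), (1 << 17) - 1  # 18-bit signed range
    limit = 1 << (acc_bits - 1)         # signed accumulator range
    re = im = 0
    for (xr, xi), (yr, yi) in pairs:
        if not all(lo <= v <= hi for v in (xr, xi, yr, yi)):
            raise ValueError("operand does not fit in 18-bit signed")
        re += xr * yr - xi * yi         # real part
        im += xr * yi + xi * yr         # imaginary part
        if not (-limit <= re < limit and -limit <= im < limit):
            raise OverflowError("accumulator overflow")
    return re, im


print(signed_product_bits(27, 18), signed_product_bits(27, 24))  # 45 51
print(guard_bits(48, 27, 18), guard_bits(58, 27, 24))            # 3 7
print(complex_mac([((1, 2), (3, 4))]))                           # (-5, 10)
```

Each guard bit roughly doubles the number of full-scale products that can be summed before the accumulator overflows. Under these assumptions the wider DSP58 operands still leave more headroom (7 bits versus 3).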
The DSP58 structure within the programmable logic differs from that of the DSP48E2. DSP super tiles stack vertically to form a DSP super column. A DSP super tile is as tall as two configurable logic blocks (CLBs), matching both the height of one 18 K block RAM and half the height of a 288 K UltraRAM. Two 18 K block RAMs stack vertically to form a 36 K block RAM (see the following figure).