DSP for Versal Devices - 2024.1 English

Power Design Manager User Guide (UG1556)

Document ID
UG1556
Release Date
2024-05-30
Version
2024.1 English

AMD device families have different Digital Signal Processing (DSP) blocks with different capabilities. A recommended source to help you understand the Versal DSP block entries is the DSP58 Architecture section of the Versal Adaptive SoC DSP Engine Architecture Manual (AM004).

Tip: For families that have a register within the multiplier (MREG), using this pipeline register helps lower dynamic power.

The DSP page covers estimation of DSP58 block resources. Similar to previous generations, an AMD Versal™ adaptive SoC DSP58 block can implement a wide variety of arithmetic and logical functions, ranging from addition, subtraction, and multiplication to common DSP functions such as multiply-accumulate. Like previous generations, DSP58 blocks can implement wide logic functions such as XOR and can be cascaded to form digital filters. The DSP58 block has a wider 27x24 multiplier that you can also configure as three 9x8 multipliers, and it has a wider 58-bit accumulator. DSP58 also supports floating-point addition and multiplication.

The following settings are used to configure Versal adaptive SoC DSP58 blocks for power estimation:

Configurations
Versal adaptive SoC PDM lets you use mode-specific configurations for the DSP58 block. Select the configuration from the drop-down list that matches the DSP58 operation you are performing. The Versal DSP58 power model is more fine-grained than the models for previous generations. You can choose from various sizes of integer multipliers, MACs, dot products, complex multipliers, and floating-point operations.
DSP58 Slices
This is the number of DSP58 blocks. In the Versal architecture, one DSP58 block can implement a 27x24 fixed-point multiplier, while you can pair two DSP58 blocks with common logic to implement an 18-bit complex multiplier.
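As a rough illustration of the slice count described above, the following Python sketch tallies DSP58 blocks from a list of functions. The function names (`mult_27x24`, `complex_mult_18`) and the helper `dsp58_slice_total` are hypothetical labels chosen for this example and are not PDM field names. Only the two per-function costs stated in the text are included.

```python
# Slices per function, taken from the text above: one DSP58 block per
# 27x24 fixed-point multiplier, two paired DSP58 blocks per 18-bit
# complex multiplier. The dictionary keys are hypothetical labels.
SLICES_PER_FUNCTION = {
    "mult_27x24": 1,
    "complex_mult_18": 2,
}


def dsp58_slice_total(function_counts):
    """Return the DSP58 block count for a mapping of function name -> instances."""
    total = 0
    for name, count in function_counts.items():
        if count < 0:
            raise ValueError(f"negative instance count for {name!r}")
        total += SLICES_PER_FUNCTION[name] * count
    return total


# Example: ten real multipliers and four complex multipliers.
print(dsp58_slice_total({"mult_27x24": 10, "complex_mult_18": 4}))  # 18
```

The value to enter in the DSP58 Slices field would then be the returned total, not the number of arithmetic functions in the design.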
Clock
Choose the DSP58 Slice clock from the drop-down menu.
Block Toggle Rate
This is the average toggle rate of all DSP58 block signals. Manually adjust the toggle rate when necessary.
  • If the DSP58 block is enabled for only a fraction of cycles, scale the Block Toggle Rate by the enable rate. For example, if the DSP is enabled for half the cycles, multiply the Block Toggle Rate by 0.5 to get the new Block Toggle Rate.
  • If the DSP58 block does not use all the multiplier outputs, scale the Block Toggle Rate by the fraction of output bits used. If only 48 bits are used, then multiply the Block Toggle Rate by (48 / 58) to reflect the proportion of actively switching signals in the DSP58 block.
DSP Mode
This indicates the operational mode of the DSP58 blocks. It is auto-populated and is read-only for the specified configuration.
INT24
This mode is compatible with the DSP blocks from previous generations. INT24 indicates that the DSP58 block is configured as a 27x24 signed fixed-point multiplier. If you use a smaller multiplier, scale the Block Toggle Rate by the proportion of output bits used.
INT8
The DSP58 uses the Vector Fixed Point ALU mode in this configuration. This mode computes three-element 9x8 vector dot products with accumulate or post-add options.
CINT8
This mode indicates that two adjacent DSP58 blocks are configured to implement an 18-bit complex multiplier. Ensure that the DSP slice total takes into account two DSP slices per complex multiplier.
FP32
The DSP58 uses the floating-point multiplier and adder in this configuration. This mode is used for FP32 single-precision or FP16 half-precision operations with accumulate or post-add options.
MULT Used?
This indicates whether or not the DSP58 multiplier is used. The default value is Yes because the multiplier is expected to be used for the majority of cases. It is auto-populated and is read-only for a given configuration.
Multiplier Pipeline Used?
When MULT Used is Yes, this indicates whether the multiplier is pipelined. The multiplier is typically pipelined because of its relatively large propagation delay, so the default value is Yes. Set the value to No only for very low clock speeds, or when the DSP58 configuration does not use the multiplier.
Pre-Add Used?
The DSP58 contains a 27-bit signed adder that can drive one or both inputs of the multiplier. Select Yes if you are implementing an arithmetic function that requires the pre-add, for example (B + D) * A. The default value is No. This field is auto-populated and is read-only for a given configuration.
AD Reg Used?
This indicates that the pre-adder output is registered before it feeds the multiplier input. The default value is No. This field is auto-populated, but you can override the setting.
DSP slices have clock enable (CE) ports. When entering data in the Toggle Rate column, remember to multiply your data input toggle rate by the DSP slice clock enable rate. For example, if random data (typically ~38% data toggle rate) is input into the DSP slice and the slice is clock enabled only 50% of the time, scale the data toggle rate by the CE rate, which gives 19% (38% x 50%).
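The toggle-rate adjustments described in this section (scaling by the clock enable rate and by the fraction of output bits used) reduce to simple multiplication. The Python sketch below shows that arithmetic. The function name `scaled_block_toggle_rate` and its parameters are hypothetical, and applying both factors together in a single product is an assumption made for this example; the guide describes each adjustment separately.

```python
def scaled_block_toggle_rate(base_rate_pct, enable_rate=1.0,
                             used_output_bits=58, total_output_bits=58):
    """Scale a toggle rate (in percent) by CE rate and output-bit usage.

    Assumption: both factors are applied as one product when both apply.
    """
    if not 0.0 <= enable_rate <= 1.0:
        raise ValueError("enable_rate must be between 0 and 1")
    if not 0 < used_output_bits <= total_output_bits:
        raise ValueError("used_output_bits must be in 1..total_output_bits")
    return base_rate_pct * enable_rate * (used_output_bits / total_output_bits)


# CE example from the text: ~38% random-data toggle rate, enabled 50% of cycles.
print(scaled_block_toggle_rate(38.0, enable_rate=0.5))  # 19.0

# Output-bit example from the text: only 48 of 58 bits used.
# The 12.5% starting rate is a hypothetical value for illustration.
print(round(scaled_block_toggle_rate(12.5, used_output_bits=48), 2))  # 10.34
```

The result is the value to enter manually in the Block Toggle Rate field in place of the unscaled rate.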