DSP for Versal Devices - 2023.2 English

Power Design Manager User Guide (UG1556)

Document ID: UG1556
Release Date: 2023-10-18
Version: 2023.2 English

AMD device families have different Digital Signal Processing (DSP) blocks with different capabilities. Before entering DSP information, review the UltraScale Architecture DSP Slice User Guide (UG579) to understand the parameters in the DSP tab.

Tip: PDM assumes a default DSP configuration of 27x18, so scale the toggle rate accordingly for accurate power estimation. For example, if an 18x18 DSP is expected to toggle at 25%, scale that rate by 0.86 and enter the result (21.5%) into PDM. Similarly, scale the actual toggle rate by 0.8 for a 12x12 configuration.
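A minimal Python sketch of the scaling in the tip above, assuming you track the expected toggle rate for the actual multiplier size. The factors 0.86 (18x18) and 0.8 (12x12) come from the tip; the dictionary `SCALE_VS_27X18`, its string keys, and the function `pdm_toggle_rate` are hypothetical helpers for illustration, not part of PDM.

```python
# Scale factors relative to PDM's default 27x18 DSP configuration,
# taken from the tip above. Other configurations are not covered here.
SCALE_VS_27X18 = {
    "27x18": 1.0,   # default configuration, no scaling needed
    "18x18": 0.86,
    "12x12": 0.8,
}


def pdm_toggle_rate(expected_pct: float, config: str) -> float:
    """Return the toggle rate (in percent) to enter into PDM.

    expected_pct -- toggle rate expected for the actual multiplier size
    config       -- a key of SCALE_VS_27X18 (hypothetical labels)
    """
    if config not in SCALE_VS_27X18:
        raise ValueError(f"no scale factor known for {config!r}")
    return expected_pct * SCALE_VS_27X18[config]


if __name__ == "__main__":
    # An 18x18 DSP expected to toggle at 25% -> about 21.5% in PDM
    print(round(pdm_toggle_rate(25.0, "18x18"), 2))
```

Keeping the factors in one table makes it easy to add further configurations if their factors are documented elsewhere.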
Tip: DSP slices have clock enable (CE) ports. When entering data in the Toggle Rate column, remember to multiply your data input toggle rate by the DSP slice clock enable rate. For example, if random data (typically ~38% data toggle rate) is input into the DSP slice and the slice is clock enabled only 50% of the time, scale the data toggle rate by the CE rate so that it becomes 19% (38% x 50%). See the following figure for an example.
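The clock-enable scaling in the tip above is a single multiplication. With α_data denoting the input data toggle rate and r_CE the fraction of cycles in which the slice is clock enabled (symbols introduced here only for illustration), the worked example is:

```latex
\alpha_{\text{PDM}} = \alpha_{\text{data}} \times r_{\text{CE}} = 38\% \times 0.5 = 19\%
```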
Tip: For families that have a register within the multiplier (MREG), using this pipeline register helps lower dynamic power.

The DSP page covers estimation of DSP58 block resources. As in previous generations, an AMD Versal™ adaptive SoC DSP block can implement a wide variety of arithmetic and logical functions, including addition, subtraction, multiplication, and common DSP functions such as multiply-accumulate. DSP blocks can also implement wide logic functions such as XOR and can be cascaded to form digital filters. The Versal adaptive SoC DSP block has a wider 27x24 multiplier, which you can also configure as three 9x8 multipliers, and a wider 58-bit accumulator. It also supports floating-point addition and multiplication.

The following settings configure Versal adaptive SoC DSP blocks for power estimation:

Configurations
Versal adaptive SoC PDM lets you use mode-specific configurations for the DSP block. Select the configuration from the drop-down list that matches the DSP operation you are performing. The Versal DSP58 power model is more fine-grained than the models for previous generations: you can choose from various sizes of integer multipliers, MACs, dot products, complex multipliers, and floating-point operations.
DSP58 Slices
This is the number of DSP58 blocks. In the Versal architecture, one DSP58 block can implement a 27x24 fixed-point multiplier, and two DSP58 blocks can be paired with common logic to implement an 18-bit complex multiplier.
Clock
Choose the DSP58 Slice clock from the drop-down menu.
Block Toggle Rate
This is the average toggle rate of all DSP block signals. Manually adjust the toggle rate when necessary.
  • If the DSP block is enabled for only a fraction of cycles, scale the Block Toggle Rate by the enable rate. For example, if the DSP is enabled for half the cycles, multiply the Block Toggle Rate by 0.5 to get the new Block Toggle Rate.
  • If the DSP block does not use all the multiplier outputs, scale the Block Toggle Rate by the fraction of output bits used. If only 48 bits are used, then multiply the Block Toggle Rate by (48 / 58) to reflect the proportion of actively switching signals in the DSP block.
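A minimal Python sketch of the two Block Toggle Rate adjustments listed above. Applying both factors together by simple multiplication is an assumption (the bullets describe them separately), and the function name `adjusted_block_toggle_rate` is hypothetical.

```python
DSP58_OUTPUT_BITS = 58  # full DSP58 output width used in the 48/58 example


def adjusted_block_toggle_rate(block_rate: float,
                               enable_rate: float = 1.0,
                               used_output_bits: int = DSP58_OUTPUT_BITS) -> float:
    """Scale a Block Toggle Rate by the enable rate and used output bits.

    block_rate       -- average toggle rate of the DSP block signals (percent)
    enable_rate      -- fraction of cycles in which the DSP block is enabled (0..1)
    used_output_bits -- number of multiplier output bits actually used (1..58)
    Combining both factors multiplicatively is an assumption of this sketch.
    """
    if not 0.0 <= enable_rate <= 1.0:
        raise ValueError("enable_rate must be between 0 and 1")
    if not 1 <= used_output_bits <= DSP58_OUTPUT_BITS:
        raise ValueError("used_output_bits must be between 1 and 58")
    return block_rate * enable_rate * (used_output_bits / DSP58_OUTPUT_BITS)


if __name__ == "__main__":
    # DSP enabled half the time, only 48 of 58 output bits used
    print(round(adjusted_block_toggle_rate(25.0, 0.5, 48), 2))
```

Each argument defaults to "no adjustment", so the same helper covers the case where only one of the two bullets applies.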
DSP Mode
This indicates the operational mode of the DSP blocks. It is auto-populated and is read-only for the specified configuration.
INT24
This mode is compatible with the DSP48 from previous generations. INT24 indicates that the DSP block is configured as a signed 27x24 fixed-point multiplier. If you use a smaller multiplier, scale the Block Toggle Rate by the proportion of output bits used.
INT8
In this configuration, the DSP58 uses the Vector Fixed Point ALU mode, which computes three-element 9x8 vector dot products with accumulate or post-add options.
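As a rough functional picture of what this mode computes (not a bit-accurate model of the DSP58), the sketch below forms a three-element dot product of 9-bit and 8-bit operands and adds it to an accumulator. The signed two's-complement operand ranges, wraparound at the 58-bit accumulator width, and the names `int8_dot3_accumulate`, `to_signed`, and `check_range` are all assumptions for illustration.

```python
from typing import Sequence

ACC_BITS = 58  # accumulator width from the DSP58 description


def to_signed(value: int, bits: int) -> int:
    """Wrap an integer to a two's-complement value of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def check_range(value: int, bits: int) -> None:
    """Raise if value does not fit in a signed field of the given width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {bits} signed bits")


def int8_dot3_accumulate(a: Sequence[int], b: Sequence[int], acc: int = 0) -> int:
    """Return acc + a[0]*b[0] + a[1]*b[1] + a[2]*b[2] (9-bit a, 8-bit b, assumed signed)."""
    if len(a) != 3 or len(b) != 3:
        raise ValueError("this mode computes three-element dot products")
    for x in a:
        check_range(x, 9)
    for y in b:
        check_range(y, 8)
    dot = sum(x * y for x, y in zip(a, b))
    return to_signed(acc + dot, ACC_BITS)  # assumed 58-bit wraparound
```

For example, `int8_dot3_accumulate([1, 2, 3], [4, 5, 6], acc=100)` returns 132 (4 + 10 + 18 added to 100).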
CINT8
This mode indicates that two adjacent DSP58 blocks are configured to implement an 18-bit complex multiplier. Make sure the DSP58 Slices total counts two DSP slices per complex multiplier.
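A minimal Python sketch of the slice accounting described above. The INT24 and CINT8 figures (one and two DSP58 slices per operation) follow this page; one slice per operation for INT8 and FP32 is an assumption, and `SLICES_PER_OP` and `dsp58_slices` are hypothetical names.

```python
# DSP58 slices consumed per operation in each DSP Mode.
# INT24 (one 27x24 multiplier per slice) and CINT8 (two adjacent slices)
# follow the text above; INT8 and FP32 at one slice each are assumptions.
SLICES_PER_OP = {"INT24": 1, "INT8": 1, "CINT8": 2, "FP32": 1}


def dsp58_slices(mode: str, operations: int) -> int:
    """Return the DSP58 Slices value for a PDM row with the given mode."""
    if mode not in SLICES_PER_OP:
        raise ValueError(f"unknown DSP mode {mode!r}")
    if operations < 0:
        raise ValueError("operations must be non-negative")
    return SLICES_PER_OP[mode] * operations


if __name__ == "__main__":
    # 10 complex multipliers occupy 20 DSP58 slices
    print(dsp58_slices("CINT8", 10))
```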
FP32
In this configuration, the DSP58E5 uses its floating-point multiplier and adder. This mode is used for FP32 single-precision or FP16 half-precision operations with accumulate or post-add options.
MULT Used?
This indicates whether the DSP58 multiplier is used. The default value is Yes because the multiplier is used in most cases. It is auto-populated and is read-only for a given configuration.
Multiplier Pipeline Used?
When MULT Used? is Yes, this indicates whether the multiplier is pipelined. The multiplier is typically pipelined because of its relatively large propagation delay, so the default value is Yes. Set the value to No only for very low clock speeds or when the DSP configuration does not use the multiplier.
Pre-Add Used?
The DSP58 contains a 27-bit signed adder that can drive one or both inputs of the multiplier. Select Yes if you are implementing an arithmetic function that requires the pre-adder, for example (B + D) * A. The default value is No. This field is auto-populated and is read-only for a given configuration.
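A minimal functional sketch of the (B + D) * A example above, assuming signed integer operands. Rather than guessing at hardware overflow behavior, it rejects pre-adder sums that do not fit in 27 signed bits; the names `fits_signed` and `preadd_multiply` are hypothetical, and the width of A is not checked.

```python
PREADD_BITS = 27  # width of the DSP58 pre-adder stated above


def fits_signed(value: int, bits: int) -> bool:
    """True if value fits in a two's-complement field of the given width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def preadd_multiply(a: int, b: int, d: int) -> int:
    """Return (B + D) * A using a 27-bit signed pre-adder result."""
    ad = b + d  # pre-adder output that feeds the multiplier
    if not fits_signed(ad, PREADD_BITS):
        raise ValueError("B + D does not fit in the 27-bit signed pre-adder")
    return ad * a
```

For example, `preadd_multiply(a=3, b=4, d=5)` returns 27.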
AD Reg Used?
This indicates whether the pre-adder output is pipelined (registered) before it feeds the multiplier input. The default value is No. This field is auto-populated, but you can override the setting.