Coding for Optimal DSP and Arithmetic Inference - 2023.1 English

Versal Adaptive SoC Hardware, IP, and Platform Development Methodology Guide (UG1387)

Document ID
UG1387
Release Date
2023-05-24
Version
2023.1 English

The DSP blocks within the AMD devices can perform many different functions, including:

  • Multiplication
  • Addition and subtraction
  • Comparators
  • Counters
  • General logic

The DSP blocks are highly pipelined blocks with multiple register stages allowing for high-speed operation while reducing the overall power footprint of the resource. AMD recommends that you fully pipeline the code intended to map into the DSP58, so that all pipeline stages are utilized. To allow the flexibility of use of this additional resource, a set condition cannot exist in the function for it to properly map to this resource.

DSP58 slice registers within AMD devices contain only resets, and not sets. Accordingly, unless necessary, do not code a set (value equals logic 1 upon an applied signal) around multipliers, adders, counters, or other logic that can be implemented within a DSP58 slice.

Many DSP designs are well-suited for the AMD architecture. To obtain best use of the architecture, you must be familiar with the underlying features and capabilities so that design entry code can take advantage of these resources.

The DSP58 blocks use a signed arithmetic implementation. AMD recommends code using signed values in the HDL source to best match the resource capabilities and, in general, obtain the most efficient mapping. If unsigned bus values are used in the code, the synthesis tools may still be able to use this resource, but might not obtain the full bit precision of the component due to the unsigned-to-signed conversion.

If the target design is expected to contain a large number of adders, AMD recommends that you evaluate the design to make greater use of the DSP58 slice pre-adders and post-adders. For example, with FIR filters, the adder cascade can be used to build a systolic filter rather than using multiple successive add functions (adder trees). If the filter is symmetric, you can evaluate using the dedicated pre-adder to further consolidate the function into both fewer LUTs and flip-flops and also fewer DSP slices as well (in most cases, half the resources).

By knowing these capabilities, the proper trade-offs can be acknowledged up front and accounted for in the RTL code to allow for a smoother and more efficient implementation from the start. In most cases, AMD recommends inferring DSP resources.