Versal devices provide many new dedicated IP blocks, such as the NoC, DDRMC, CPM, and AI Engines. These dedicated IP blocks deliver the next level of system-level performance per watt through high-bandwidth data movement and interfaces. To accommodate these new dedicated IP blocks, the Versal device programmable logic (PL) has been redesigned relative to the UltraScale+ device PL to use silicon area more efficiently while maintaining similar PL performance. As a result, the key differences when working with Versal devices are as follows:
- Many common hardware functions mapped to the PL in previous architectures are now efficiently supported by dedicated IP blocks, which saves significant PL resources.
- The delay distributions of the PL routing interconnect and CLB, as well as the clock skew and jitter characteristics, differ from those of previous architectures. As a result, some logic paths become faster and others become slower. Key CLB and clocking differences are covered in subsequent sections of this chapter.
- The increased number of PL RAM resources (including the silicon-efficient UltraRAM) and the special IP block columns required by next-generation applications introduce additional routing delay variations.
When migrating PL functions to Versal devices, legacy RTL designs can require tuning, such as reducing logic levels around carry operators and rebalancing logic levels between pipeline registers, to achieve the same average programmable logic fabric performance as previous generations on equivalent device speed grades. For hardware design recommendations, see the Versal Adaptive SoC Hardware, IP, and Platform Development Methodology Guide (UG1387). For timing closure recommendations, see the Versal Adaptive SoC System Integration and Validation Methodology Guide (UG1388).
Traditional Fmax benchmarking, where maximum achievable PL clock speeds of RTL designs are compared between target technologies, is not an appropriate method of evaluating Versal adaptive SoCs against previous generation FPGAs and SoCs for the following reasons:
- The Versal architecture is optimized for adaptive acceleration. Therefore, focusing on PL clock speeds does not account for the advantages of the Versal device dedicated IP blocks. Instead, AMD recommends focusing the comparison on system-level compute and throughput metrics.
- The new Versal adaptive SoC high-level building blocks are not inferred from RTL. Instead, they are designed using the AMD Vitis™ environment or the AMD Vivado™ IP integrator. Therefore, comparing RTL designs overestimates Versal device PL utilization because it fails to account for the utilization and power savings gained from using the Versal device dedicated IP blocks.