Design latency usually depends on the amount of pipelining present in the RTL, which is traditionally used to improve the maximum clock frequency of the design. The types of pipelining are as follows:
- Pipelines required for operating special primitives, such as DSP, block RAM, or UltraRAM, at the maximum frequency listed on the device data sheet.
Although these registers are important for high clock frequency designs, some of them are not needed for slower designs and can be removed to reduce latency.
- Pipelines required for reducing the maximum number of logic levels or route levels on the longest paths in the design given the target frequency.
These pipelines are usually mapped to SLICE registers. When the register utilization percentage exceeds 50% at the device level or at the SLR level for SSI devices, the logic placement can become more difficult to legalize and Fmax can degrade. Also, if the overall latency through an RTL module or design is high, you must reduce the register utilization on paths with zero or one logic level, especially if these paths are locally placed and routed.
- Pipelines required for balancing latency with some other paths.
Versal devices have an even distribution of SRL cells that you must use as much as possible by default. The Vivado implementation tools support several physical optimizations that pull registers out of SRLs whenever needed to help meet timing.
- Pipelines introduced by the Vitis HLS tools to improve logic levels and maximize the chance of closing timing at the predicted target frequency.
You can control the maximum latency via QoS constraints for specific functions by setting attributes in C/C++. You can also reduce the target frequency for Vitis HLS and perform a pre-placement timing analysis to verify that the design can meet timing based on ideal placement and logic level information. For more information on the Vitis HLS tools, see the Vitis High-Level Synthesis User Guide (UG1399).
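The 50% register utilization guideline mentioned in the list above can be checked with a few lines of scripting once the register counts are known. The following Python sketch is only an illustration: the `RegionUsage` container, the `regions_over_limit` helper, and the register counts are all hypothetical, and in practice the numbers would come from a utilization report for the device or for each SLR.

```python
from dataclasses import dataclass

# Guideline from the text: placement becomes harder to legalize above 50%.
REGISTER_UTILIZATION_LIMIT = 0.50


@dataclass
class RegionUsage:
    """Register usage for a whole device or one SLR (hypothetical container)."""
    name: str
    used_registers: int
    available_registers: int

    @property
    def utilization(self) -> float:
        # Fraction of the available registers that are in use.
        return self.used_registers / self.available_registers


def regions_over_limit(regions, limit=REGISTER_UTILIZATION_LIMIT):
    """Return the names of regions whose register utilization exceeds the limit."""
    return [r.name for r in regions if r.utilization > limit]


# Made-up counts for a two-SLR device, used only to exercise the check.
usage = [
    RegionUsage("SLR0", used_registers=420_000, available_registers=800_000),
    RegionUsage("SLR1", used_registers=310_000, available_registers=800_000),
]
flagged = regions_over_limit(usage)
print(flagged)
```

With these example counts, only SLR0 (52.5%) is flagged, which suggests looking there first for pipeline registers on paths with zero or one logic level.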
In addition to these types of pipeline registers, most AMD or third-party IP provide interface register, latency, or target frequency options. You must refine all available settings to achieve the right trade-off between latency and Fmax by reviewing the guidelines provided in each IP product guide. In addition, you can synthesize and implement each IP standalone to validate that timing closure can be achieved, preferably with 5 to 15% Fmax margin.
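As a rough illustration of the 5 to 15% Fmax margin check, the following Python sketch estimates the achievable Fmax from the constrained clock period and the worst slack on that clock, using the common relation Fmax = 1 / (period - slack). The function names and the example numbers are assumptions made for this sketch and do not come from any tool.

```python
def achieved_fmax_mhz(target_period_ns: float, worst_slack_ns: float) -> float:
    """Estimate Fmax in MHz from the constrained period and the worst slack.

    Positive slack means the longest path is shorter than the period,
    so the block could run faster than the target frequency.
    """
    min_period_ns = target_period_ns - worst_slack_ns
    if min_period_ns <= 0:
        raise ValueError("slack must be smaller than the target period")
    return 1000.0 / min_period_ns


def fmax_margin_percent(target_period_ns: float, worst_slack_ns: float) -> float:
    """Return the Fmax margin relative to the target frequency, in percent."""
    target_mhz = 1000.0 / target_period_ns
    achieved_mhz = achieved_fmax_mhz(target_period_ns, worst_slack_ns)
    return 100.0 * (achieved_mhz - target_mhz) / target_mhz


# Hypothetical standalone IP run: 500 MHz target (2.0 ns period), +0.2 ns slack.
margin = fmax_margin_percent(target_period_ns=2.0, worst_slack_ns=0.2)
print(f"Fmax margin: {margin:.1f}%")
print("within 5 to 15% guideline:", 5.0 <= margin <= 15.0)
```

In this example the margin is about 11%, which falls inside the suggested range. A negative slack yields a negative margin, indicating that the IP settings (interface registers, latency, or target frequency options) need further refinement.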