The following RTL code snippet generates a critical path that starts at a block RAM (in this case used as a ROM), passes through multiple logic levels, and ends at a flip-flop (FF). The RAMB cell has been inferred without the optional output registers (DOA_REG = 0), which adds a delay penalty of more than 1 ns to the RAMB output path.
The following figure shows the critical path that the tool reports for this RTL code.
It is good practice to review the critical paths after synthesis and after each implementation step in order to identify which groups of logic need to be improved. For long paths or any paths that do not take advantage of the FPGA hardware features optimally, go back to the RTL description, try to understand why the synthesized logic is not optimal, and modify the code to help the synthesis tool improve the netlist.
Vivado provides an elaborated view that is a good starting point for debugging. The elaborated view helps you identify where the problem could be without manually searching through the RTL code. The following figure shows the elaborated view for the above RTL code snippet.
The elaborated view gives a good hint about the inefficient structure for the given test case. In this case, the problem comes from the address register fanout (addr_reg3_reg), which drives the memory address as well as some glue-logic, highlighted in blue.
RAMB inference by the synthesis tool requires a dedicated address register in the RTL code, which is not compatible with the current address register fanout. As a consequence, the synthesis tool re-times the output register in order to allow the RAMB inference instead of using it to enable the RAMB optional output register.
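The structure described above can be illustrated with a minimal Verilog sketch. This is not the original snippet: the module name, port names, widths, ROM contents, and the glue logic are all assumptions chosen for illustration, and the netlist actually produced depends on the synthesis tool and its settings.

```verilog
// Hypothetical illustration of the problematic pattern (all names and
// widths are assumed, not taken from the original design).
module rom_shared_addr (
    input  wire        clk,
    input  wire [9:0]  addr_in,
    output reg  [15:0] result
);
    // 1K x 16 ROM with placeholder contents.
    reg [15:0] rom [0:1023];
    integer i;
    initial begin
        for (i = 0; i < 1024; i = i + 1)
            rom[i] = i;
    end

    reg [9:0]  addr_reg2;
    reg [9:0]  addr_reg3;   // shared: drives the ROM address AND glue logic
    reg [9:0]  glue_q;
    reg [15:0] rom_q;       // intended as the RAMB output register

    always @(posedge clk) begin
        addr_reg2 <= addr_in;
        addr_reg3 <= addr_reg2;
        rom_q     <= rom[addr_reg3];     // memory read from the shared register
        glue_q    <= addr_reg3 + 10'd1;  // glue logic on the same register
        // Logic levels between the ROM output and the final flip-flop.
        result    <= (rom_q ^ {6'b0, glue_q}) + 16'd1;
    end
endmodule
```

Because addr_reg3 has fanout outside the memory, it cannot serve as the dedicated address register, so rom_q is likely to be consumed as the read register of the RAMB rather than as its optional output register.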
By replicating the address register in the RTL code so that the memory address and the FPGA logic are driven by separate registers, the RAMB will be inferred with the output registers enabled.
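A minimal Verilog sketch of this manual replication might look as follows. The module name, port names, widths, ROM contents, glue logic, and the register names addr_reg3_mem and addr_reg3_logic are hypothetical; only the idea of giving the memory its own address register comes from the text above.

```verilog
// Hypothetical illustration of manual address-register replication
// (all names and widths are assumed, not taken from the original design).
module rom_replicated_addr (
    input  wire        clk,
    input  wire [9:0]  addr_in,
    output reg  [15:0] result
);
    // 1K x 16 ROM with placeholder contents.
    reg [15:0] rom [0:1023];
    integer i;
    initial begin
        for (i = 0; i < 1024; i = i + 1)
            rom[i] = i;
    end

    reg [9:0]  addr_reg2;
    reg [9:0]  addr_reg3_mem;    // replica dedicated to the memory address
    reg [9:0]  addr_reg3_logic;  // replica that drives only the glue logic
    reg [9:0]  glue_q;
    reg [15:0] rom_q;            // candidate for the RAMB output register

    always @(posedge clk) begin
        addr_reg2       <= addr_in;
        addr_reg3_mem   <= addr_reg2;
        addr_reg3_logic <= addr_reg2;
        rom_q           <= rom[addr_reg3_mem];       // no other load on this register
        glue_q          <= addr_reg3_logic + 10'd1;  // glue logic uses its own copy
        // Logic levels between the ROM output and the final flip-flop.
        result          <= (rom_q ^ {6'b0, glue_q}) + 16'd1;
    end
endmodule
```

Synthesis tools can merge registers that are logically equivalent, which would undo the replication. If that happens, an option such as -keep_equivalent_registers on synth_design may be needed; whether it is required in a given flow is an assumption to verify in the synthesis log and the resulting netlist.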
The RTL code and elaborated view after manual replication are shown in the following figures:
The critical path for the modified RTL code can be seen in the following figure. Notice the following:
- The addr_reg2_reg register is connected to the address pin of the block RAM.
- The addr_reg3_reg register has been absorbed into the block RAM.
- The RAMB output register is enabled, which significantly reduces the datapath delay on the RAMB outputs.

Figure 6. Critical Path for the Modified RTL Code