The four-real-multiplier solution makes maximum use of DSP Slice resources and achieves a higher clock frequency than the three-real-multiplier solution, in many cases reaching the maximum clock frequency of the FPGA.
It still consumes slice resources for pipeline balancing, but this slice cost is always less than that required by the equivalent three-real-multiplier solution.
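
To make the trade-off concrete, the following is a minimal arithmetic sketch of the two structures, assuming the standard algebraic forms of a complex product. The function names `cmult4` and `cmult3` and the operand names are hypothetical and chosen only for illustration. The sketch models the arithmetic alone, not the pipelining, the bit widths, or the way the core actually maps operations onto DSP Slices.

```python
def cmult4(ar, ai, br, bi):
    """Complex product using four real multiplications and two additions.

    (ar + j*ai) * (br + j*bi) = (ar*br - ai*bi) + j*(ar*bi + ai*br)
    """
    pr = ar * br - ai * bi
    pi = ar * bi + ai * br
    return pr, pi


def cmult3(ar, ai, br, bi):
    """Complex product using three real multiplications and five additions.

    The term ar*(br + bi) is shared between the real and imaginary parts,
    which removes one multiplication at the cost of three pre-additions.
    """
    shared = ar * (br + bi)            # multiplication 1 (one pre-add)
    pr = shared - (ar + ai) * bi       # multiplication 2 (one pre-add)
    pi = shared + (ai - ar) * br       # multiplication 3 (one pre-add)
    return pr, pi


if __name__ == "__main__":
    # (3 + 4j) * (5 + 6j) = -9 + 38j
    print(cmult4(3, 4, 5, 6))  # (-9, 38)
    print(cmult3(3, 4, 5, 6))  # (-9, 38)
```

Both functions return the same result for integer inputs. The three-multiplication form trades one multiplication for three extra additions, whereas the four-multiplication form needs only the two final additions around its multipliers.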