The UltraScale DSP block (DSP48E2) primitive can compute either the square of an input or the square of the pre-adder output.
Download the coding example files from Coding Examples.
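The following are examples of the square of a difference. Because the square of a difference is non-negative and grows with the magnitude of the difference, it can efficiently replace calculations on absolute values of differences, for example in error metrics or distance comparisons.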
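As a rough illustration of the squared difference, the Verilog sketch below subtracts two registered inputs and multiplies the result by itself, which is the pattern the pre-adder and squarer are meant to absorb. The module name `sq_diff`, the parameter `WIDTH`, and all signal names are hypothetical and are not taken from the downloadable example files. The three-stage pipeline is an assumption about what helps inference, and whether the logic maps to one DSP block depends on the operand width, the tool version, and the synthesis settings.

```verilog
// Minimal sketch of a squared difference, (a - b)^2.
// All names are illustrative assumptions, not the shipped example.
module sq_diff #(
    parameter WIDTH = 16                     // assumed operand width
) (
    input  wire                      clk,
    input  wire signed [WIDTH-1:0]   a,
    input  wire signed [WIDTH-1:0]   b,
    output wire signed [2*WIDTH+1:0] sq
);
    reg signed [WIDTH-1:0]   a_r, b_r;       // input registers
    reg signed [WIDTH:0]     diff_r;         // one extra bit so a - b cannot overflow
    reg signed [2*WIDTH+1:0] m_r, p_r;       // multiplier and output registers

    always @(posedge clk) begin
        a_r    <= a;
        b_r    <= b;
        diff_r <= a_r - b_r;                 // pre-adder used as a subtractor
        m_r    <= diff_r * diff_r;           // both multiplier operands are the difference
        p_r    <= m_r;
    end

    assign sq = p_r;
endmodule
```

With this pipeline, `sq` reflects a given pair of inputs four clock edges after they are presented. Checking the synthesis utilization report is the reliable way to confirm that a single DSP block was used.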
The squared-difference circuit fits into a single DSP block and runs at full speed. The coding example files mentioned previously also include an accumulator of squared differences, which likewise fits into a single DSP block on the UltraScale architecture.
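An accumulator of squared differences can be sketched along the same lines. The version below is an assumption-laden illustration rather than the accumulator from the example files: the module name `sq_diff_acc`, the parameters `WIDTH` and `ACCW`, and the simple synchronous reset are all hypothetical choices. The accumulator width is kept within the 48 bits of the DSP48E2 output register, on the assumption that this helps the tool keep everything in one block.

```verilog
// Minimal sketch of an accumulator of squared differences.
// All names and the reset scheme are illustrative assumptions.
module sq_diff_acc #(
    parameter WIDTH = 16,                    // assumed operand width
    parameter ACCW  = 40                     // assumed accumulator width (48 or less)
) (
    input  wire                    clk,
    input  wire                    rst,      // synchronous, clears the accumulator only
    input  wire signed [WIDTH-1:0] a,
    input  wire signed [WIDTH-1:0] b,
    output wire signed [ACCW-1:0]  acc
);
    reg signed [WIDTH-1:0]   a_r, b_r;       // input registers
    reg signed [WIDTH:0]     diff_r;         // pre-adder result, a - b
    reg signed [2*WIDTH+1:0] m_r;            // squared difference
    reg signed [ACCW-1:0]    acc_r;          // running sum

    always @(posedge clk) begin
        a_r    <= a;
        b_r    <= b;
        diff_r <= a_r - b_r;
        m_r    <= diff_r * diff_r;
        if (rst)
            acc_r <= {ACCW{1'b0}};
        else
            acc_r <= acc_r + m_r;            // m_r is sign-extended to ACCW bits
    end

    assign acc = acc_r;
endmodule
```

Because `rst` does not clear the pipeline registers, values already in flight are still added once `rst` is released. A design that needs a clean restart would hold `rst` until the pipeline carries valid data, or use a load control aligned with the pipeline instead.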