The UltraScale DSP block (DSP48E2) primitive can compute the square of an input or of the output of the pre-adder.
Download the coding example files from Coding Examples.
The following are examples of the square of a difference, which can efficiently replace calculations on absolute values of differences.
The square of a difference fits into a single DSP block and runs at full speed. The coding example files mentioned above also include an accumulator of squared differences, which likewise fits into a single DSP block on the UltraScale architecture.
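As an illustration of the square of a difference described above, the following is a minimal Verilog sketch, not the official file from the coding examples. The module name `square_diff`, the parameter `SIZEIN`, and the port names are assumptions chosen for this sketch. It registers the inputs, the subtraction, the product, and the output so that each stage can line up with the pipeline registers of the DSP block. Whether the tool actually packs everything into one DSP48E2 depends on the operand width and the synthesis settings, so it is worth confirming in the utilization report.

```verilog
// Hypothetical sketch: pipelined square of a difference, (a - b)^2.
// Names and widths are illustrative assumptions, not taken from the
// official coding example files.
module square_diff #(
    parameter SIZEIN = 16                      // assumed operand width
) (
    input  wire                       clk,
    input  wire                       ce,      // clock enable
    input  wire                       rst,     // synchronous reset
    input  wire signed [SIZEIN-1:0]   a,
    input  wire signed [SIZEIN-1:0]   b,
    output wire signed [2*SIZEIN+1:0] square_out
);

    // Input registers
    reg signed [SIZEIN-1:0]   a_reg, b_reg;
    // The difference needs one extra bit to avoid overflow
    reg signed [SIZEIN:0]     diff_reg;
    // Squaring a (SIZEIN+1)-bit value needs 2*(SIZEIN+1) bits
    reg signed [2*SIZEIN+1:0] m_reg, p_reg;

    always @(posedge clk) begin
        if (rst) begin
            a_reg    <= 0;
            b_reg    <= 0;
            diff_reg <= 0;
            m_reg    <= 0;
            p_reg    <= 0;
        end else if (ce) begin
            a_reg    <= a;
            b_reg    <= b;
            diff_reg <= a_reg - b_reg;         // subtraction stage (pre-adder)
            m_reg    <= diff_reg * diff_reg;   // squaring stage (multiplier)
            p_reg    <= m_reg;                 // output register
        end
    end

    assign square_out = p_reg;

endmodule
```

With `ce` held high, the result for a given pair of inputs appears on `square_out` four clock cycles later. A synchronous reset is used here because the DSP block registers support only synchronous resets; an asynchronous reset would likely prevent the registers from being absorbed into the block.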