Xilinx recommends registering inputs and outputs of reconfigurable modules (RMs) in a DFX design for multiple reasons.
In the parent implementation, the RM used for the partition does not need to be the actual design. Instead, the RM can be training logic used as a placeholder while you define the platform. If the training logic is sub-optimal and there is combinatorial logic in the static portion of the boundary timing paths, the static portion is likely to consume a significant share of each path's timing budget. During child implementation, this can cause timing closure issues for the RM logic connected to those boundary signals.
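The timing-budget argument above can be illustrated with a rough back-of-the-envelope calculation. The sketch below is only a simplified model and not part of any Xilinx tool: the function name, the 4.0 ns clock period, and the 2.5 ns static-side delay are hypothetical values chosen for illustration, and effects such as clock-to-out delay, setup time, and clock skew are ignored.

```python
def rm_timing_budget_ns(clock_period_ns, static_delay_ns):
    """Return the time left for the RM side of a boundary path.

    Simplified model (assumption): the path must complete within one
    clock period, and whatever delay the static side spends is no
    longer available to the RM side.
    """
    return clock_period_ns - static_delay_ns


# Hypothetical 250 MHz clock (4.0 ns period).
period_ns = 4.0

# Unregistered boundary: combinatorial logic in the static region
# spends 2.5 ns (hypothetical) before the signal reaches the RM.
unregistered_budget = rm_timing_budget_ns(period_ns, static_delay_ns=2.5)

# Registered boundary: the static side of the path ends at a register,
# so in this model the RM side has the full period to itself.
registered_budget = rm_timing_budget_ns(period_ns, static_delay_ns=0.0)

print(f"Unregistered boundary: {unregistered_budget} ns left for the RM")
print(f"Registered boundary:   {registered_budget} ns left for the RM")
```

Under these assumed numbers, an RM implemented later against the same static design would have less than half the clock period available on the unregistered path, which is the situation that makes timing closure harder during child implementation.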
In addition, creating an abstract shell of a reconfigurable partition (RP) prunes most of the static region and keeps only the logic up to the first sequential cell in the static region. Registering the input and output pins of the RMs enables maximum abstraction, thereby reducing the size of the abstract shell.