For input data of size N, it divides the data into M blocks and sorts each block, producing M sorted blocks. The size of N depends on the capacity of the DDR, and the size of M depends on the available on-chip LUT and URAM resources. The design is shown in the figure below: