This algorithm suits scenarios where the I/O rate is low, and it uses FPGA resources efficiently.
However, resource usage grows linearly with the maximum number of elements that can be sorted in one pass,
so to scale to large inputs, it should be combined with merge sort.
See Internals of Insert Sort
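
The scaling approach described above can be illustrated with a minimal software sketch in plain C++. It is not HLS code and not the library's API: the names `kMaxSortNumber`, `insertSortChunk`, and `sortLargeInput` are hypothetical, and the capacity of 4 is an arbitrary value chosen for illustration. Each chunk is limited to the assumed capacity and sorted by insertion, then merged into the running sorted result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical capacity: the most elements one insert-sort pass may hold.
// In hardware, this is the parameter that resource usage scales with.
constexpr std::size_t kMaxSortNumber = 4;

// Sort in[begin, end) by insertion; the range must fit within the capacity.
std::vector<int> insertSortChunk(const std::vector<int>& in,
                                 std::size_t begin, std::size_t end) {
    assert(end - begin <= kMaxSortNumber);
    std::vector<int> sorted;
    sorted.reserve(end - begin);
    for (std::size_t i = begin; i < end; ++i) {
        // Find the slot for the incoming element and shift the rest right.
        auto pos = std::upper_bound(sorted.begin(), sorted.end(), in[i]);
        sorted.insert(pos, in[i]);
    }
    return sorted;
}

// Sort an input of any length: insert-sort fixed-size chunks, then merge
// each sorted chunk into the accumulated sorted result.
std::vector<int> sortLargeInput(const std::vector<int>& in) {
    std::vector<int> result;
    for (std::size_t begin = 0; begin < in.size(); begin += kMaxSortNumber) {
        const std::size_t end = std::min(begin + kMaxSortNumber, in.size());
        const std::vector<int> chunk = insertSortChunk(in, begin, end);

        std::vector<int> merged;
        merged.reserve(result.size() + chunk.size());
        std::merge(result.begin(), result.end(),
                   chunk.begin(), chunk.end(),
                   std::back_inserter(merged));
        result.swap(merged);
    }
    return result;
}
```

Calling `sortLargeInput` on a vector longer than `kMaxSortNumber` returns the fully sorted sequence while no single insert-sort pass exceeds the capacity. The sketch merges chunks one at a time for brevity; a hardware design would more likely merge sorted streams pairwise.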