With a generic-type input, the function dispatches one element per cycle. This mode works best for sharing multi-cycle processing work across an array of processing units.
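
The effect of spreading multi-cycle work over several units can be illustrated with a small software model. This is only a sketch, not the function's actual interface: the names `dispatch_round_robin` and `total_cycles` are hypothetical, and both the round-robin ordering and the fixed per-element latency are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical model: hand out one element per cycle to `num_units`
// per-unit queues. Round-robin order is an assumption of this sketch.
template <typename T>
std::vector<std::queue<T>> dispatch_round_robin(const std::vector<T>& input,
                                                std::size_t num_units) {
    assert(num_units > 0);
    std::vector<std::queue<T>> unit_queues(num_units);
    for (std::size_t cycle = 0; cycle < input.size(); ++cycle) {
        // One element leaves the dispatcher on each cycle.
        unit_queues[cycle % num_units].push(input[cycle]);
    }
    return unit_queues;
}

// Hypothetical cycle-count model: each unit is busy for `latency` cycles
// per element, and the dispatcher stalls while its target unit is busy.
std::size_t total_cycles(std::size_t num_elements, std::size_t num_units,
                         std::size_t latency) {
    assert(num_units > 0);
    std::vector<std::size_t> busy_until(num_units, 0);
    std::size_t next_issue = 0;  // earliest cycle the next element can issue
    std::size_t finish = 0;      // cycle at which the last unit becomes idle
    for (std::size_t i = 0; i < num_elements; ++i) {
        std::size_t unit = i % num_units;
        std::size_t issue = std::max(next_issue, busy_until[unit]);
        busy_until[unit] = issue + latency;
        next_issue = issue + 1;
        finish = std::max(finish, busy_until[unit]);
    }
    return finish;
}
```

Under these assumptions, the dispatcher sustains one element per cycle without stalling once the number of units is at least the per-element latency. With fewer units, it stalls waiting for a unit to become free.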