The design of the primitive includes three modules:
- fetch: Attempts to read data from the N input streams.
- vectorize: Inner buffers as wide as the least common multiple of N * Win and Wout are used to combine the inputs into vectors.
- emit: Reads the vectorized data and emits it to the output stream.
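The buffer sizing used by the vectorize module can be illustrated with a small sketch. It assumes N is the number of input streams, Win the input width in bits, and Wout the output width in bits; the helper names (`buffer_width_bits`, `fetch_rounds_per_buffer`, `outputs_per_buffer`) are hypothetical and are not part of the library.

```cpp
#include <cassert>
#include <numeric>  // std::lcm (C++17)

// Hypothetical helper: width in bits of an inner buffer, taken as the
// least common multiple of N * Win and Wout.
constexpr unsigned buffer_width_bits(unsigned n, unsigned win, unsigned wout) {
    return std::lcm(n * win, wout);
}

// Hypothetical helper: number of full fetch rounds (one word from each of
// the N inputs) needed to fill one buffer.
constexpr unsigned fetch_rounds_per_buffer(unsigned n, unsigned win, unsigned wout) {
    return buffer_width_bits(n, win, wout) / (n * win);
}

// Hypothetical helper: number of Wout-bit words emitted per full buffer.
constexpr unsigned outputs_per_buffer(unsigned n, unsigned win, unsigned wout) {
    return buffer_width_bits(n, win, wout) / wout;
}

// Example: 3 inputs of 8 bits each and a 16-bit output.
// N * Win = 24, so the buffer is lcm(24, 16) = 48 bits wide.
static_assert(buffer_width_bits(3, 8, 16) == 48, "lcm(24, 16) is 48");
static_assert(fetch_rounds_per_buffer(3, 8, 16) == 2, "48 / 24 is 2");
static_assert(outputs_per_buffer(3, 8, 16) == 3, "48 / 16 is 3");
```

Because the buffer width is a common multiple of both N * Win and Wout, a full buffer holds a whole number of fetch rounds and splits into a whole number of output words, so no partial word is left over at a buffer boundary.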
Important
The depth of output streams must be no less than four due to an internal delay.