The design of the primitive includes three modules:
- fetch: Attempts to read data from the N input streams.
- vectorize: Inner buffers as wide as the least common multiple of N * Win and Wout are used to combine the inputs into vectors.
- emit: Reads the vectorized data and emits it to the output stream.
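
The three stages can be illustrated with a small software model. This is only a sketch under stated assumptions, not the primitive's actual implementation: it assumes Win and Wout are bit widths, that each fetch takes exactly one Win-bit word from every input stream, and that words are packed into the buffer least-significant bits first. The names `buffer_width` and `combine` are hypothetical, and `math.lcm` requires Python 3.9 or later.

```python
from math import lcm


def buffer_width(n_streams: int, w_in: int, w_out: int) -> int:
    """Width of the inner buffer: lcm(N * Win, Wout)."""
    return lcm(n_streams * w_in, w_out)


def combine(streams: list[list[int]], w_in: int, w_out: int) -> list[int]:
    """Model of fetch -> vectorize -> emit for N equally long input streams.

    Assumption: little-endian packing (first fetched word in the lowest bits).
    A partially filled buffer left at the end of the input is discarded.
    """
    n = len(streams)
    width = buffer_width(n, w_in, w_out)
    in_mask = (1 << w_in) - 1
    out_mask = (1 << w_out) - 1

    buf = 0      # inner buffer, modelled as an integer bit vector
    filled = 0   # number of valid bits currently in the buffer
    out = []

    # fetch: read one word from each of the N input streams
    for words in zip(*streams):
        # vectorize: append the N words to the inner buffer
        for word in words:
            buf |= (word & in_mask) << filled
            filled += w_in
        # emit: the width is a multiple of N * Win, so the buffer fills
        # exactly; it is also a multiple of Wout, so it drains exactly
        if filled == width:
            for i in range(width // w_out):
                out.append((buf >> (i * w_out)) & out_mask)
            buf = 0
            filled = 0
    return out
```

For example, with N = 3, Win = 8 and Wout = 16, the buffer is lcm(24, 16) = 48 bits wide: it fills after two fetch rounds and drains as three 16-bit output words. Sizing the buffer to the least common multiple means no bits are left over on either side of the conversion.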