The design of the primitive includes 3 modules:
- fetch: attempt to read data from the N input streams.
- vectorize: inner buffers as wide as the least common multiple of N * Win and Wout are used to combine the inputs into vectors.
- emit: read vectorized data and emit to output stream.
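The buffer width used by the vectorize module can be worked out at compile time. The following is a minimal sketch of that arithmetic, not the library's own code: the helper names gcd_u, lcm_u and buffer_width are hypothetical, and the widths in the example are chosen only for illustration.

```cpp
// Greatest common divisor (Euclid's algorithm), usable in constant expressions.
constexpr unsigned gcd_u(unsigned a, unsigned b) {
    return b == 0 ? a : gcd_u(b, a % b);
}

// Least common multiple; divide first to keep the intermediate value small.
constexpr unsigned lcm_u(unsigned a, unsigned b) {
    return a / gcd_u(a, b) * b;
}

// Hypothetical helper: width in bits of an inner buffer that combines
// n input streams of win bits each into words of wout bits.
constexpr unsigned buffer_width(unsigned n, unsigned win, unsigned wout) {
    return lcm_u(n * win, wout);
}

// 4 inputs of 32 bits give 128 bits per fetch; with a 96-bit output,
// the smallest width that both 128 and 96 divide evenly is 384 bits.
static_assert(buffer_width(4, 32, 96) == 384, "LCM of 128 and 96 is 384");
```

With 8 inputs of 64 bits and a 48-bit output, the same calculation gives 1536 bits, which shows how quickly the buffer can grow when the two widths share few common factors.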
Attention
The current implementation has the following limitations:
- It uses a wide ap_uint as its internal buffer. The buffer is as wide as the least common multiple (LCM) of the total input width and the output width. The width is limited by AP_INT_MAX_W, which defaults to 1024.
- This library will try to override AP_INT_MAX_W to 4096, but the user should ensure that ap_int.h has not been included before the library headers.
- Setting AP_INT_MAX_W too large will significantly slow down HLS synthesis.
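The include-order requirement can be illustrated as follows. This is a sketch under assumptions rather than the library's actual mechanism: the ap_int.h include is left commented out so the fragment compiles without the HLS tool headers, the library header name is omitted because it is not given here, and checked_width is a hypothetical helper that rejects a buffer width above the configured limit at compile time.

```cpp
// Define the limit before ap_int.h is seen for the first time.
// If ap_int.h was already included, its default of 1024 is already in effect.
#ifndef AP_INT_MAX_W
#define AP_INT_MAX_W 4096
#endif

// #include <ap_int.h>   // HLS tool header; must come after the define above
// ... include the library headers here, after the define ...

// Hypothetical guard: compilation fails if a requested buffer width
// does not fit within the configured AP_INT_MAX_W.
template <int W>
constexpr int checked_width() {
    static_assert(W > 0 && W <= AP_INT_MAX_W,
                  "buffer width exceeds AP_INT_MAX_W");
    return W;
}

// A 1536-bit buffer exceeds the default limit of 1024 but fits within 4096.
static_assert(checked_width<1536>() == 1536, "width accepted");
```

Keeping the limit no higher than the design actually needs is the safer choice, since a larger AP_INT_MAX_W slows down synthesis.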