When the input is cast to a wide ap_uint vector, a higher input rate can be achieved.
This implementation consists of two dataflow processes working in parallel.
The first one breaks the vector into a ping-pong buffer,
while the second one reads from the buffers and schedules output in a
round-robin order.
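The data movement described above can be modelled in plain C++. The following is only a sequential sketch, not the library's implementation: it ignores the concurrency of the two dataflow processes and uses neither ap_uint nor hls::stream. The widths, the LSB-first packing order, and the function name `distribute` are all assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative widths (assumptions, not library defaults).
constexpr unsigned IN_W = 24;   // bits per input word
constexpr unsigned OUT_W = 8;   // bits per output element
constexpr unsigned N_OUT = 2;   // number of output streams
constexpr unsigned BUF_W = 48;  // lcm(IN_W, OUT_W * N_OUT)

static_assert(BUF_W % IN_W == 0 && BUF_W % (OUT_W * N_OUT) == 0,
              "buffer must hold whole input words and whole output rounds");

// Hypothetical sequential model: pack input words into one buffer of the
// ping-pong pair (LSB first, an assumption), then deal OUT_W-bit elements
// to the outputs in round-robin order once the buffer is full.
// Trailing words that do not fill a buffer are dropped in this sketch.
std::vector<std::vector<uint8_t>> distribute(const std::vector<uint32_t>& in) {
    std::vector<std::vector<uint8_t>> out(N_OUT);
    uint64_t buf[2] = {0, 0};  // ping-pong pair
    unsigned wr = 0;           // buffer currently being filled
    unsigned filled = 0;       // bits currently held in buf[wr]
    for (uint32_t word : in) {
        const uint64_t masked = uint64_t(word) & ((1ull << IN_W) - 1);
        buf[wr] |= masked << filled;
        filled += IN_W;
        if (filled == BUF_W) {
            // In hardware the reader drains this buffer while the writer
            // fills the other one; here the two steps simply run in turn.
            const uint64_t full = buf[wr];
            for (unsigned k = 0; k < BUF_W / OUT_W; ++k) {
                const uint64_t elem = (full >> (k * OUT_W)) & ((1ull << OUT_W) - 1);
                out[k % N_OUT].push_back(uint8_t(elem));
            }
            buf[wr] = 0;
            wr ^= 1;  // swap ping and pong
            filled = 0;
        }
    }
    return out;
}
```

With these widths, two 24-bit input words fill one 48-bit buffer, which is then drained as three rounds over the two outputs. For the inputs 0x030201 and 0x060504, output 0 receives bytes 1, 3, 5 and output 1 receives bytes 2, 4, 6.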
The ping-pong buffers are implemented as two ap_uint variables whose width is the least common multiple (LCM) of the input width and the total output stream width.
This imposes a limitation: the LCM must be no more than AP_INT_MAX_W, which defaults to 1024 in HLS.
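As a rough way to check a configuration against this limit, the buffer width can be computed at compile time. The sketch below is illustrative only. The helper names (`gcd_u`, `lcm_u`, `buffer_width`, `fits`) are hypothetical and not part of the library, and the total output width is assumed to be the per-stream width multiplied by the number of streams.

```cpp
#include <cassert>

// Hypothetical helpers; only the default limit of 1024 comes from the text.
// Widths are assumed to be positive.
constexpr unsigned gcd_u(unsigned a, unsigned b) {
    return b == 0 ? a : gcd_u(b, a % b);
}

constexpr unsigned lcm_u(unsigned a, unsigned b) {
    return a / gcd_u(a, b) * b;
}

// Width of each ping-pong buffer: LCM of the input width and the total
// output width (assumed here to be out_w * n_out).
constexpr unsigned buffer_width(unsigned in_w, unsigned out_w, unsigned n_out) {
    return lcm_u(in_w, out_w * n_out);
}

// True when the buffer fits within the given AP_INT_MAX_W value.
constexpr bool fits(unsigned in_w, unsigned out_w, unsigned n_out,
                    unsigned max_w = 1024) {
    return buffer_width(in_w, out_w, n_out) <= max_w;
}

// 512-bit input, four 64-bit outputs: lcm(512, 256) = 512.
static_assert(buffer_width(512, 64, 4) == 512, "fits the default limit");
// 384-bit input, five 64-bit outputs: lcm(384, 320) = 1920.
static_assert(!fits(384, 64, 5), "exceeds the default limit of 1024");
static_assert(fits(384, 64, 5, 4096), "fits once the limit is raised to 4096");
```

Widths that share few factors inflate the LCM quickly, so the second configuration only fits once the limit is raised.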
Caution
Although AP_INT_MAX_W can be set to larger values, doing so might slow down HLS synthesis. To take effect, the macro must be defined before the first inclusion of the ap_int.h header.
This library tries to override AP_INT_MAX_W to 4096, but the override is only effective when ap_int.h is not included before the utility library headers.
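The ordering requirement can be illustrated without the HLS headers. The sketch below replaces ap_int.h with a stand-in that sets its default only when the macro is still undefined. That guard pattern is an assumption about how the real header behaves, and `max_width` is a hypothetical name. In a real design the stand-in block would be the actual `#include` of ap_int.h or of the utility library headers.

```cpp
#include <cassert>

// User override: must appear before the header (or its stand-in) is seen.
#define AP_INT_MAX_W 4096

// Stand-in for the header, assumed to guard its default like this.
// In a real project this block would be the include of ap_int.h.
#ifndef AP_INT_MAX_W
#define AP_INT_MAX_W 1024
#endif

// The earlier definition wins because the guard skips the default.
constexpr unsigned max_width = AP_INT_MAX_W;
static_assert(max_width == 4096, "override was placed before the header");
```

If the two blocks were swapped, the default would already be in place when the override is reached, and the second `#define` would be a conflicting redefinition instead of an effective override.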