With the input cast to a wide ap_uint vector, a higher input rate can be
achieved.
This implementation consists of two dataflow processes working in parallel.
The first one breaks the vector into a ping-pong buffer,
while the second one reads from the buffers and schedules output in
round-robin order.
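The two stages described above can be modelled in ordinary sequential C++. The following is only a behavioural sketch under stated assumptions, not the library's HLS code: the function name distribute, the LSB-first packing order, and the restriction to buffers narrower than 64 bits are all hypothetical choices made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Behavioural model only (hypothetical helper, not a library function).
// Assumes LSB-first packing and a buffer narrower than 64 bits.
std::vector<std::vector<std::uint64_t>> distribute(
    const std::vector<std::uint64_t>& in,
    unsigned in_w, unsigned out_w, unsigned n_out, unsigned buf_w) {
    assert(in_w > 0 && out_w > 0 && n_out > 0);
    assert(buf_w < 64 && buf_w % in_w == 0 && buf_w % (out_w * n_out) == 0);

    const std::uint64_t in_mask = (std::uint64_t{1} << in_w) - 1;
    const std::uint64_t out_mask = (std::uint64_t{1} << out_w) - 1;

    std::vector<std::vector<std::uint64_t>> out(n_out);
    std::uint64_t buf[2] = {0, 0}; // ping and pong buffers
    unsigned wr = 0;               // buffer currently being filled
    unsigned fill = 0;             // bits already written into buf[wr]

    for (std::uint64_t word : in) {
        // Stage 1: append one input vector to the buffer being filled.
        buf[wr] |= (word & in_mask) << fill;
        fill += in_w;

        if (fill == buf_w) {
            // Swap roles; in hardware the two stages would overlap here.
            const unsigned rd = wr;
            wr ^= 1u;
            fill = 0;

            // Stage 2: drain the full buffer in round-robin order.
            for (unsigned i = 0; i < buf_w / out_w; ++i) {
                out[i % n_out].push_back((buf[rd] >> (i * out_w)) & out_mask);
            }
            buf[rd] = 0;
        }
    }
    return out;
}
```

Because the buffer width is a multiple of both the input width and the total output width, every buffer fills after a whole number of input vectors and drains after a whole number of round-robin passes, so each pass starts again at output 0.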
The ping-pong buffers are implemented as two ap_uint values whose width is
the least common multiple (LCM) of the input width and the total output
stream width.
This imposes a limitation: the LCM must not exceed AP_INT_MAX_W, which
defaults to 1024 in HLS.
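To see whether a given configuration fits, the buffer width can be worked out at compile time. The sketch below is illustrative: the helper names gcd_u, lcm_u, buffer_width and fits are hypothetical and not part of the library, and the widths are example values only.

```cpp
// Hypothetical helpers for illustration; not part of the library API.
constexpr unsigned gcd_u(unsigned a, unsigned b) {
    return b == 0 ? a : gcd_u(b, a % b);
}

constexpr unsigned lcm_u(unsigned a, unsigned b) {
    return a / gcd_u(a, b) * b;
}

// Width of one ping-pong buffer: LCM of the input width and the
// total output width (element width times number of output streams).
constexpr unsigned buffer_width(unsigned in_w, unsigned out_w, unsigned n_out) {
    return lcm_u(in_w, out_w * n_out);
}

// True when the buffer fits within the given ap_uint width limit.
constexpr bool fits(unsigned in_w, unsigned out_w, unsigned n_out, unsigned max_w) {
    return buffer_width(in_w, out_w, n_out) <= max_w;
}

// 256-bit input, four 16-bit outputs: LCM(256, 64) = 256.
static_assert(buffer_width(256, 16, 4) == 256, "fits the default limit");

// 512-bit input, three 32-bit outputs: LCM(512, 96) = 1536.
static_assert(buffer_width(512, 32, 3) == 1536, "exceeds 1024");
static_assert(!fits(512, 32, 3, 1024), "too wide for the default limit");
static_assert(fits(512, 32, 3, 4096), "fits once the limit is 4096");
```

Widths that share few common factors produce a large LCM quickly, so a configuration such as the second one only works when the limit has been raised above the default.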
Caution
AP_INT_MAX_W can be set to a larger value, but doing so may slow down HLS
synthesis. For the override to take effect, the macro must be defined
before the first inclusion of the ap_int.h header.
This library tries to override AP_INT_MAX_W to 4096, but the override is
only effective when ap_int.h has not been included before the utility
library headers.
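A minimal sketch of an include order that keeps the override effective in user code is shown below. It assumes the value 4096 mentioned above; the __has_include guard is only there so that the fragment also compiles where the HLS headers are not installed, and would normally be replaced by a plain include.

```cpp
// Define the limit before ap_int.h is seen for the first time
// in this translation unit.
#define AP_INT_MAX_W 4096

// Guard used only so this sketch builds without the HLS headers;
// in an HLS project this would simply be: #include <ap_int.h>
#if defined(__has_include)
#if __has_include(<ap_int.h>)
#include <ap_int.h>
#endif
#endif

// Utility library headers would be included after this point.

static_assert(AP_INT_MAX_W == 4096, "override must still be in place");
```

Defining the macro on the compiler command line with a -D option is another way to make sure the definition precedes every include.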