The ap_ctrl_none block-level I/O protocol avoids the rigid synchronization scheme implied by the ap_ctrl_hs and ap_ctrl_chain protocols.
These protocols require that all processes in the region execute exactly the same number of times, in order to better match the C/C++ behavior.
However, there are situations where, for example, the intent is to have a faster process that executes more frequently to distribute work to several slower ones.
For any dataflow region (except "dataflow-in-loop"), it is possible to specify #pragma HLS interface mode=ap_ctrl_none port=return as long as all of the following conditions are satisfied:
- The region and all the processes it contains communicate only via FIFOs (hls::stream, streamed arrays, AXIS); that is, excluding memories.
- All the parents of the region, up to the top level design, must fit the following requirements:
  - They must be dataflow regions (excluding "dataflow-in-loop").
  - They must all specify ap_ctrl_none.
This means that none of the parents of a dataflow region with ap_ctrl_none in the hierarchy can be:
  - A sequential or pipelined FSM
  - A dataflow region inside a for loop ("dataflow-in-loop")
This restriction can be relaxed if an ap_ctrl_none region is instantiated in an ap_ctrl_chain region where all the I/O streams of the ap_ctrl_none region are produced and consumed by processes in the ap_ctrl_chain region. The latter region can then be in dataflow-in-loop or be called by a sequential or pipelined FSM.
The result of this pragma is that ap_ctrl_chain is not used to synchronize any of the processes inside that region. They are executed or stalled based on the availability of data in their input FIFOs and space in their output FIFOs. For example:
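void region(...) {
#pragma HLS dataflow
#pragma HLS interface mode=ap_ctrl_none port=return
    // Internal FIFOs connecting demux to the two workers
    hls::stream<int> outStream1, outStream2;
    // Faster process: distributes work to the slower workers
    demux(inStream, outStream1, outStream2);
    // Slower processes: each consumes its own stream
    worker1(outStream1, ...);
    worker2(outStream2, ...);
}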
In this example, demux can be executed twice as frequently as worker1 and worker2. For example, it can have II=1 while worker1 and worker2 have II=2, while still achieving a global II=1 behavior.
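The rate argument above can be checked with a small software model. The following is a minimal sketch in plain C++, not synthesizable HLS code: std::queue stands in for the FIFOs, and the names simulate_region and RateResult, the round-robin distribution policy, and the one-step-per-cycle timing are all assumptions made for illustration. Each process starts only when its input FIFO holds data and its output FIFO has space, which is the behavior described for an ap_ctrl_none region.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

// Plain C++ model (NOT HLS code) of one demux with II=1 feeding two
// workers through FIFOs. All names here are hypothetical.
struct RateResult {
    int demux_runs;    // tokens distributed by demux
    int worker1_runs;  // tokens consumed by worker1
    int worker2_runs;  // tokens consumed by worker2
    int cycles;        // modeled cycles until every token is consumed
};

RateResult simulate_region(int num_tokens, std::size_t fifo_depth, int worker_ii) {
    assert(num_tokens >= 0 && fifo_depth > 0 && worker_ii > 0);
    std::queue<int> fifo[2];     // stand-ins for outStream1 / outStream2
    int busy_until[2] = {0, 0};  // first cycle at which each worker may start again
    int runs[2] = {0, 0};
    int sent = 0;
    int cycle = 0;

    while (runs[0] + runs[1] < num_tokens) {
        // Workers: start only if input data is available and the
        // initiation interval of the previous execution has elapsed.
        for (int w = 0; w < 2; ++w) {
            if (!fifo[w].empty() && cycle >= busy_until[w]) {
                fifo[w].pop();
                busy_until[w] = cycle + worker_ii;
                ++runs[w];
            }
        }
        // Demux (II=1): tries to start every cycle, alternating targets,
        // and stalls when the target FIFO has no space.
        if (sent < num_tokens) {
            int target = sent % 2;
            if (fifo[target].size() < fifo_depth) {
                fifo[target].push(sent);
                ++sent;
            }
        }
        ++cycle;
    }
    return {sent, runs[0], runs[1], cycle};
}
```

Under these assumptions, simulate_region(100, 2, 2) runs demux 100 times and each worker 50 times, with the whole region sustaining one token per cycle. Raising worker_ii above 2 makes the FIFOs fill and demux stall, so the total cycle count grows.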
- hls::tasks are a way to avoid this requirement.
- Non-blocking reads may need to be used very carefully inside processes that are executed less frequently to ensure that C/C++ simulation works.
- The pragma is applied to a region, not to the individual processes inside it.
- Deadlock detection must be disabled in co-simulation. This can be done with the -disable_deadlock_detection option in cosim_design.
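The caution about non-blocking reads in the list above can be illustrated with a plain C++ sketch. It assumes that, in C/C++ simulation, every process function is called once per call of the region, so a worker that is meant to run less often than demux will sometimes find its input empty. ModelStream is a hypothetical stand-in for a FIFO, not the hls::stream class; only its read_nb method mirrors the non-blocking read that hls::stream provides. The function names and the even/odd distribution policy are also illustrative assumptions.

```cpp
#include <cassert>
#include <queue>
#include <utility>

// Hypothetical FIFO stand-in for software modeling (not hls::stream).
template <typename T>
class ModelStream {
public:
    void write(const T& v) { q_.push(v); }
    bool empty() const { return q_.empty(); }
    // Non-blocking read: returns false and leaves `out` untouched when empty.
    bool read_nb(T& out) {
        if (q_.empty()) return false;
        out = q_.front();
        q_.pop();
        return true;
    }
private:
    std::queue<T> q_;
};

// Faster process: forwards one token per call, even tokens to out1, odd to out2.
void demux_model(int token, ModelStream<int>& out1, ModelStream<int>& out2) {
    if (token % 2 == 0) out1.write(token);
    else out2.write(token);
}

// Slower process: consumes a token only if one is present, so a call that
// finds an empty input is a no-op instead of a read from an empty FIFO.
void worker_model(ModelStream<int>& in, int& sum) {
    int v = 0;
    if (in.read_nb(v)) sum += v;
}

// One region call invokes every process once, in program order.
void region_model(int token, int& sum1, int& sum2) {
    ModelStream<int> s1, s2;
    demux_model(token, s1, s2);
    worker_model(s1, sum1);
    worker_model(s2, sum2);
}

// Calls the region n times with tokens 0..n-1 and returns both worker sums.
std::pair<int, int> run_model(int n) {
    int sum1 = 0, sum2 = 0;
    for (int t = 0; t < n; ++t) region_model(t, sum1, sum2);
    return {sum1, sum2};
}
```

In this model each region call feeds exactly one worker, and the other worker's call returns without consuming anything. The care the note asks for still applies: a non-blocking read makes the result depend on when data arrives, so C/C++ simulation and hardware can diverge if the worker does anything other than skip on an empty input.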