Insufficiently sized FIFOs (and PIPOs) in a dataflow region can cause deadlocks. Consider the following two cases, in which a producer process and a consumer process are connected by two FIFOs, FIFO1 and FIFO2:
Case 1:
Producer alternately writes to FIFO1, FIFO2, FIFO1, FIFO2, and so on.
Consumer alternately reads from FIFO1, FIFO2, FIFO1, FIFO2, and so on.
A depth of 1 for both FIFOs is enough to avoid deadlocks (and the default depth of 2 optimizes for performance).
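A minimal C++ sketch of Case 1 follows (the function names, data type, and trip count of 16 are illustrative, not part of the lab design). Both processes touch the FIFOs in the same alternating order, so data never accumulates in either channel:

```cpp
#include <hls_stream.h>

// Case 1 (hypothetical sketch): producer and consumer alternate between
// the two FIFOs in lockstep, so each FIFO holds at most one element in flight.
void producer(hls::stream<int> &fifo1, hls::stream<int> &fifo2) {
    for (int i = 0; i < 16; i++) {
        fifo1.write(i);                 // write FIFO1, then FIFO2, then FIFO1, ...
        fifo2.write(i);
    }
}

void consumer(hls::stream<int> &fifo1, hls::stream<int> &fifo2, int out[32]) {
    for (int i = 0; i < 16; i++) {
        out[2 * i]     = fifo1.read();  // read in the same alternating order
        out[2 * i + 1] = fifo2.read();
    }
}

void top(int out[32]) {
#pragma HLS dataflow
    hls::stream<int> fifo1("fifo1"), fifo2("fifo2");
    // The default depth of 2 is more than enough here; depth 1 already avoids deadlock.
    producer(fifo1, fifo2);
    consumer(fifo1, fifo2, out);
}
```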
Case 2 (same structure):
Producer writes N times to FIFO1, then N times to FIFO2.
Consumer alternately reads from FIFO1, FIFO2, FIFO1, FIFO2, and so on.
A depth of N is necessary for FIFO1 to avoid a deadlock (and the default depth of 2 for FIFO2 remains optimal for performance).
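A corresponding sketch of Case 2, again with hypothetical names and N fixed at 16 for illustration: after one read from FIFO1 the consumer blocks on the still-empty FIFO2, so FIFO1 must buffer the producer's whole burst, which the STREAM pragma provides for:

```cpp
#include <hls_stream.h>

const int N = 16;  // burst length (illustrative value)

// Case 2: the producer bursts N writes into FIFO1 before touching FIFO2,
// but the consumer still alternates its reads between the two FIFOs.
void producer(hls::stream<int> &fifo1, hls::stream<int> &fifo2) {
    for (int i = 0; i < N; i++)
        fifo1.write(i);                 // N consecutive writes to FIFO1 ...
    for (int i = 0; i < N; i++)
        fifo2.write(i);                 // ... then N consecutive writes to FIFO2
}

void consumer(hls::stream<int> &fifo1, hls::stream<int> &fifo2, int out[2 * N]) {
    for (int i = 0; i < N; i++) {
        out[2 * i]     = fifo1.read();  // alternating reads, as in Case 1
        out[2 * i + 1] = fifo2.read();
    }
}

void top(int out[2 * N]) {
#pragma HLS dataflow
    hls::stream<int> fifo1("fifo1"), fifo2("fifo2");
#pragma HLS stream variable=fifo1 depth=16  // depth = N; the default depth of 2 deadlocks here
    producer(fifo1, fifo2);
    consumer(fifo1, fifo2, out);
}
```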
As these two simple cases show, for exactly the same code structure, the FIFO depths may need to be set differently depending on how the FIFO channels are accessed. FIFO depths serve to amortize and match the burst behavior of FIFO accesses.
Compiler-created FIFOs and PIPOs (inferred from scalars or arrays passed between processes) should never cause deadlocks, but their depths might be insufficient for optimal performance. User-created FIFOs (from hls::stream and hls::stream_of_blocks objects between processes) may cause deadlocks and/or low performance, depending on their depths.
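To make the distinction concrete, here is a minimal sketch with illustrative names and sizes: the array tmp passed between the two processes becomes a compiler-created channel (a PIPO by default), which the tool sizes safely, while the hls::stream is a user-created FIFO whose depth is the user's responsibility:

```cpp
#include <hls_stream.h>

void stage1(const int in[64], int tmp[64], hls::stream<int> &strm) {
    for (int i = 0; i < 64; i++) {
        tmp[i] = in[i] + 1;    // written to the compiler-created channel
        strm.write(in[i]);     // written to the user-created FIFO
    }
}

void stage2(const int tmp[64], hls::stream<int> &strm, int out[64]) {
    for (int i = 0; i < 64; i++)
        out[i] = tmp[i] + strm.read();
}

void top(const int in[64], int out[64]) {
#pragma HLS dataflow
    int tmp[64];                    // compiler-created PIPO channel
    hls::stream<int> strm("strm");  // user-created FIFO (default depth 2)
    stage1(in, tmp, strm);
    stage2(tmp, strm, out);
}
```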
TIP: Deadlocks due to insufficient FIFO depths always exhibit at least one blocked writer. If they do not, the problem is most likely a design issue, typically due to non-blocking reads or writes, or to reads and writes conditioned by empty() and full().
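A short sketch of the access styles named in this TIP: neither call below ever blocks, so a stalled design built this way shows no blocked reader or writer in the dataflow viewer, which is a hint that the stall is a design issue rather than an undersized FIFO. The function name and accumulator are hypothetical:

```cpp
#include <hls_stream.h>

void consumer(hls::stream<int> &in, int &acc) {
    int v;
    if (!in.empty()) {      // read conditioned on empty(): simply skipped when no data
        acc += in.read();
    }
    if (in.read_nb(v)) {    // non-blocking read: returns false instead of stalling
        acc += v;
    }
}
```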
The goal of this tutorial is to help you analyze a dataflow design and identify its bottlenecks, which may be:
Processes or regions with a larger initiation interval (II) than the rest, which can constrain the overall throughput. This issue can be fixed by:
Reducing the II of such processes.
Investigating dataflow regions and “digging” inside them to discover the reason (which may be any one of these three issues).
FIFOs (channels with their own handshake, which include streams, streams of blocks, and streamed arrays) or PIPOs (channels without their own handshake, which include PIPOs and TLFs) whose depth is too small and which can thus become full. This issue can be fixed by increasing the depth of the channel, as described in a later section.
Top-level synchronization (scalar or external memory inputs from the calling context, or outputs to it, synchronized via the ap_ctrl_chain or ap_ctrl_hs protocol of the region). In this case, the remedy is to manually copy these variables and pass them through the network of processes to avoid the loss of performance, as in the sketch after this list.
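The following is a minimal sketch of that last remedy, under assumed names: rather than having stage2 read the scalar factor directly from the top-level arguments (tying it to the region's control handshake), stage1 copies it into a small stream and forwards it through the process network:

```cpp
#include <hls_stream.h>

void stage1(int factor, hls::stream<int> &data, hls::stream<int> &factor_fwd) {
    factor_fwd.write(factor);        // forward a local copy of the scalar
    for (int i = 0; i < 64; i++)
        data.write(i);
}

void stage2(hls::stream<int> &data, hls::stream<int> &factor_fwd, int out[64]) {
    int factor = factor_fwd.read();  // receive the scalar from the producer
    for (int i = 0; i < 64; i++)
        out[i] = factor * data.read();
}

void top(int factor, int out[64]) {
#pragma HLS dataflow
    hls::stream<int> data("data"), factor_fwd("factor_fwd");
    stage1(factor, data, factor_fwd);
    stage2(data, factor_fwd, out);
}
```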
Note that in complex designs with data-dependent synchronization (for example, a process that reads 128 times from a FIFO in one execution and 32 times in another), a process may block for a variety of reasons that change over time. In this case, the dataflow co-simulation waveforms may be the only viable debugging approach, as described in the earlier lab.
As the second exercise in this tutorial, you will synthesize the example design and bring up the dataflow viewer to show how a deadlock can be investigated and resolved. This lab uses a simple deadlock example, found in the reference-files/deadlock folder.
In this lab, you will:
Understand how to use the different features of the dataflow viewer to investigate a deadlock.
Use the FIFO sizing features to resolve the deadlock and improve performance.