By design, the Dataflow viewer can show only a static view of the dataflow optimization. The graph shows the call-graph-like structure of the dataflow region (as shown below). In this graph, you can get a sense of the throughput of your design by observing the II and latency of each function along a given path.
It is difficult to see how the functions inside the dataflow region are executed in parallel and how the execution of the functions overlap. In order to visualize this dynamic timeline you can use the AMD Vivado™ XSIM simulator and waveform viewer.
To launch the simulator waveform viewer you need to re-run RTL co-simulation with a few new settings:
From the menu, select the Solutions > Run C/RTL Co-Simulation command.
The Co-simulation dialog box displays as shown in the following figure.
Make the following selections:
Ensure that the Vivado XSIM simulator is chosen.
Select all for the Dump Trace option to trace all ports and signals. Note: This is a small design and so we can dump and trace all the signals. For a large design, this might cause an increased simulation run time as well as the creation of a large waveform database.
Enable the Wave Debug option to interactively launch the XSIM waveform viewer during simulation.
Enable the Channel (PIPO/FIFO) Profiling checkbox.
Click OK.
At this point, the Vitis HLS GUI will reinvoke RTL co-simulation. The difference this time around is that when it is done with simulation, it will display the Vivado XSIM waveform viewer (due to the Wave Debug option), to let you inspect the waveforms generated during simulation (by the Dump Trace option). You will see something like the following figure:
To easily explain how the dataflow optimization executes the functions inside the dataflow region in parallel, the waveforms are analyzed to track process starts and stops and a summary of this activity is presented in the waveform viewer. In the above diagram, note the following details:
The top function in the design is the diamond function. In the waveform viewer, this is shown as AESL_inst_diamond.
Note that the first item in the Name column is the HLS Process Summary. This section shows the activity traces (using cyan bars) of the dataflow region inside the diamond function. It is, in fact, a replica of the activity traces found under the AESL_inst_diamond_activity item; the HLS Process Summary simply gathers the function activity waveforms into one section of the waveform viewer. The first line shows a summary of the number of active iterations of the diamond function that are executing in parallel at each time point (1, 2, 3, 2, 1).
Expand this level to show the individual active invocations of the functions (funcA, funcB, funcC, and funcD). In the provided testbench, the top-level function diamond is called three times, so the activity trace for each function shows when each of the three calls to that function executes. Also visible is the order in which the functions execute inside the body of diamond: first funcA starts, followed by the parallel execution of funcB and funcC, and once these functions are done, funcD starts executing. Small gaps in execution, indicated by the yellow ellipses, can be situations where execution is stalled and are worth a closer look. This view shows how the functions inside the dataflow region are executed in a pipelined manner, except that the pipeline is dynamic rather than static.
Expand the AESL_inst_diamond_activity level to see a much more detailed view and to see how the three calls to the top-level function are executed (#0, #1, #2). These are shown with green bars. The iteration count starts at zero and ends at two for this particular testbench. You can compare the time taken for each iteration to complete, and you can also see how the iterations overlap in time. So even the multiple calls to the top-level function are dynamically pipelined.
You can investigate the activity traces for each of the sub-functions to see when each invocation of the sub-function starts and stops (shown by the green #0, #1, #2 bars, while the cyan (1, 1, 1) bars show only the number of active iterations at a given time point).
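The execution order described above can be pictured with a minimal C++ sketch of such a diamond-shaped dataflow region. This is an illustration only, not the tutorial's actual source: the element type data_t, the length N, the arithmetic inside each function, and the run_testbench helper are all assumptions chosen to keep the example small. What matters is the shape: funcA feeds funcB and funcC, both of which feed funcD, and the testbench calls diamond three times.

```cpp
#include <cstddef>

// Assumed element type and vector length (hypothetical values for illustration).
typedef unsigned int data_t;
const std::size_t N = 16;

// funcA: reads the input and produces one buffer for each branch of the diamond.
void funcA(const data_t in[N], data_t out1[N], data_t out2[N]) {
    for (std::size_t i = 0; i < N; i++) {
        data_t t = in[i] * 3;
        out1[i] = t;
        out2[i] = t;
    }
}

// funcB and funcC: the two independent branches that can run in parallel.
void funcB(const data_t in[N], data_t out[N]) {
    for (std::size_t i = 0; i < N; i++) out[i] = in[i] + 25;
}

void funcC(const data_t in[N], data_t out[N]) {
    for (std::size_t i = 0; i < N; i++) out[i] = in[i] * 2;
}

// funcD: joins the two branches.
void funcD(const data_t in1[N], const data_t in2[N], data_t out[N]) {
    for (std::size_t i = 0; i < N; i++) out[i] = in1[i] + in2[i] * 2;
}

// Top-level function: the local arrays c1..c4 are the channels between tasks.
// A standard C++ compiler ignores the HLS pragma and runs the calls in order.
void diamond(const data_t vecIn[N], data_t vecOut[N]) {
    data_t c1[N], c2[N], c3[N], c4[N];
#pragma HLS dataflow
    funcA(vecIn, c1, c2);
    funcB(c1, c3);
    funcC(c2, c4);
    funcD(c3, c4, vecOut);
}

// Testbench-style driver (hypothetical): calls the top function three times,
// matching the three invocations #0, #1, #2 seen in the waveform viewer.
// Returns the number of mismatching output elements.
int run_testbench() {
    data_t in[N], out[N];
    int mismatches = 0;
    for (int call = 0; call < 3; call++) {
        for (std::size_t i = 0; i < N; i++) in[i] = (data_t)(i + call);
        diamond(in, out);
        for (std::size_t i = 0; i < N; i++) {
            // (3x + 25) + 2 * (2 * 3x) = 15x + 25
            data_t expected = in[i] * 15 + 25;
            if (out[i] != expected) mismatches++;
        }
    }
    return mismatches;
}
```

In plain C++ the three calls run back to back; the overlap between calls #0, #1, and #2 appears only in the generated RTL, which is why the waveform view is needed to observe it.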
Additional details, such as the StallNoContinue signal, are shown to highlight any back pressure that can stall the function executions. In the above diagram, back pressure from funcD can be seen on funcB and funcC (highlighted on the various StallNoContinue waveforms by the red ellipses).
The RTL-level signals are also available for inspection when you expand the RTL Signals section.
It should be noted that, in this default form of HLS dataflow (i.e., with PIPO channels only), successive communicating tasks in a kernel run do not overlap: funcB and funcC can only start once their buffer from funcA (ping or pong) is released. funcB and funcC could start earlier if FIFOs were used instead of ping-pong buffers, provided the data are consumed in the same order in which they are produced. PIPOs are generally used when data is written into the buffer in random order, so the entire buffer stays locked until all processing has completed, and only then is access to the buffer released. FIFOs are generally used in streaming applications where data is consumed in the order in which it is created. This allows the consumer to start processing as soon as there is data in the FIFO.
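The difference in consumer start time can be made concrete with a deliberately simplified cycle model. The sketch below is not tool output and does not use any HLS library: it assumes a producer that writes one element per cycle starting at cycle 0, a consumer that reads one element per cycle, an unbounded FIFO, and no handshake overhead. The names Timing, simulate_pipo, and simulate_fifo are hypothetical.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Cycle at which the consumer performs its first read, and the first cycle
// after its last read.
struct Timing {
    std::size_t consumer_start;
    std::size_t consumer_done;
};

// PIPO model: the consumer may not touch the buffer until the producer has
// written all n elements and released it.
Timing simulate_pipo(std::size_t n) {
    std::vector<int> buffer(n);
    std::size_t cycle = 0;
    for (std::size_t i = 0; i < n; i++) {  // producer fills the whole buffer
        buffer[i] = (int)i;
        cycle++;
    }
    Timing t;
    t.consumer_start = cycle;              // buffer released here
    long sum = 0;
    for (std::size_t i = 0; i < n; i++) {  // consumer drains the buffer
        sum += buffer[i];
        cycle++;
    }
    (void)sum;
    t.consumer_done = cycle;
    return t;
}

// FIFO model: the consumer reads as soon as an element written in an earlier
// cycle is available, so producer and consumer overlap.
Timing simulate_fifo(std::size_t n) {
    std::queue<int> fifo;
    std::size_t produced = 0, consumed = 0, cycle = 0;
    bool started = false;
    Timing t;
    t.consumer_start = 0;
    while (consumed < n) {
        // The consumer acts first, so it only sees data from earlier cycles.
        if (!fifo.empty()) {
            if (!started) {
                started = true;
                t.consumer_start = cycle;
            }
            fifo.pop();
            consumed++;
        }
        if (produced < n) {
            fifo.push((int)produced);
            produced++;
        }
        cycle++;
    }
    t.consumer_done = cycle;
    return t;
}
```

Under these assumptions, with n = 4 the PIPO consumer starts at cycle 4 and finishes at cycle 8, while the FIFO consumer starts at cycle 1 and finishes at cycle 5. Real designs differ in the details (finite FIFO depth, differing IIs, start-up latency), but the ordering of the start times is the point of the comparison.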