In this new example, the same PLIO feeds two branches, and the branch outputs are connected to a kernel that computes the difference between them.
VERSION=3 of this design stalls almost immediately because it needs FIFOs at the input and output of each branch.
VERSION=4 sets up these FIFOs, and the overall simulation lasts approximately 275 µs.
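
To see why two branches fed from one stream can stall without enough buffering, here is a minimal toy model in Python. It is only a sketch, not the design's actual graph: the `simulate` function, the `burst` parameter, and the behavior of each branch (branch A collects a burst of samples before writing anything, branch B forwards sample by sample) are assumptions chosen for illustration.

```python
from collections import deque


def simulate(n_samples, fifo_depth, burst=8):
    """Toy model: return how many differences are produced before nothing can move."""
    in_a, in_b = deque(), deque()      # FIFOs at the input of each branch
    out_a, out_b = deque(), deque()    # FIFOs at the output of each branch
    pending = deque()                  # samples held inside branch A
    emitting = False                   # True while branch A is writing its burst
    sent = 0
    diffs = 0
    progress = True
    while progress:
        progress = False
        # Source: one stream feeds both branches, so it advances only
        # when both branch inputs have room.
        if sent < n_samples and len(in_a) < fifo_depth and len(in_b) < fifo_depth:
            in_a.append(sent)
            in_b.append(sent)
            sent += 1
            progress = True
        # Branch A (assumed behavior): collect `burst` samples, then write them out.
        if emitting:
            if len(out_a) < fifo_depth:
                out_a.append(pending.popleft())
                emitting = len(pending) > 0
                progress = True
        elif in_a:
            pending.append(in_a.popleft())
            emitting = len(pending) == burst
            progress = True
        # Branch B (assumed behavior): forward one sample at a time.
        if in_b and len(out_b) < fifo_depth:
            out_b.append(in_b.popleft())
            progress = True
        # Difference kernel: needs one sample from each branch.
        if out_a and out_b:
            out_a.popleft()
            out_b.popleft()
            diffs += 1
            progress = True
    return diffs


for depth in (1, 4):
    done = simulate(32, depth)
    status = "completed" if done == 32 else "stalled"
    print(f"fifo_depth={depth}: {done}/32 differences ({status})")
```

In this model, shallow FIFOs let branch B back up until the shared source stops, so branch A never receives a full burst and the difference kernel waits forever. Deepening the FIFOs on both sides of each branch absorbs the mismatch and the run completes. The depths that work here depend entirely on the assumed `burst` value and say nothing about the depths the real design requires.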