Two branches are fed from the same PLIO, and their outputs connect to a kernel that computes the difference between the two branches.
VERSION=3 of this design stalls almost immediately because it needs FIFOs at the input and output of each branch.
VERSION=4 sets up these FIFOs, and the overall simulation then lasts approximately 275 µs.
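
The stall can be illustrated with a small host-side toy model. This is only a sketch under stated assumptions, not AI Engine graph code and not the design's actual kernels: the function name `simulate` and the parameters `fifo_depth`, `block`, and `total` are hypothetical, and the model assumes that one branch must gather a whole block of samples before it emits anything while the other branch forwards samples one by one. Each branch has a bounded FIFO at its input and at its output, and the shared source can only send a sample when both branch inputs have room.

```cpp
#include <cstddef>
#include <deque>

// Toy model (hypothetical, not AI Engine code): returns how many samples the
// difference stage consumes before the system either finishes or stalls.
std::size_t simulate(std::size_t fifo_depth, std::size_t block, std::size_t total) {
    std::deque<std::size_t> a_in, a_out;   // FIFOs around branch A
    std::deque<std::size_t> b_in, b_out;   // FIFOs around branch B
    std::deque<std::size_t> b_buf;         // branch B internal block buffer
    bool b_emitting = false;               // true while B drains a full block
    std::size_t sent = 0, consumed = 0;
    bool progress = true;

    while (progress && consumed < total) {
        progress = false;

        // Shared source: one sample goes to BOTH branches, or to neither.
        if (sent < total && a_in.size() < fifo_depth && b_in.size() < fifo_depth) {
            a_in.push_back(sent);
            b_in.push_back(sent);
            ++sent;
            progress = true;
        }

        // Branch A (assumed): forwards one sample at a time.
        if (!a_in.empty() && a_out.size() < fifo_depth) {
            a_out.push_back(a_in.front());
            a_in.pop_front();
            progress = true;
        }

        // Branch B (assumed): gathers `block` samples, then emits them.
        if (!b_emitting) {
            if (!b_in.empty()) {
                b_buf.push_back(b_in.front());
                b_in.pop_front();
                if (b_buf.size() == block) b_emitting = true;
                progress = true;
            }
        } else if (b_out.size() < fifo_depth) {
            b_out.push_back(b_buf.front());
            b_buf.pop_front();
            if (b_buf.empty()) b_emitting = false;
            progress = true;
        }

        // Difference stage: needs one sample from each branch.
        if (!a_out.empty() && !b_out.empty()) {
            a_out.pop_front();
            b_out.pop_front();
            ++consumed;
            progress = true;
        }
    }
    return consumed;  // smaller than `total` means the model stalled
}
```

In this model, `simulate(1, 8, 64)` stalls before all samples are consumed: branch A's FIFOs fill up, the shared source is blocked, and branch B never receives a full block. `simulate(8, 8, 64)` consumes all 64 samples. The depths are illustrative only; they are not the values used by VERSION=4.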