The same rules regarding connectivity within the top-level function apply when decomposing the compute function. Aim for feed-forward connections, with a single producer and a single consumer for each connecting variable. If a variable must be consumed by more than one function, it should be explicitly duplicated, as sketched below.
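For example, a value needed by two downstream functions can be copied into two separate channels by a small duplication stage. The following is a minimal sketch; the stage names (to_filter, to_stats) and the int payload are hypothetical, not part of any particular design.

    #include "hls_stream.h"

    // Hypothetical duplication stage: the data produced upstream is needed by
    // two consumers, so it is explicitly copied into two streams. Every
    // connection then has exactly one producer and one consumer.
    static void duplicate(hls::stream<int> &in,
                          hls::stream<int> &to_filter,
                          hls::stream<int> &to_stats,
                          int n) {
        for (int i = 0; i < n; i++) {
            int v = in.read();
            to_filter.write(v); // copy consumed by the filter stage
            to_stats.write(v);  // copy consumed by the statistics stage
        }
    }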
When moving blocks of data from one compute block to another, the developer can choose to use arrays or hls::stream objects.
Using arrays requires fewer code changes and is usually the fastest way to make progress during the decomposition process. However, using hls::stream objects can lead to designs that use fewer memory resources and have shorter latency. It also helps the developer reason about how data moves through the kernel, which is always important to understand when optimizing for throughput.
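The two styles of connection can look like the following sketch. The stage functions and the block size N are hypothetical placeholders: the array version requires little change to the original code, while the stream version hands data off element by element so the full block never needs to be buffered between the stages.

    #include "hls_stream.h"

    #define N 1024  // hypothetical block size

    // ---- Array-based connection (fewest code changes) ----
    static void scale_arr(const int in[N], int tmp[N]) {
        for (int i = 0; i < N; i++) tmp[i] = in[i] * 3;
    }
    static void offset_arr(const int tmp[N], int out[N]) {
        for (int i = 0; i < N; i++) out[i] = tmp[i] + 1;
    }
    void compute_array(const int in[N], int out[N]) {
        int tmp[N];            // connecting variable held in a buffer
        scale_arr(in, tmp);    // producer writes the whole block
        offset_arr(tmp, out);  // consumer reads the whole block
    }

    // ---- Stream-based connection (element-by-element hand-off) ----
    static void scale_str(const int in[N], hls::stream<int> &tmp) {
        for (int i = 0; i < N; i++) tmp.write(in[i] * 3);
    }
    static void offset_str(hls::stream<int> &tmp, int out[N]) {
        for (int i = 0; i < N; i++) out[i] = tmp.read() + 1;
    }
    void compute_stream(const int in[N], int out[N]) {
        hls::stream<int> tmp("tmp");  // FIFO channel between the two stages
        scale_str(in, tmp);
        offset_str(tmp, out);
    }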
Using hls::stream objects is usually a good idea, but it is up to the developer to determine the most appropriate moment to convert arrays to streams. Some developers do this very early on, while others do it at the very end as a final optimization step. This can also be done using the DATAFLOW optimization (pragma HLS dataflow).
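As one possible final form, the decomposed compute function can combine hls::stream channels with the DATAFLOW pragma so that the stages execute concurrently. This is only a sketch under assumed stage names and a hypothetical transform; it is not taken from any specific design.

    #include "hls_stream.h"

    #define N 1024  // hypothetical block size

    static void produce(const int in[N], hls::stream<int> &s) {
        for (int i = 0; i < N; i++) s.write(in[i]);
    }
    static void transform(hls::stream<int> &s_in, hls::stream<int> &s_out) {
        for (int i = 0; i < N; i++) s_out.write(s_in.read() * 2 + 1);
    }
    static void consume(hls::stream<int> &s, int out[N]) {
        for (int i = 0; i < N; i++) out[i] = s.read();
    }

    // Decomposed compute function: with DATAFLOW the three stages run
    // concurrently, handing data through the stream channels.
    void compute(const int in[N], int out[N]) {
    #pragma HLS dataflow
        hls::stream<int> s0("s0"), s1("s1");
        produce(in, s0);
        transform(s0, s1);
        consume(s1, out);
    }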
At this stage, maintaining a graphical representation of the kernel architecture can be very useful for reasoning about data dependencies, data movement, control flow, and concurrency.