Dataflow optimization is a powerful technique for improving kernel performance
by enabling task-level pipelining and parallelism inside the kernel. It allows the
v++ compiler to schedule multiple functions of the kernel to run
concurrently to achieve higher throughput and lower latency. This is also known as
task-level parallelism.
The following figure shows a conceptual view of dataflow pipelining. The default
behavior is to execute and complete func_A, then func_B, and finally func_C. With the
pragma HLS dataflow enabled, the compiler can schedule each function to execute as
soon as data is available. In this example, the original top function has a latency
and interval of eight clock cycles. With the dataflow optimization, the interval is
reduced to only three clock cycles.
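
The structure described above can be sketched in C++ as follows. This is a minimal illustration, not the code behind the figure: the array size N, the intermediate buffers buf_ab and buf_bc, the arithmetic inside each function, and the helper top_matches_reference are all assumptions made for the example. Only the names func_A, func_B, func_C, top, and the dataflow pragma come from the text. An ordinary C++ compiler ignores the pragma and runs the three calls sequentially; the overlapped execution applies only to the hardware generated by the HLS compiler.

```cpp
#include <cassert>

// Hypothetical data size chosen for illustration only.
constexpr int N = 8;

// Each stage reads one array and writes another, so the stages form a
// simple producer-consumer chain. The arithmetic is a placeholder.
void func_A(const int in[N], int out[N]) {
    for (int i = 0; i < N; ++i) out[i] = in[i] + 1;
}

void func_B(const int in[N], int out[N]) {
    for (int i = 0; i < N; ++i) out[i] = in[i] * 2;
}

void func_C(const int in[N], int out[N]) {
    for (int i = 0; i < N; ++i) out[i] = in[i] - 3;
}

// Top-level function. With the pragma, the HLS compiler may start func_B
// and func_C as soon as their input data is available instead of waiting
// for the previous function to finish. A standard C++ compiler ignores it.
void top(const int in[N], int out[N]) {
#pragma HLS dataflow
    int buf_ab[N];  // assumed intermediate buffer between func_A and func_B
    int buf_bc[N];  // assumed intermediate buffer between func_B and func_C

    func_A(in, buf_ab);
    func_B(buf_ab, buf_bc);
    func_C(buf_bc, out);
}

// Host-side functional check (hypothetical helper): the pragma changes
// scheduling in hardware, not the computed result.
bool top_matches_reference() {
    int in[N];
    int out[N];
    for (int i = 0; i < N; ++i) in[i] = i;
    top(in, out);
    for (int i = 0; i < N; ++i) {
        if (out[i] != (i + 1) * 2 - 3) return false;
    }
    return true;
}
```

Each intermediate buffer in this sketch is written by exactly one function and read by exactly one function, in the order in which the functions are called. That single-producer, single-consumer chain is the kind of structure the dataflow optimization is intended for.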