Dataflow optimization is a powerful technique for improving kernel performance
by enabling task-level pipelining and parallelism inside the kernel. It allows the
v++ compiler to schedule multiple functions of the kernel to run concurrently,
achieving higher throughput and lower latency. This is also known as task-level
parallelism. To use it effectively, you need to understand the best practices for
writing good software for execution on the FPGA as discussed in .
The following figure shows a conceptual view of dataflow pipelining. The default
behavior is to execute and complete func_A, then func_B, and finally func_C. With
dataflow enabled, the compiler can schedule each function to execute as soon as data
is available. In this example, the original top function has a latency and interval
of eight clock cycles. With the dataflow optimization, the interval is reduced to only
three clock cycles.
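
As a rough illustration of the structure described above, the following is a minimal sketch of a top function that calls func_A, func_B, and func_C in sequence with a dataflow pragma applied. The function bodies, the buffer size N, and the intermediate buffer names tmp_ab and tmp_bc are hypothetical and chosen only for this example; the source does not show the actual code. Each intermediate buffer is written by one function and read by the next, which is the single-producer, single-consumer pattern that dataflow generally expects. A standard C++ compiler ignores the pragma and runs the three functions one after another, so the sketch shows only the code shape, not the cycle counts quoted above.

```cpp
#include <cassert>  // for software-side checks of top() in a testbench

constexpr int N = 8;  // hypothetical buffer size, not taken from the source

// Stage 1 (hypothetical body): scale each input element.
static void func_A(const int in[N], int out[N]) {
    for (int i = 0; i < N; ++i) out[i] = in[i] * 2;
}

// Stage 2 (hypothetical body): add a constant offset.
static void func_B(const int in[N], int out[N]) {
    for (int i = 0; i < N; ++i) out[i] = in[i] + 1;
}

// Stage 3 (hypothetical body): square each element.
static void func_C(const int in[N], int out[N]) {
    for (int i = 0; i < N; ++i) out[i] = in[i] * in[i];
}

// Top function: three tasks chained through intermediate buffers.
// The pragma asks the HLS compiler to overlap the tasks; a plain
// C++ compiler skips it and executes the calls sequentially.
void top(const int in[N], int out[N]) {
#pragma HLS dataflow
    int tmp_ab[N];  // written only by func_A, read only by func_B
    int tmp_bc[N];  // written only by func_B, read only by func_C

    func_A(in, tmp_ab);
    func_B(tmp_ab, tmp_bc);
    func_C(tmp_bc, out);
}
```

In a software testbench, calling top on the inputs 0 through 7 produces (2*x + 1) squared for each element, which gives a quick functional check before synthesis.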