Dataflow optimization is a powerful technique for improving kernel performance
by enabling task-level pipelining and parallelism inside the kernel. It allows the
v++ compiler to schedule multiple functions of the kernel to run concurrently,
achieving higher throughput and lower latency. This is also known as task-level
parallelism. To apply it effectively, you need to understand the best practices for
writing software for execution on the FPGA, as discussed in Optimizing for
Throughput in the Vitis HLS User Guide (UG1399).
The following figure shows a conceptual view of dataflow pipelining. The default
behavior is to execute and complete func_A, then func_B, and finally func_C. With
pragma HLS dataflow enabled, the compiler can schedule each function to execute as
soon as data is available. In this example, the original top function has a latency
and interval of eight clock cycles. With the dataflow optimization, the interval is
reduced to only three clock cycles.
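
The structure described above can be sketched in C++ as follows. This is a minimal illustration, not code from the guide: only the names func_A, func_B, func_C, and top come from the text, while the buffer length N, the intermediate buffers a_to_b and b_to_c, and the arithmetic inside each function are assumptions chosen to keep the example short. A standard C++ compiler ignores the pragma and runs the three functions one after another; the overlapped schedule applies only when the code is synthesized with the HLS tools.

```cpp
#include <cstddef>

// Hypothetical buffer length, chosen only for illustration.
constexpr std::size_t N = 8;

// Stage A (assumed body): doubles each input element.
static void func_A(const int in[N], int out[N]) {
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = in[i] * 2;
    }
}

// Stage B (assumed body): adds one to each element produced by func_A.
static void func_B(const int in[N], int out[N]) {
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = in[i] + 1;
    }
}

// Stage C (assumed body): subtracts three from each element produced by func_B.
static void func_C(const int in[N], int out[N]) {
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = in[i] - 3;
    }
}

// Top function: with the dataflow pragma, the HLS compiler may start func_B
// and func_C as soon as their input data is available, instead of waiting
// for the previous function to finish completely.
void top(const int in[N], int out[N]) {
#pragma HLS dataflow
    // Each intermediate buffer has exactly one producer and one consumer,
    // which keeps the data moving in one direction from stage to stage.
    int a_to_b[N];
    int b_to_c[N];

    func_A(in, a_to_b);
    func_B(a_to_b, b_to_c);
    func_C(b_to_c, out);
}
```

The functional result is the same with or without the pragma; only the schedule changes. Keeping each intermediate buffer written by one function and read by the next is a deliberate choice here, because it makes the producer-consumer chain explicit for the compiler.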