As stated in Abstract Parallel Programming Model for HLS, to produce high-performance hardware the HLS compiler must infer parallelism from sequential code and exploit it. The DATAFLOW optimization tries to create task-level parallelism between the functions in the code wherever possible, on top of the loop-level parallelism provided by pipelining.
In the earlier steps, you found different ways to optimize the DCT algorithm so that the pipelined loops could achieve II=1. In this step, you use the DATAFLOW directive to enable task-level parallelism for functions or loops. For more information, refer to syn.directive.dataflow.
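To make the idea concrete, the following is a minimal sketch of a dataflow region. It is not the tutorial's DCT code: the function names (`read_stage`, `compute_stage`, `write_stage`, `scale_offset_top`), the size `N`, and the arithmetic are all hypothetical and chosen only for illustration. The sketch assumes the directive is applied as a `#pragma HLS dataflow` inside the top function rather than through a configuration file, and that each intermediate array has exactly one producer and one consumer, which is the pattern the DATAFLOW optimization generally expects.

```cpp
// Hypothetical example: three sequential stages that DATAFLOW can overlap.
const int N = 8;  // assumed buffer size, for illustration only

// Stage 1: copy the input into a local buffer.
static void read_stage(const int in[N], int buf[N]) {
    for (int i = 0; i < N; ++i) {
#pragma HLS pipeline II=1
        buf[i] = in[i];
    }
}

// Stage 2: placeholder computation standing in for the real algorithm.
static void compute_stage(const int buf_in[N], int buf_out[N]) {
    for (int i = 0; i < N; ++i) {
#pragma HLS pipeline II=1
        buf_out[i] = buf_in[i] * 2 + 1;
    }
}

// Stage 3: copy the result to the output.
static void write_stage(const int buf[N], int out[N]) {
    for (int i = 0; i < N; ++i) {
#pragma HLS pipeline II=1
        out[i] = buf[i];
    }
}

// Top function: the dataflow pragma asks the tool to run the three
// stages as concurrent tasks connected by the intermediate buffers.
void scale_offset_top(const int in[N], int out[N]) {
#pragma HLS dataflow
    int stage1_out[N];  // written only by read_stage, read only by compute_stage
    int stage2_out[N];  // written only by compute_stage, read only by write_stage

    read_stage(in, stage1_out);
    compute_stage(stage1_out, stage2_out);
    write_stage(stage2_out, out);
}
```

A standard C++ compiler ignores the HLS pragmas (possibly with an unknown-pragma warning), so the code behaves as ordinary sequential C++ in software simulation. The functional result is the same with or without the directive; the difference shows up only in the synthesized hardware, where the stages can overlap across successive invocations instead of running strictly one after another.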