Improving Synthesis Runtime and Capacity

Vitis High-Level Synthesis User Guide (UG1399)

Document ID: UG1399
Release Date: 2024-07-03
Version: 2024.1 English

The HLS compiler schedules objects. Whether the object is a floating-point multiply operation or a single register, it is still an object to be scheduled. The floating-point multiply can take multiple cycles to complete, and use many resources to implement, but at the level of scheduling it is still one object.

The HLS compiler schedules operations hierarchically. First the operations within a loop are scheduled; then the loop, the sub-functions, and the remaining operations within the function are scheduled. Compiler runtime increases when:

  • There are more objects to schedule.
  • There is more freedom and more possibilities to explore.

Unrolling loops and partitioning arrays create more objects to schedule and can increase runtime. Inlining functions creates more objects to schedule at the current level of hierarchy, which also increases runtime. Be very careful about simply partitioning all arrays, unrolling all loops, and inlining all functions. These optimizations can be required to meet performance targets, but expect synthesis runtime to increase when you use them. Apply them judiciously, following the optimization strategies discussed in HLS Programmers Guide.
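As an illustration of applying these optimizations judiciously, the following minimal sketch unrolls a loop by a small factor and partitions the arrays to match, rather than unrolling and partitioning completely. The function name dot_product, the array length N, and the factor of 4 are hypothetical values chosen for this example; a suitable factor depends on the performance target of the design.

```cpp
#include <cstdint>

constexpr int N = 64;  // hypothetical array length

// Hypothetical example function: sum of element-wise products.
int32_t dot_product(const int32_t a[N], const int32_t b[N]) {
    // Partition cyclically by 4 only, matching the unroll factor below,
    // instead of partitioning the arrays completely.
#pragma HLS array_partition variable=a cyclic factor=4 dim=1
#pragma HLS array_partition variable=b cyclic factor=4 dim=1
    int32_t acc = 0;
    for (int i = 0; i < N; ++i) {
        // Partial unroll: 4 copies of the loop body to schedule, not 64.
#pragma HLS unroll factor=4
        acc += a[i] * b[i];
    }
    return acc;
}
```

A standard C++ compiler ignores the HLS pragmas, so the function can be checked in C simulation before synthesis.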

If loops must be unrolled, or if a PIPELINE directive higher in the hierarchy has automatically unrolled them, consider capturing the loop body as a separate function. This captures all the logic in one function instead of creating multiple copies of the logic when the loop is unrolled; one set of objects in a defined hierarchy is scheduled faster. Remember to pipeline this function if the unrolled loop is used in a pipelined region.
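The idea of capturing the loop body as a separate function can be sketched as follows. The names mac_step and fir_sum and the constant TAPS are hypothetical. The inline off pragma is an assumption added here so that the small function keeps its own level of hierarchy rather than being inlined automatically.

```cpp
#include <cstdint>

constexpr int TAPS = 8;  // hypothetical number of loop iterations

// Hypothetical loop body captured as its own function.
static int32_t mac_step(int32_t acc, int32_t sample, int32_t coeff) {
    // Assumption: keep this function as a separate level of hierarchy.
#pragma HLS inline off
    // Pipeline the function because the caller's loop is unrolled
    // and may sit inside a pipelined region.
#pragma HLS pipeline II=1
    return acc + sample * coeff;
}

int32_t fir_sum(const int32_t samples[TAPS], const int32_t coeffs[TAPS]) {
    int32_t acc = 0;
    for (int i = 0; i < TAPS; ++i) {
        // Fully unrolled: each copy is a call to mac_step rather than
        // a flattened copy of the loop-body logic.
#pragma HLS unroll
        acc = mac_step(acc, samples[i], coeffs[i]);
    }
    return acc;
}
```

The body here is deliberately small; the approach matters more when the loop body contains many operations.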

The degrees of freedom in the code can also impact runtime. Consider the HLS compiler to be an expert designer who by default is given the task of finding the design with the highest throughput, lowest latency and minimum area. The more constrained the tool is, the fewer options it has to explore and the faster it will run. Consider using latency constraints over scopes within the code: loops, functions or regions. Setting a LATENCY directive with the same minimum and maximum values reduces the possible optimization searches within that scope.
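A minimal sketch of a LATENCY directive with identical minimum and maximum values, applied to the scope of a loop body, might look like the following. The function name scale_and_sum, the constant LEN, and the latency value of 4 cycles are hypothetical; an achievable value depends on the operations in the scope and on the target clock.

```cpp
#include <cstdint>

constexpr int LEN = 16;  // hypothetical array length

// Hypothetical example function: scales each input and accumulates.
int32_t scale_and_sum(const int32_t in[LEN], int32_t gain) {
    int32_t acc = 0;
    for (int i = 0; i < LEN; ++i) {
        // min == max gives the scheduler a single latency target for
        // this loop body instead of a range to explore.
        // The value 4 is an illustrative assumption.
#pragma HLS latency min=4 max=4
        acc += in[i] * gain;
    }
    return acc;
}
```

The same pragma can be placed at the top of a function body or inside a braced region to constrain that scope instead.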