Optimizing Device Resources - 2020.2 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID
UG1393
Release Date
2021-03-22
Version
2020.2 English

Data Width

One, if not the most important aspect for performance is the data width required for the implementation. The tool propagates port widths throughout the algorithm. In some cases, especially when starting out with an algorithmic description, the C/C++/OpenCL code might only use large data types such as integers even at the ports of the design. However, as the algorithm is mapped to a fully configurable implementation, smaller data types such as 10-/12-bit might often suffice. It is beneficial to check the size of basic operations in the HLS Synthesis report during optimization.

In general, when the Vitis core development kit maps an algorithm onto the FPGA, more processing is required to comprehend the C/C++/OpenCL API structure and extract operational dependencies. Therefore, to perform this mapping the Vitis core development kit generally partitions the source code into operational units which are then mapped onto the FPGA. Several aspects influence the number and size of these operational units (ops) as seen by the tool.

In the following figure, the basic operations and their bit-width are reported.

Figure 1. Operations Utilization Estimates

Look for bit widths of 16, 32, and 64 bits commonly used in algorithmic descriptions and verify that the associated operation from the C/C++/OpenCL API source actually requires the bit width to be this large. This can considerably improve the implementation of the algorithm, as smaller operations require less computation time.

Fixed Point Arithmetic

Some applications use floating point computation only because they are optimized for other hardware architecture. Using fixed point arithmetic for applications like deep learning can save the power efficiency and area significantly while keeping the same level of accuracy.

Recommended: Xilinx recommends exploring fixed point arithmetic for your application before committing to using floating point operations.

Macro Operations

It is sometimes advantageous to think about larger computational elements. The tool will operate on the source code independently of the remaining source code, effectively mapping the algorithm without consideration of surrounding operations onto the FPGA. When applied, the Vitis technology keeps operational boundaries, effectively creating macro operations for specific code. This uses the following principles:

  • Operational locality to the mapping process
  • Reduction in complexity for the heuristics

This might create vastly different results when applied. In C/C++, macro operations are created with the help of #pragma HLS inline off. While in the OpenCL API, the same kind of macro operation can be generated by not specifying the following attribute when defining a function:

__attribute__((always_inline))

For more information, see pragma HLS inline .

Using Optimized Libraries

The OpenCL specification provides many math built-in functions. All math built-in functions with the native_ prefix are mapped to one or more native device instructions and will typically have better performance compared to the corresponding functions (without the native_ prefix). The accuracy and in some cases the input ranges of these functions is implementation-defined. In the Vitis technology, these native_ built-in functions use the equivalent functions in the Vitis HLS tool Math library, which are already optimized for Xilinx FPGAs in terms of area and performance.

Recommended: Xilinx recommends using native_ built-in functions or the HLS tool Math library if the accuracy meets the application requirement.