Tutorial Overview - 2024.1 English - XD261

Vitis Tutorials: Vitis HLS (XD261)

Document ID: XD261
Release Date: 2024-06-19
Version: 2024.1 English

This tutorial demonstrates micro-optimization techniques that increase the performance of Vitis HLS designs. The fundamental HLS pragmas for micro-optimization are PIPELINE, UNROLL, and either ARRAY_RESHAPE or ARRAY_PARTITION. The tutorial also walks through the HLS analysis tools, which guide the optimization process by highlighting inefficiencies and recommending pragmas.
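
As a rough illustration of where these pragmas sit in source code, the following minimal sketch applies all three to a small multiply-accumulate function. The function name `dot8` and the length `N` are hypothetical and do not come from the Beamformer design; the pragma options shown are one plausible choice, not a recommendation from the tool.

```cpp
#include <cassert>

constexpr int N = 8;  // hypothetical vector length, chosen for illustration only

// Multiply-accumulate over two small arrays.
// A standard C++ compiler ignores the HLS pragmas (possibly with a warning),
// so the same source can run unchanged in C simulation.
int dot8(const int a[N], const int b[N]) {
// Split both arrays into individual elements so that every element
// can be read in the same clock cycle.
#pragma HLS ARRAY_PARTITION variable=a type=complete dim=1
#pragma HLS ARRAY_PARTITION variable=b type=complete dim=1
// Request that the function accept new inputs every clock cycle.
#pragma HLS PIPELINE II=1
    int acc = 0;
mac_loop:
    for (int i = 0; i < N; ++i) {
// Replicate the loop body N times instead of iterating.
#pragma HLS UNROLL
        acc += a[i] * b[i];
    }
    return acc;
}
```

The pragmas only direct synthesis; they do not change the value the function returns, which is what makes a before-and-after comparison in simulation meaningful.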

The tutorial is based on the Beamformer IP that is discussed in more depth in the Design Tutorials section of the Vitis HLS Tutorials.

Figure: Beamformer Diagram

The design tutorial covers the algorithm itself in more depth, whereas this tutorial concentrates on the analysis tools and optimization techniques. Accordingly, this tutorial simplifies the application by focusing on only the compute engine portion of the beamformer algorithm. The code is provided in the reference files directory:

  • beamformer.cpp: C++ source code for the Beamformer design

  • beamformer.h: Header file for Beamformer design

  • beamformer_tb.cpp: Testbench

  • result.golden_float.dat: Testbench simulation results used for design verification
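
A testbench typically verifies a design by comparing its output against stored golden values within a tolerance. The sketch below shows that idea only; it is not the contents of beamformer_tb.cpp. The helper names `parse_floats` and `matches_golden` are hypothetical, and the whitespace-separated text layout is an assumption about the golden file, not a documented format.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Parse whitespace-separated floating-point values from text.
// ASSUMPTION: the golden data is plain text in this layout.
std::vector<float> parse_floats(const std::string& text) {
    std::istringstream in(text);
    std::vector<float> values;
    float v;
    while (in >> v) {
        values.push_back(v);
    }
    return values;
}

// Return true when both sequences have the same length and every pair of
// values differs by at most tol.
bool matches_golden(const std::vector<float>& result,
                    const std::vector<float>& golden,
                    float tol) {
    assert(tol >= 0.0f);
    if (result.size() != golden.size()) {
        return false;
    }
    for (std::size_t i = 0; i < result.size(); ++i) {
        const float diff = std::fabs(result[i] - golden[i]);
        // Written as !(diff <= tol) so that a NaN difference also fails.
        if (!(diff <= tol)) {
            return false;
        }
    }
    return true;
}
```

A tolerance-based comparison is used here rather than exact equality because floating-point results can differ in the last bits when the order of operations changes, for example after a loop is unrolled.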

After completing this tutorial, you will be able to:

  • Build a baseline sequential version of your design that will act as a non-optimized reference point.

  • Analyze the effects of applying the PIPELINE pragma to the outer loop of a nested loop structure and observe the automatic unrolling of the inner loops that results.

  • Use the Vitis HLS analysis views as guidance for the ‘next step’ required to improve performance.

  • Apply the ARRAY_RESHAPE and/or ARRAY_PARTITION pragmas to improve data movement bottlenecks in your design.

  • Target HLS IP to different AMD™ FPGAs and adaptive SoCs.
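
The first objectives above can be sketched on a toy nested loop. This is a minimal sketch under stated assumptions, not the Beamformer code: the function names and the `ROWS` and `COLS` sizes are hypothetical. The baseline has no pragmas; the second version pipelines the outer loop, which causes the tool to unroll the inner loop, and reshapes the input array so that a whole row can be read in one access.

```cpp
#include <cassert>

constexpr int ROWS = 4;  // hypothetical sizes, not taken from the beamformer
constexpr int COLS = 8;

// Baseline: no pragmas, used as the non-optimized reference point.
void row_sums_baseline(const int in[ROWS][COLS], int out[ROWS]) {
row_loop:
    for (int r = 0; r < ROWS; ++r) {
        int acc = 0;
    col_loop:
        for (int c = 0; c < COLS; ++c) {
            acc += in[r][c];
        }
        out[r] = acc;
    }
}

// Optimized variant: same arithmetic, with synthesis directives added.
void row_sums_pipelined(const int in[ROWS][COLS], int out[ROWS]) {
// Combine the COLS elements of each row into one wide word, so a full row
// is available per memory access instead of one element at a time.
#pragma HLS ARRAY_RESHAPE variable=in type=complete dim=2
row_loop:
    for (int r = 0; r < ROWS; ++r) {
// Pipelining the outer loop makes the tool unroll col_loop automatically.
#pragma HLS PIPELINE II=1
        int acc = 0;
    col_loop:
        for (int c = 0; c < COLS; ++c) {
            acc += in[r][c];
        }
        out[r] = acc;
    }
}
```

Without the reshape (or an equivalent partition), the unrolled inner loop would need COLS reads from the same memory in one cycle, which is the kind of data movement bottleneck the analysis views are meant to expose. Both functions return identical results in simulation, so the baseline can serve as a functional reference.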

This tutorial implements the compute engine portion of the beamformer with 16 RX channels, each of which has a real and an imaginary component, and 3 beams. The design is optimized to meet a performance specification and synthesized for a Versal™ Premium series adaptive SoC, and is then migrated to a Zynq™ UltraScale+™ RFSoC device for comparison.
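
To give a sense of the problem size, the following sketch assumes the compute engine forms each beam as a weighted sum of the complex channel samples, which is the generic form of beamforming and may differ from what beamformer.cpp actually does. Only the channel and beam counts come from the text above; the names `cfloat`, `beamform_sample`, `rx`, `weights`, and `beams` are hypothetical. Under this assumption, each input sample requires 16 complex multiplies per beam, or 3 × 16 = 48 in total; with four real multiplies per complex multiply, that is 192 real multiplies per sample.

```cpp
#include <cassert>
#include <complex>

constexpr int CHANNELS = 16;  // RX channels, from the tutorial text
constexpr int BEAMS = 3;      // beams, from the tutorial text

using cfloat = std::complex<float>;  // real and imaginary component per sample

// ASSUMPTION: each beam is a weighted sum over all channels,
// beams[b] = sum over c of weights[b][c] * rx[c].
void beamform_sample(const cfloat rx[CHANNELS],
                     const cfloat weights[BEAMS][CHANNELS],
                     cfloat beams[BEAMS]) {
beam_loop:
    for (int b = 0; b < BEAMS; ++b) {
        cfloat acc(0.0f, 0.0f);
    channel_loop:
        for (int c = 0; c < CHANNELS; ++c) {
            acc += weights[b][c] * rx[c];  // one complex multiply-accumulate
        }
        beams[b] = acc;
    }
}
```

This version is deliberately free of pragmas, so it plays the role of a sequential baseline; the nested loop structure is the kind of code to which PIPELINE and array pragmas would then be applied.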