Digital Down-conversion Chain: Converting from Intrinsics to API - 2024.1 English

Vitis Tutorials: AI Engine

Document ID
XD100
Release Date
2024-06-19
Version
2024.1 English

Version: Vitis 2024.1

Introduction

Versal™ adaptive SoCs combine programmable logic (PL), processing system (PS), and AI Engines with leading-edge memory and interfacing technologies to deliver powerful heterogeneous acceleration for any application. The hardware and software are targeted for programming and optimization by data scientists and software and hardware developers. A host of tools, software, libraries, IP, middleware, and frameworks enable Versal adaptive SoCs to support all industry-standard design flows.

This tutorial demonstrates the steps to upgrade a 32-branch digital down-conversion chain so that it complies with the latest tools and coding practices. It includes examples of the following changes, each with a side-by-side view of the original and upgraded code.

  • Converting coding style from kernel functions to kernel C++ classes

  • Relocating global variables to kernel class data members

  • Handling state variables to enable x86sim

  • Migrating from windows (deprecated) to buffers for non-stream-based kernel I/O

  • Replacing kernel intrinsics with equivalent AI Engine APIs

  • Updating older pragmas

  • Supporting x86 compilation and simulation

You can find the design description in the Digital Down-conversion Chain Implementation on AI Engine (XAPP1351). The codebase associated with the original design can be found in the Reference Design Files.

Upgrading Tools, Device Speed Grade, and Makefile

Note: The design cannot be compiled simply by loading the latest version of the tools, because the baseline Makefile uses deprecated compiler options.

figure1

Important changes to the Makefile are listed below:

  • Upgrade the part from xcvc1902-vsva2197-1LP-e-S-es1 (previously specified by --device) to the higher speed grade xcvc1902-vsva2197-2MP-e-S (now specified by --platform). As the following table shows (taken from the Versal AI Core Series Data Sheet: DC and AC Switching Characteristics (DS957)), this raises the AI Engine clock frequency from 1 GHz to 1.25 GHz.

    figure2

    Recompiling and simulating the design with this change increases throughput by approximately 17-25%.

  • Upgrade to the unified v++ compiler command.

  • Add support for x86 compilation and simulation.
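As a quick sanity check on the speed-grade change in the first bullet, the ratio of the two clock frequencies gives the largest uplift that the clock alone could explain. The sketch below is plain host C++ and not part of the tutorial; the helper name is hypothetical. That a kernel limited by something other than the AI Engine clock would gain less is an assumption offered as one way to read the 17-25% range, not a statement from the data sheet.

```cpp
#include <cassert>

// Percentage increase when a clock moves from old_ghz to new_ghz.
// Hypothetical helper, for illustration only.
constexpr double clock_uplift_percent(double old_ghz, double new_ghz) {
    return (new_ghz - old_ghz) / old_ghz * 100.0;
}

// 1 GHz -> 1.25 GHz is a 25% faster clock, which matches the upper end of
// the throughput range quoted above.
static_assert(clock_uplift_percent(1.0, 1.25) == 25.0,
              "1.0 GHz -> 1.25 GHz should be a 25% increase");
```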

Upgrading the Code

Converting Kernel Functions to Kernel Classes

Functionality in the init() function moves to the constructor of the new kernel C++ class. The main kernel function wrapper becomes the run() member function of that class.

figure3
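The shape of this conversion can be sketched in plain host C++. The example below is not the tutorial's code: the running-sum kernel, the names `accum`, `accum_init`, `Accum`, and `kFrame`, and the use of raw pointers in place of AI Engine I/O types are all assumptions chosen so that the sketch compiles without the Vitis headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kFrame = 8;  // samples per call (hypothetical)

// ---- Before: kernel function plus init(), with state held in a global ----
static std::int32_t g_acc;                 // state shared through a global
void accum_init() { g_acc = 0; }           // played the role of init()
void accum(const std::int16_t* in, std::int32_t* out) {
    for (std::size_t i = 0; i < kFrame; ++i) {
        g_acc += in[i];                    // running sum carried across calls
        out[i] = g_acc;
    }
}

// ---- After: the same behaviour expressed as a kernel class ----
class Accum {
public:
    Accum() : acc_(0) {}                   // constructor replaces accum_init()

    // run() replaces the free function accum()
    void run(const std::int16_t* in, std::int32_t* out) {
        for (std::size_t i = 0; i < kFrame; ++i) {
            acc_ += in[i];
            out[i] = acc_;
        }
    }

private:
    std::int32_t acc_;                     // data member replaces the global
};
```

In the class form each object owns its state, so several instances of the same kernel do not interfere with one another when they run in a single host process.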

Create a header file for the class. The header must declare a static void registerKernelClass() method. Inside registerKernelClass(), call the REGISTER_FUNCTION macro, which registers the class run() method as the function executed on the AI Engine core.

figure4
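A minimal sketch of such a header follows. The class `GainStage`, its members, and the pointer-based `run()` signature are hypothetical. In a Vitis build the REGISTER_FUNCTION macro comes from the ADF headers; the stand-in definition here is an assumption that only lets the sketch compile with an ordinary host compiler.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in only: a real AI Engine project gets REGISTER_FUNCTION from <adf.h>.
#ifndef REGISTER_FUNCTION
#define REGISTER_FUNCTION(fn) static_cast<void>(&fn)
#endif

// Hypothetical kernel class; name, members, and signature are illustrative.
class GainStage {
public:
    explicit GainStage(std::int16_t gain) : gain_(gain) {}

    // Kernel body: scales n input samples by the stored gain.
    void run(const std::int16_t* in, std::int32_t* out, int n) {
        for (int i = 0; i < n; ++i) {
            out[i] = static_cast<std::int32_t>(in[i]) * gain_;
        }
    }

    // Names the member function that acts as the kernel entry point.
    static void registerKernelClass() {
        REGISTER_FUNCTION(GainStage::run);
    }

private:
    std::int16_t gain_;  // state set once by the constructor
};
```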

When creating the kernel in the upper-level graph or subgraph, use kernel::create_object instead of kernel::create. Remove initialization_function, because initialization is now handled by the class constructor.

figure5

Migrating from Windows to Buffers

Window I/O connections between kernels were deprecated in the 2023.2 release of the AMD Vitis™ software platform. The AI Engine Kernel and Graph Programming Guide (UG1079) describes how the source code of a design should change to upgrade it to buffer I/Os. The following figures show the steps required (repeated for every kernel) to upgrade I/O connections from windows to buffers.

  1. Make the changes shown in the following figure in the kernel.cc file:

    figure6

  2. If the design uses classes, upgrade the associated header file.

    figure7

  3. In the graph file, modify the connection type and specify the buffer dimension. Window sizes are given in bytes and buffer dimensions in samples, so note the division by 4 to convert from bytes to samples.

    figure8
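The byte-to-sample conversion in step 3 can be written out as a small host-side calculation. This is a sketch under assumptions: `Cint16` stands in for a 4-byte complex sample type, and the 1024-byte window size is a made-up value rather than one taken from the design.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Host-side stand-in for a 4-byte complex sample (two 16-bit parts).
struct Cint16 {
    std::int16_t real;
    std::int16_t imag;
};
static_assert(sizeof(Cint16) == 4, "sample is expected to occupy 4 bytes");

// Converts a window size in bytes into a buffer dimension in samples.
constexpr std::size_t bytes_to_samples(std::size_t window_bytes) {
    return window_bytes / sizeof(Cint16);
}

constexpr std::size_t kWindowBytes = 1024;  // hypothetical window size
constexpr std::size_t kBufferSamples = bytes_to_samples(kWindowBytes);
static_assert(kBufferSamples == 256, "1024 bytes hold 256 four-byte samples");
```

Deriving the divisor from `sizeof` rather than writing a literal 4 keeps the conversion correct if the sample type changes.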

Replacing Intrinsics with APIs

The following example compares intrinsic-based code with API-based code side by side. The two versions are functionally equivalent and result in the same hardware usage and throughput.

figure9
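The tutorial does not describe how equivalence was verified. One plausible check, sketched below in host C++, is to capture the output samples of the intrinsic-based and API-based kernels from simulation and compare them sample by sample. The `Sample` type and the `first_mismatch` helper are hypothetical names, and reading the simulator output files is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side representation of one complex output sample.
struct Sample {
    std::int16_t real;
    std::int16_t imag;
};

// Returns the index of the first differing sample, or -1 when both
// sequences have the same length and identical contents. If one sequence
// is a strict prefix of the other, the shorter length is returned.
long first_mismatch(const std::vector<Sample>& a, const std::vector<Sample>& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i].real != b[i].real || a[i].imag != b[i].imag) {
            return static_cast<long>(i);
        }
    }
    return a.size() == b.size() ? -1 : static_cast<long>(n);
}
```

A return value of -1 for every output port would indicate bit-exact agreement between the two versions for the stimulus used.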

Relocating Global Variables to Kernel Class Data Members