Design Tutorials - 2024.1 English

Vitis Tutorials: AI Engine

Document ID: XD100
Release Date: 2024-06-19
Version: 2024.1 English

The AI Engine Development Design Tutorials demonstrate the two major phases of AI Engine application development: architecting the application and developing the kernels.

The AI Engine Development landing page contains important information, including the tool version, environment settings, and a table describing the platform, operating system, and supported features or flows of each tutorial. Review these details before starting the AI Engine tutorials.

Tutorial

Description

Versal Custom Thin Platform Extensible System

This is a Versal system example design based on a thin custom VCK190 platform (minimal clocks and AXI exposed to the PL) that includes HLS/RTL kernels and an AI Engine kernel, built with a full Makefile build flow.

LeNet Tutorial

This tutorial uses the LeNet algorithm to implement a system-level design that performs image classification using the AI Engine and PL, including block RAM (BRAM). The design demonstrates functional partitioning between the AI Engine and PL. It also highlights memory partitioning and hierarchy among DDR memory, PL (BRAM), and AI Engine memory.

Super Sampling Rate FIR Filters

This tutorial provides a methodology for making appropriate design choices based on the filter characteristics, and gives examples of how to implement Super Sampling Rate (SSR) FIR Filters on a Versal® ACAP AI Engine processor array.

Beamforming Design

This tutorial demonstrates the creation of a beamforming system running on the AI Engine, PL, and PS, and the validation of the design across these heterogeneous domains.

Polyphase Channelizer

This tutorial demonstrates an implementation of a system-level design (a Polyphase Channelizer) using a combination of AI Engine and PL/HLS kernels.

Prime Factor FFT-1008

This Versal system example implements a 1008-pt FFT using the Prime Factor Algorithm. The design uses both AI Engine and PL kernels working cooperatively. AI Engine elements are hand-coded using AIE API. PL elements are implemented using Vitis HLS. System integration in Vitis is managed using the new v++ Unified Command Line flow.
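
The index mapping behind a Prime Factor Algorithm can be sketched in a few lines of Python. This is a minimal illustration, not the tutorial's AI Engine or HLS code. It assumes the factorization 1008 = 7 × 9 × 16 (pairwise coprime, but not stated above), and the names `pfa_index_maps` and `pfa_dft` are hypothetical. Because the factors are coprime, the transform splits into short DFTs along each axis with no twiddle multiplications between them.

```python
import cmath
from itertools import product
from math import prod


def naive_dft(x):
    """Direct O(n^2) DFT, used for the short transforms and as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def pfa_index_maps(factors):
    """Good-Thomas index maps for pairwise-coprime factors (assumed input)."""
    n = prod(factors)
    in_map, out_map = {}, {}
    for idx in product(*(range(f) for f in factors)):
        # Input map: n = sum_i n_i * (N / N_i) mod N
        in_map[idx] = sum(i * (n // f) for i, f in zip(idx, factors)) % n
        # Output map (Chinese Remainder Theorem reconstruction)
        out_map[idx] = sum(i * (n // f) * pow(n // f, -1, f)
                           for i, f in zip(idx, factors)) % n
    return in_map, out_map


def pfa_dft(x, factors):
    """DFT of length prod(factors) built from short DFTs, with no twiddles."""
    n = prod(factors)
    in_map, out_map = pfa_index_maps(factors)
    grid = {idx: x[src] for idx, src in in_map.items()}
    for axis, f in enumerate(factors):
        new_grid = {}
        for idx in grid:
            if idx[axis] != 0:
                continue  # process each line along this axis exactly once
            head, tail = idx[:axis], idx[axis + 1:]
            line = [grid[head + (t,) + tail] for t in range(f)]
            for k, value in enumerate(naive_dft(line)):
                new_grid[head + (k,) + tail] = value
        grid = new_grid
    out = [0j] * n
    for idx, dst in out_map.items():
        out[dst] = grid[idx]
    return out
```

Calling `pfa_dft(samples, (7, 9, 16))` on a list of 1008 complex samples exercises the assumed factorization; a smaller case such as `(3, 4, 5)` is quicker to compare against `naive_dft`.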

2D-FFT

This tutorial presents two implementations of a system-level design (2D-FFT): one with AI Engine, and the other with HLS using the DSP Engines.

FIR Filter

This tutorial demonstrates implementations of a system-level design (FIR Filter) using AI Engines and using HLS with DSP Engines in the Versal device, plus PL resources including LUTs, flip-flops (FFs), and block RAMs.

N-Body Simulator

This tutorial is a system-level design that uses the AI Engine, PL, and PS resources to showcase the following features:

  • A Python model of an N-Body Simulator running on an x86 machine

  • A scalable AI Engine design that can utilize up to 400 AI Engine tiles

  • AI Engine packet switching

  • AI Engine single-precision floating point calculations

  • AI Engine 1:400 broadcast streams

  • Codeless PL HLS datamover kernels from the Vitis™ Utility Library

  • PL HLS packet switching kernels

  • PS Host Application that validates the data coming out of the AI Engine design

  • C++ model of an N-Body Simulator

  • Performance comparisons between Python x86, C++ Arm A72, and AI Engine N-Body Simulators

  • Effective throughput calculation (GFLOPS) vs. theoretical peak throughput of the AI Engine
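
For orientation, one time step of a direct-sum N-body model can be sketched as follows. This is a generic illustration, not the tutorial's Python model: the function name `nbody_step`, the softening term, the gravitational constant `g = 1.0`, and the simple Euler-style update are all assumptions made here.

```python
def nbody_step(positions, velocities, masses, dt, softening=1e-3, g=1.0):
    """Advance all bodies by one time step using all-pairs gravity.

    positions and velocities are lists of (x, y, z) tuples. The softening
    term (an assumption of this sketch) avoids division by zero when two
    bodies come very close.
    """
    n = len(positions)
    accelerations = []
    for i in range(n):
        xi, yi, zi = positions[i]
        ax = ay = az = 0.0
        for j in range(n):
            if i == j:
                continue
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            dist_sq = dx * dx + dy * dy + dz * dz + softening * softening
            # a_i += G * m_j * r_ij / |r_ij|^3
            scale = g * masses[j] / (dist_sq * dist_sq ** 0.5)
            ax += dx * scale
            ay += dy * scale
            az += dz * scale
        accelerations.append((ax, ay, az))

    # Update velocities first, then positions with the new velocities.
    new_velocities = [
        (v[0] + a[0] * dt, v[1] + a[1] * dt, v[2] + a[2] * dt)
        for v, a in zip(velocities, accelerations)
    ]
    new_positions = [
        (p[0] + v[0] * dt, p[1] + v[1] * dt, p[2] + v[2] * dt)
        for p, v in zip(positions, new_velocities)
    ]
    return new_positions, new_velocities
```

The inner loop is the O(N²) all-pairs interaction, which is the part of such a model that dominates the floating-point work.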

Digital Down-conversion Chain - Converting from Intrinsics to API

This tutorial demonstrates the steps to upgrade a 32-branch digital down-conversion chain so that it is compliant with the latest tools and coding practice.

Versal GeMM Implementation

This tutorial presents two implementations of a system-level design: one with AI Engine, and the other with RTL using the DSP Engines. For each implementation, the tutorial takes you through the hardware emulation and hardware flows in the context of a complete Versal ACAP system design.

Bilinear Interpolation

This tutorial demonstrates how the bilinear interpolation algorithm may be efficiently implemented using AI Engines. It also provides guidance for customizing the design to function with varying image resolutions, and to take advantage of multicore processing on the AI Engine array to achieve desired throughput.
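
As a scalar reference for the algorithm itself, bilinear interpolation of one sample can be sketched as below. This is plain Python for illustration, not vectorized AI Engine code; the function name `bilinear_sample`, the list-of-rows image layout, and the clamp-to-edge behavior are assumptions of this sketch.

```python
def bilinear_sample(image, x, y):
    """Sample a 2D image (list of rows) at fractional coordinates (x, y).

    Coordinates outside the image are clamped to the border, which is an
    assumption of this sketch rather than a requirement of the algorithm.
    """
    height, width = len(image), len(image[0])
    x = min(max(x, 0.0), width - 1)
    y = min(max(y, 0.0), height - 1)

    # Integer corners surrounding the sample point.
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, width - 1)
    y1 = min(y0 + 1, height - 1)

    # Fractional offsets inside the pixel cell.
    fx, fy = x - x0, y - y0

    # Interpolate along x on the two rows, then along y between them.
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy
```

Each output pixel depends only on four neighboring input pixels and two fractional weights, which is what makes the computation straightforward to spread across multiple cores.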

64K IFFT Using 2D Architecture

This Versal system example implements a 64K-pt IFFT using a 2D architecture. We decompose 64K = 256 x 256 and build the transform in two dimensions using row and column FFT-256. A matrix transpose is performed in between in the PL. This alternative “divide and conquer” approach is attractive in the SSR > 1 regime.
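
The row/transpose/column structure can be sketched with a small Python model. This is a minimal sketch of the standard Cooley-Tukey decomposition for N = rows × cols, not the tutorial's implementation: the function names are hypothetical, and the placement of the twiddle multiplication (after the first set of transforms) is an assumption about one common arrangement. With `rows = cols = 256` it would model the 64K case, although the direct transforms used here are far too slow for that size.

```python
import cmath


def idft_unscaled(v):
    """Direct inverse DFT without the 1/n scaling."""
    n = len(v)
    return [sum(v[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n))
            for t in range(n)]


def ifft_2d_decomposed(spectrum, rows, cols):
    """Inverse DFT of length rows*cols built from short row/column transforms."""
    n = rows * cols
    if len(spectrum) != n:
        raise ValueError("spectrum length must equal rows * cols")

    # Step 1: view the input as a rows x cols matrix M[k1][k2] = X[k1 + rows*k2]
    # and transform each row (length cols).
    stage1 = [idft_unscaled([spectrum[k1 + rows * k2] for k2 in range(cols)])
              for k1 in range(rows)]

    # Step 2: twiddle multiplication between the two dimensions.
    for k1 in range(rows):
        for n2 in range(cols):
            stage1[k1][n2] *= cmath.exp(2j * cmath.pi * k1 * n2 / n)

    # Step 3: matrix transpose.
    transposed = [[stage1[k1][n2] for k1 in range(rows)] for n2 in range(cols)]

    # Step 4: transform each row of the transposed matrix (length rows).
    stage2 = [idft_unscaled(row) for row in transposed]

    # Step 5: read out x[cols*n1 + n2] and apply the 1/N scaling.
    result = [0j] * n
    for n2 in range(cols):
        for n1 in range(rows):
            result[cols * n1 + n2] = stage2[n2][n1] / n
    return result
```

A quick check is to compare `ifft_2d_decomposed(X, 4, 4)` with `idft_unscaled(X)` divided by 16 for a 16-element list `X`.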

Implementing FFT and DFT Designs on AI Engines

This tutorial illustrates several techniques for mapping FFT and DFT algorithms to the AI Engine array, including the Stockham FFT used in AMD Vitis DSPlib, hand-coded variants implemented using the AI Engine API, and a direct form DFT using vector-matrix multiplication. We also illustrate how to trade off AI Engine tile resources against throughput of the Stockham FFT in DSPlib using its TP_CASC_LEN and TP_PARALLEL_POWER template parameters. This is useful when configuring DSPlib FFT library instances to serve as part of a larger 2D FFT architecture.

Bitonic SIMD Sorting on AI Engine for float Datatypes

This tutorial illustrates how to implement a Bitonic SIMD sorter on AI Engine in Versal for float data types. Two examples are given. First, a small example using N=16 demonstrates the concept and identifies strategies for vectorization and management of the vector register space. These ideas are then applied to a second, larger example using N=1024. Profiling results and throughput are compared to std::sort().
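
The compare-exchange pattern of a bitonic sorting network can be sketched in scalar Python. This is a generic textbook formulation for illustration, not the tutorial's vectorized AI Engine code, and the function name `bitonic_sort` is hypothetical. Every pass of the inner `for` loop performs N/2 independent compare-exchange operations, which is the property a SIMD implementation exploits.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network."""
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    data = list(values)

    k = 2  # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2  # distance between compared elements
        while j >= 1:
            # One network stage: all compare-exchanges here are independent.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data
```

The network has log2(N) × (log2(N) + 1) / 2 stages, which gives 10 stages for N=16 and 55 stages for N=1024.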

Fractional Delay Farrow Filter

This Versal system example implements a variable fractional delay algorithm using the Farrow Filter structure and walks the user through common AI Engine design optimization techniques. The design uses both AI Engine and PL kernels working cooperatively. AI Engine elements are hand-coded using AIE API. PL elements are implemented using Vitis HLS. System integration in Vitis is managed using the new v++ Unified Command Line flow.
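
The Farrow structure evaluates a polynomial in the fractional delay whose coefficients are outputs of fixed FIR sub-filters. The sketch below illustrates the idea with cubic Lagrange interpolation over four taps. The choice of cubic Lagrange coefficients, the tap ordering, and the name `farrow_interpolate` are assumptions made for illustration and are not taken from the tutorial's design.

```python
def farrow_interpolate(taps, mu):
    """Interpolate between taps[1] and taps[2] at fractional position mu.

    taps holds four consecutive samples at positions -1, 0, 1, 2, and mu is
    in [0, 1). The coefficients below are the cubic Lagrange interpolator
    (an assumption of this sketch), arranged by power of mu.
    """
    xm1, x0, x1, x2 = taps

    # Fixed sub-filters: one per power of mu.
    c0 = x0
    c1 = -xm1 / 3.0 - x0 / 2.0 + x1 - x2 / 6.0
    c2 = xm1 / 2.0 - x0 + x1 / 2.0
    c3 = -xm1 / 6.0 + x0 / 2.0 - x1 / 2.0 + x2 / 6.0

    # Horner evaluation: only this part depends on the variable delay mu.
    return c0 + mu * (c1 + mu * (c2 + mu * c3))
```

Because the sub-filter coefficients are constant, changing the delay only changes `mu` in the final Horner evaluation, which is why the structure suits a delay that varies from sample to sample.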

1 Million Point float FFT @ 32 Gsps on AI Engine

This tutorial implements a 1M-point FFT for cfloat data types that achieves a throughput exceeding 32 Gsps, using a large portion of the AI Engine array for compute and PL URAM resources to implement a matrix transpose operation.

System Partitioning of a Hough Transform on AI Engine

This tutorial walks through the process of planning the implementation of a well-known image processing algorithm, then mapping and partitioning it onto the resources available in a Versal Adaptive SoC device. We illustrate this using the Hough Transform, a feature extraction technique for computer vision and image processing.