Vitis Tutorials: AI Engine Development (XD100) - 2024.2 English - Learn how to target, develop, and deploy advanced algorithms using a Versal AI Engine array in conjunction with PL IP/kernels and software applications running on the embedded processors. - XD100
Document ID: XD100
Release Date: 2024-12-06
Version: 2024.2 English
Vitis Tutorials: AI Engine Development (XD100)
AI Engine Development on AIE-ML
Feature Tutorials
A to Z Bare-metal Flow
Introduction
Support
Using GMIO with AIE-ML
Introduction
Objectives
Steps
Runtime Parameter Reconfiguration
Introduction
Objectives
Steps
Support
Packet Switching
Objectives
Steps
Support
Versal Integration for Hardware Emulation and Hardware
Introduction
Objectives
Tutorial Overview
Section 1: Compile AI Engine Code for AIE Simulator: Viewing Compilation Results in Vitis Analyzer
Important
Compiling an AI Engine ADF Graph for V++ Flow
Vitis Analyzer Compile Summary
Section 2: Simulate the AI Engine Graph using the aiesimulator and Viewing Trace and Profile Results in Vitis Analyzer
Section 3: Run the Hardware Emulation, and View Run Summary in Vitis Analyzer
1. Compiling HLS Kernels Using v++
2. Use V++ to Link AI Engine, HLS Kernels with the Platform
3. Compile the A72 Host Application
4. Package the Design
5. Run Hardware Emulation
Section 4: Build and Run on Hardware
Summary
Support
AI Engine-ML Performance Analysis Tutorial
Objectives
Target Application Introduction
Steps - Version 1
Steps - Version 2
Steps - Version 3
Steps - Version 4
Conclusion
Support
AIE Compiler Features
Introduction
Objectives
Tutorial Sections
Conditional Objects
Case 1
Case 2
Case 3
Case 4
Multirate
UpConv then DownConv (Buffer)
DownConv then UpConv (Buffer)
Split and Merge (Buffer)
UpConv then DownConv (Stream)
DownConv then UpConv (Stream)
Split and Merge (Stream)
Multicast
Case 1: Stream and Buffer Multicasting
Case 2: Multirate Buffer Multicasting
Design Tutorials
AIE-ML Programming
Introduction
Objectives
Prerequisite knowledge
Matrix Multiplication
Taking advantage of AI Engine-ML architecture
Matrix Multiplication modes for real types
Matrix Multiplication modes for complex types
AI Engine-ML code analysis
Running the tutorial
Performance analysis
Conclusion
Support
Prime Factor FFT-1008 on AIE-ML
Introduction
MATLAB Models
Design Overview
INPUT PERMUTE Kernel
FFT-7 Kernel
TRANSPOSE-0 Kernel
FFT-9 Kernel
TRANSPOSE-1 Kernel
FFT-16 Kernel
OUTPUT PERMUTE Kernel
Design Resources
Build and Run Design
Setup & Initialization
Hardware Emulation
Hardware
References
Support
License
AIE-ML LeNet Tutorial
Introduction
Tutorial Overview
Before You Begin
Tools: Installing the Tools
Environment: Setting Up the Shell Environment
AIE API based FFT for Many Instances Applications
Introduction
Table of Contents
Objectives
Required Background Knowledge
Considered Case Study
Design Strategy
Designing the FFT Application with the AI Engine ML
Designing the Kernel with the AI Engine API
Understanding the APIs
Coding the Kernel
Twiddles Header File
Kernel Header File
Kernel Source Code
Designing the Graph
Data Movement Design
Designing Data Movement with the Memory Tiles
Using the Shared Buffers
Coding the Graph
Graph Header File
Graph Source Code
Implementing and Evaluating the AIE-ML Design with Vitis Unified IDE
Creating the AI Engine ML Project in Vitis
x86 Simulation and Functional Validation
AI Engine Simulation, Array and Trace Analysis
Optimizing the AIE-ML Design
Graph Optimizations
x86 Simulation and Functional Validation
AI Engine Simulation, Array and Trace Analysis
Support
License
Softmax Function on AIE-ML
Introduction
Softmax Function Definition
Computing the Exponential Function
IEEE 754 Format Trick
Improving Accuracy
Adapting for bfloat16 Floating-Point
AI Engine Implementation
AI Engine Kernel Processing
Kernel Data Interface
Kernel Code
Running the Example
Generating Test Vectors
Running x86 Simulation
Running AI Engine Simulation
Analyzing Results
Vitis Analyzer
Test Vector Comparison
References
Support
License
Migrating Farrow Filter from AIE to AIE-ML
Migrating Fractional Delay Farrow Filter from AIE to AIE-ML Architecture
Introduction
Comparison of AIE vs AIE-ML Farrow Filter Design Implementation
Conclusion
Polyphase Channelizer on AIE-ML using Vitis Libraries
Introduction
Channelizer Requirements
System Partitioning
Filterbank System Partitioning
Filterbank Compute Requirements
Filterbank Storage Requirements
Filterbank I/O Bandwidth Requirements
Filterbank Library Characterization
Filterbank Library Optimization
IFFT-2D System Partitioning
IFFT-2D Library Characterization
IFFT-2D Library Optimization
Design Summary
Design Resources
Build and Run Design
Setup & Initialization
Hardware Emulation
Hardware
References
Support
License
MNIST ConvNet on AIE-ML
Introduction
Virtual Python Environment Setup
Jupyter Notebook Model
Import the MNIST Image Database
Training & Testing the MNIST ConvNet Model
Using the MNIST ConvNet for Inference
Extracting Weights & Biases for AIE-ML Inference Solution
AIE-ML Inference Solution
Design Approach
Vitis Functional Simulation
MNIST ConvNet: AI Engine Graph View
MNIST ConvNet: AI Engine Floorplan View
MNIST ConvNet: AI Engine Resource Utilization
Vectorization of 3x3 Conv2D Layer Processing
MNIST ConvNet: Profiling & Vector Load
MNIST ConvNet: Throughput
Individual Layer Designs
Layer Design Details: conv2d_w1()
Layer Design Details: max_pooling2d_w2()
Layer Design Details: conv2d_w3()
Layer Design Details: max_pooling2d_w4()
Layer Design Details: conv2d_w5()
Layer Design Details: dense_w7()
Summary
References
Support
License
AI Engine Development on AIE
Feature Tutorials
AI Engine A-to-Z Flow for Linux
Introduction
Methodology Overview
Objectives
Tutorial
Step 1: Run system design targeting the base platform
Step 2: Create a custom platform
Step 3: Run the system design targeting the custom platform
Support
A to Z Bare-metal Flow
Using GMIO with AIE
Introduction
Objectives
Steps
Runtime Parameter Reconfiguration
Introduction
Overview
Steps
Asynchronous Scalar RTP
Asynchronous Array RTP
Asynchronous RTP Read
Synchronous RTP
Summary
Support
Packet Switching
Objectives
Steps
Support
Versal Integration for Hardware Emulation and Hardware
Introduction
Objectives
Tutorial Overview
Section 1: Compile AI Engine Code for AIE Simulator: Viewing Compilation Results in Vitis Analyzer
Compiling an AI Engine ADF Graph for V++ Flow
Vitis Analyzer Compile Summary
Section 2: Simulate the AI Engine Graph using the aiesimulator and Viewing Trace and Profile Results in Vitis Analyzer
Section 3: Run the Hardware Emulation, and View Run Summary in Vitis Analyzer
1. Compiling HLS Kernels Using v++
2. Use V++ to Link AI Engine, HLS Kernels with the Platform
3. Compile the A72 Host Application
4. Package the Design
5. Run Hardware Emulation
Section 4: Build and Run on Hardware
Summary
Support
Versal System Design Clocking
Introduction
Objectives
Step 1 - Building ADF Graph
Step 2 - Clocking the PL Kernels
Step 3 - v++ linker - Building the System
Step 4 - Compiling Host Code
Step 5 - Packaging Design and Running on Board
Challenge (Optional)
Build the design for Hardware Emulation
Summary
Using Floating-Point in the AI Engine
Introduction
AI Engine Architecture Details
Fixed-Point Pipeline
Floating-point Pipeline
Floating-point intrinsics
Start, offset
fpneg, fpabs, fpadd, fpsub
fpneg
fpabs
fpneg_abs
fpadd, fpsub
fpadd_abs, fpsub_abs
fpmul
fpabs_mul
fpneg_mul
fpneg_abs_mul
fpmac, fpmsc, fpmac_abs, fpmsc_abs
fpmul_conf, fpmac_conf
Floating-Point Examples
FIR Filter
Real Floating-Point Filter
Complex Floating-Point Filter
Matrix Multiply
Support
DSP Library Tutorial
Introduction
Part 1: Creating a Single Kernel Graph
Understanding the Source Files
Compile the application
Running the Design through Simulation
Using Vitis Analyzer to look at the Simulation Results
Part 2: Creating a Multi Kernel Graph
Changes to the Filter Graph from Part 1
Build AI Engine Emulation
Running the Design through Simulation
Using Vitis Analyzer to look at the Compilation and Simulation Results
Part 3: Optimizing Filter Performance
Changes to the Filter Graph from Part 1
Build AI Engine Emulation
Running the Design through Simulation
Using Vitis Analyzer to look at the Compilation and Simulation Results
Conclusion
Debug Walkthrough
Introduction
Example Design: Peak Detector
Vitis IDE Project
Methods
Debug Methodologies
Best Practices
Support
AI Engine DSP Library and Model Composer Tutorial
Introduction
Before You Begin
Overview
Stage 1: Create and Simulate the Design
Stage 2: Further Analysis of the Design
Stage 3: Generate the Code and Perform Emulation-AI Engine
Stage 4: Increasing the PLIO Bitwidth and Re-generate
Conclusion
Versal Emulation Waveform Analysis
Introduction
Objectives
Tutorial Overview
Design Overview
Transaction Level Modeling
Steps
Step 1: Build Design
Step 2: Launching Emulation with XSIM Waveform GUI
Step 3: Using XSIM Waveform GUI and QEMU
Exploring the Waveforms
Checking Proper Boot-up Using PMC
Transactions Generated by PS (QEMU) to PL/AIE
PL to AI Engine
AI Engine RTP Signals
AI Engine to PL to DDR Memory
Limitations
Step 4: Using Vitis Analyzer
Summary
AI Engine Performance and Deadlock Analysis Tutorial
Introduction
Before You Begin
Objectives
Steps
Support
Implementing an IIR Filter on the AI Engine
Support
Post-Link Recompile of an AI Engine Application
Introduction
AI Engine Application Post-Link Recompile
Objectives
License
Support
Python and C++ External Traffic Generators for AI Engine Simulation and Emulation Flows
Introduction
Objectives
Before You Begin
Design Overview
Analyzing the full system design in the external traffic generators
In the Emulation Flow
In the AI Engine simulation flow
Support
Using RTL IP with AI Engines
Introduction
Objectives
Tutorial Overview
Step 1 - Creating custom RTL kernels with the Vivado Design Suite
Step 2 - Creating HLS kernels with Vitis compiler
Step 3 - Interfacing ADF graph to Programmable Logic
Step 4 - Building XCLBIN
Step 5 - Build Host Application
Step 6 - Package
Step 7 - Run Emulation
To View Emulation Waveforms
Summary
AIE Compiler Features
Introduction
Objectives
Tutorial Sections
Support
Two Tone Filter on AIE Using DSP libraries and Vitis Model Composer
Signal Processing on AI Engine Using Vitis DSP Libraries and Vitis Model Composer
Introduction
Before You Begin
Overview
Conclusion
AIE PL Interface
Introduction
Part 1 - Connecting RTL AXI4-Stream Interfaces (included in Block Design) to the AI Engine
Platform
Hardware Platform creation
Vitis V++ Link
Hardware Emulation
Part 2 - Connecting RTL AXI4-Stream interfaces (NOT included in Block Design) to the AI Engine
Hardware Platform
Vitis V++ Link
Hardware Emulation
README
Part 3 - Connecting Monitored RTL Interfaces to AI Engine
Creating the design
Running the Design in Hardware
Part 4 - Broadcasting Data to the AI Engine and the Programmable
Creating the design
Hardware Emulation
Design Tutorials
Versal Custom Thin Platform Extensible System
Getting Started
Build-flow
Build-flow Dependencies
Build & Prerequisites
Custom Thin Base Platform
More In-Depth
Testing
Running on a board
Running Hardware Emulation
Execution & Results
Notes
Design Considerations
References
AI Engine Documentation
Xilinx® Runtime (XRT) Architecture
2024.2 Vitis Unified Software Development Platform Documentation Landing Page
Revision History
LeNet Tutorial
Introduction
Tutorial Overview
Before You Begin
Tools: Installing the Tools
Environment: Setting Up the Shell Environment
Super Sampling Rate FIR Filters
Introduction
Before You Begin
Accessing the Tutorial Reference Files
SSR FIR Tutorial
Summary of AI Engine Architecture
Memory interface
Streaming interface
Cascade Streams
What is a FIR Filter?
“Utils” Directory
GenerateStreams
ProcessAIEOutput
StreamThroughput
GetDeclare.sh
Support
Beamforming Design
Introduction
Tutorial Overview
Assumptions
Before You Begin
Documentation: Explore AI Engine Architecture
Tools: Installing the Tools
Environment: Setting Up Your Shell Environment
Validation: Confirming Tool Installation
Other Tutorials: Learn Basic V++ and AI Engine Concepts
System Design Overview
Block Diagram
Modules
Module 01 - Custom Platform
Module 02 - AI Engine Design
Module 03 - PL Design
Module 04 - AI Engine and PL Integration
Module 05 - Bare-Metal PS Host Application
Module 06 - System Integration - Bare Metal
Module 07 - PetaLinux
Module 08 - Linux SW Application
Module 09 - System Integration - Linux
Beamforming Introduction
Downlink Beamforming
Downlink Beamforming Formulation
Example Equations for a Single Subcarrier
Generalized Downlink Beamforming Equations
Uplink Beamforming
Generalized Uplink Beamforming Equations
Support
Polyphase Channelizer
Introduction
Channelizer Requirements
MATLAB Model
System Partitioning
Clock Rate and SSR Planning
Circular Buffer
Polyphase Filterbank
Cyclic Shift Buffer
IDFT
Design Overview
Polyphase Filterbank Design
Discrete Fourier Transform Design
Build and Run Design
Setup & Initialization
Hardware Emulation
Hardware
Estimating Power Using the Power Design Manager
Step 1: Building the Design for VCK190 and Executing Power Targets
Step 2: Creating a New Project
Step 3: Refining the AI Engine Power Estimate Using Simulated Design and Switching Activities
References
Support
Prime Factor FFT
Introduction
MATLAB Models
I/O Permutations (2D Case)
I/O Permutations (3D Case)
Design Overview
INPUT PERMUTE Kernel
FFT-7 Kernel
TRANSPOSE1 Kernel
FFT-9 Kernel
TRANSPOSE2 Kernel
FFT-16 Kernel
OUTPUT PERMUTE Kernel
Design Resources
Build and Run Design
Setup & Initialization
Hardware Emulation
Hardware
References
Support
2D-FFT
Introduction
Design Overview
Directory Structure
Before You Begin
Installing the Tools
Platform
Setting up the Environment
Confirming Tool Installation
Design Implementations
AI Engine and HLS Implementation Comparison
References
AI Engine Documentation
Vitis DSP Libraries
Xilinx Runtime (XRT) Architecture
Vitis Unified Software Development Platform Documentation (https://docs.amd.com/v/u/en-US/ug1416-vitis-documentation)
Known Issues
Support
FIR Filter
Introduction
Overview
Directory Structure
Before You Begin
Tools: Installing the Tools
Environment: Setting Up the Shell Environment
Validation: Confirming Tool Installation
Design Implementations
Choosing between AI Engine and HLS Implementations
Resource Utilization
Power Utilization
Computational Efficiency
AI Engine Specific Design Considerations
Window Size
N-Body Simulator
Introduction
Before You Begin
Documentation: Explore AI Engine Architecture
Tools: Installing the Tools
Environment: Setting Up Your Shell Environment
Validation: Confirming Tool Installation
Goals of this Tutorial
HPC Applications
A similar accelerator example was implemented on the AMD UltraScale+™-based Ultra96 device using only PL resources here.
PL Data-Mover Kernels
The N-Body Problem
12,800 Particles simulated on a 400 tile AI Engine accelerator for 300 timesteps
Newton’s Second Law of Motion
Gravity Equations - Two Bodies
Gravity Equations - N Bodies
System Design Overview
Dataflow
Where We’re Headed …
Module 01 - Python Simulations on x86
Module 02 - AI Engine Design
Module 03 - PL Kernels
Module 04 - Full System Design
Module 05 - Host Software
Module 06 - SD Card and Hardware Run
Module 07 - Results
(Optional) x1_design and x10_design
Build Flows
For more advanced users
For more novice users
A Word about Makefiles
Building for VCK190 ES1 Board
References
Next Steps
Support
Digital Down-conversion Chain: Converting from Intrinsics to API
Table of Contents
Introduction
Upgrading Tools, Device Speed Grade, and Makefile
Upgrading the Code
Converting Kernel Functions to Kernel Classes
Migrating from Windows to Buffers
Replacing Intrinsics with APIs
Relocating Global Variables to Kernel Class Data Members
Handling State Variables to Enable x86sim
Updating Older Pragmas
Supporting x86 Compilation and Simulation
Building and Running the Design
Setup and Initialization
x86 Functional Simulation
Hardware Simulation
Summary
Support
License
Versal GeMM Implementation
Versal GeMM Implementation Using Vitis Acceleration Library and DSP58 Tutorial
Introduction
Design Overview
AIE
DSP
Directory Structure
Before You Begin
Installing the Tools
Platform
Setting up the Environment
Confirming Tool Installation
Design Implementations
AI Engine and DSP Implementation Comparison
References
Vitis Unified Software Development Platform Documentation
Vitis DSP Libraries
Xilinx Runtime (XRT) Architecture
Vitis Unified Software Development Platform 2024.2 Documentation
Known Issues
Support
Bilinear Interpolation
Introduction
Computing Interpolated Values
Design Assumptions
AI Engine Code Vectorization
Data Interface
Programmable Logic Component
PLIO Interface
AI Engine Test Vectors
AI Engine Kernel Processing
Kernel Data Interface
Kernel Code
Running the Example
Generating Test Vectors
Running x86 Simulation
Running AI Engine Simulation
Analyzing Results
Vitis Analyzer
Test Vector Comparison
Customizing the Example
Specifying a Test Image and Output Resolution
Multicore Processing
References
Support
64K IFFT Using 2D Architecture
Introduction
MATLAB Model
Design Overview
Design Approach
IFFT-256 Prototyping
Front-End IFFT-256 AI Engine Kernel
Memory Transpose PL Kernel
Back-End IFFT-256 AI Engine Kernel
Design Resources
Build and Run Design
Setup & Initialization
Hardware Emulation
Hardware
References
Support
Implementing FFT and DFT Designs on AI Engines
Abstract
References
Support
Bitonic SIMD Sorting on AI Engine for float Datatypes
Introduction
Small Bitonic Sorting Example
Stage 0
Stage 1
Stage 2
Stage 3
Profiling of \(N=16\) Bitonic Sort vs. std::sort()
Large Bitonic Sorting Example
Profiling of \(N=1024\) Bitonic Sort vs. std::sort()
References
Support
Fractional Delay Farrow Filter
Introduction
Requirements and System Partitioning
Compute Analysis
Bandwidth Analysis
Storage Analysis
AI Engine Implementation and Optimization
Initial Farrow Design
First Farrow Optimization
Second Farrow Optimization
Final Farrow Optimization
Build and Run Design
Setup and Initialization
Hardware Emulation
Hardware
Summary and Conclusion
References
Support
1 Million Point float FFT @ 32 Gsps on AI Engine
Introduction
MATLAB Models
Design Overview
AI Engine Graph View
AI Engine Array View
VC1902 Floorplan View
AI Engine Design Validation
VC1902 Timing Closure
Design Resources
Build and Run Design
Setup & Initialization
Hardware
References
Support
System Partitioning of a Hough Transform on AI Engine
Introduction
What is the Hough Transform?
What is System Partitioning?
System Partitioning Methodology
Hough Transform MATLAB Model
System Partitioning
Goals
Parallelizing Over “Image Tiles”
Parallelizing Over “Theta”
Analyzing Storage Requirements
Analyzing Compute Requirements
Analyzing I/O Bandwidth Requirements
SIMD / Vectorization
Solution Synthesis
Partitioning Validation
Iterating to System Feasibility
Conclusions
References
Support
License
MUSIC Algorithm
MUltiple SIgnal Classification (MUSIC) Algorithm on AI Engine
Introduction
System Model
Subspace Algorithm
MUSIC Spectrum Estimation
MATLAB Model
AI Engine Subgraph Designs
IO Adapter Subgraph
QRD Subgraph
SVD Subgraph
DOA Subgraph
Scanner Subgraph
Finder Subgraph
Top-Level Design
Building the Design
Setup and Initialization
Hardware Emulation
Hardware
Hardware-in-the-Loop Demo
Architecture
System Operation
Performance Estimation
Software Version
MATLAB Folder Structure
Steps to Generate and Run HIL Demo Data
Archiving Demo Data
Playback Videos
Client and Server on MATLAB
Conclusions
References
Appendix
Deploying the SD Card Image
Booting the VCK190 Board
Simple Ethernet Configuration
Using a VPN
Running the PS Application
Testing with MATLAB
Support
License
Softmax Function
Introduction
Softmax Function Definition
Computing the Exponential Function
IEEE 754 Format Trick
Improving Accuracy
Adapting for Single-Precision Floating-Point
AI Engine Implementation
AI Engine Kernel Processing
Kernel Data Interface
Kernel Code
Running the Example
Generating Test Vectors
Running x86 Simulation
Running AI Engine Simulation
Analyzing Results
Vitis Analyzer
Test Vector Comparison
References
Support
License