Vitis Tutorials: AI Engine Development (XD100) - 2025.1 English
Learn how to target, develop, and deploy advanced algorithms using a Versal AI Engine array in conjunction with PL IP/kernels and software applications running on the embedded processors.
Document ID: XD100
Release Date: 2025-06-20
Version: 2025.1 English
AI_Engine_Development
AIE
AIE README
Introduction
Getting Started
AI Engine Documentation
AI Engine Training
Environment Settings
Getting Started with AI Engine Development Using the AI Engine Tutorials
AI Engine Application Development
AI Engine Application Debug and Optimization
System Integration
Available Tutorials
Feature Tutorials
Design Tutorials
Design Tutorials
Design Tutorials README
AIE Lenet Tutorial
Introduction
Tutorial Overview
Before You Begin
Tools: Installing the Tools
Environment: Setting Up the Shell Environment
Super Sampling Rate FIR
Dual SSR 16 HW
Goal of this hardware implementation
Compile the graph
Build hardware and generate sd_card.img
Logging in for the first time
Support
Dual Stream SSR
Dual-Stream Input Impact
Designing the Graph
C++ Code Analysis
Compilation and Analysis
Support
Multi Kernel
Designing the Kernel
C++ Code Analysis
Data and Coefficients Management and Operation Scheduling
Compilation and Analysis
Support
Single Kernel
Filter Description
Designing the Kernel
Interfaces
Data and Coefficients Management
Coefficients and Data Update Scheduling
Compilation and Analysis
Vitis Analyzer
Script Utils
Support
Single Stream SSR
Super Sampling Rate FIR Filter
Super Sampling Rate and Polyphase
Organize Computation for a 2.5 Gsps Data Stream in 2 Phases
Designing the Graph
C++ Code Analysis
Compilation and Analysis
Support
Beamforming
Module 01 - Custom Platform
License
SPDX Identifiers
Other Projects
Module 02 - AI Engine Design
Options Table
Dependencies
Build Products
Introduction
AI Engine Kernels, Graphs, and Applications
AI Engine Kernels and Graphs
Cascading Chain Subgraph
Downlink Subgraph
Uplink Subgraph
Test Beamforming Graph
AI Engine Application
Sending Data to the Beamforming Kernels
AI Engine Kernels Parameters
AI Engine Subgraph Window Connections
AI Engine Application Data Files
Simulating the AI Engine Graph Application
Run-Time Event API for Performance Profiling
Conclusion
References
Support
Module 03 - PL Design
DLBF Slave Sim Readme
Block Diagram of the Test Design
Resources
Module 04 - AI Engine and PL Integration
Building the Design
Build XCLBIN from Scratch
Options
Dependencies
Build Products
Introduction: Linking the System
Timing Summary
REV0: vck190_v1_0_wrapper_timing_summary_routed.rpt
REV1: vck190_v1_0_wrapper_timing_summary_routed.rpt
REV0 Configuration File (config.ini)
[connectivity] Section
Number of Kernels
Streaming Connections
[clock] Section
[advanced] Section
New XSA Platform: rev0
Timing Closure
Timing Closure Strategy
REV1: Configuration File (config_2regslice.ini)
[connectivity] Section
[clock] Section
[vivado] Section
New XSA Platform: rev1
References
Support
Module 05 - Baremetal Host Application
Introduction: Building a Bare-Metal System
Building the Design
Difference between main_partial.cpp and main_full.cpp
Generating the Platform
Compiling the PS Application Source Code
Linking the PS Application Source Code
Bare-Metal Source Code
PS Host Application
Main Function
test_dlbf/test_ulbf Functions
Reset
Configuration
Check RAM
Start
Wait for Done: Inputs
Wait for Done: Outputs
Verify Output
Test ULBF
References
Support
Module 06 - Running the Baremetal System
Building the Design: Hardware Emulation
Dependencies
Build Products
Running the System: Hardware Emulation
Building the Design: Hardware
Dependencies
Build Products
Running the System: Hardware
References
Support
Module 07 - Petalinux
Src Boot README
Module 08 - Linux SW Application
Introduction: Programming the PS Host Application
Execution Flow Chart
Bind UIO Drivers with PL Kernels
Changes from 2025.1
Load AIE XCLBIN
Reset AI Engine
Load AI Engine with XCLBIN
Reset AI Engine in the Middle of Execution
Command-Line Arguments
Support
Module 09 - Running the Linux System
Running the System
Support
Polyphase Channelizer
Introduction
Channelizer Requirements
MATLAB Model
System Partitioning
Clock Rate and SSR Planning
Circular Buffer
Polyphase Filterbank
Cyclic Shift Buffer
IDFT
Design Overview
Polyphase Filterbank Design
Discrete Fourier Transform Design
Build and Run Design
Setup & Initialization
Hardware Emulation
Hardware
Estimating Power Using the Power Design Manager
Step 1: Building the Design for VCK190 and Executing Power Targets
Step 2: Creating a New Project
Step 3: Refining the AI Engine Power Estimate Using Simulated Design and Switching Activities
References
Support
Prime Factor FFT
Introduction
MATLAB Models
I/O Permutations (2D Case)
I/O Permutations (3D Case)
Design Overview
INPUT PERMUTE Kernel
FFT-7 Kernel
TRANSPOSE1 Kernel
FFT-9 Kernel
TRANSPOSE2 Kernel
FFT-16 Kernel
OUTPUT PERMUTE Kernel
Design Resources
Build and Run Design
Setup & Initialization
Hardware Emulation
Hardware
References
Support
2D FFT AIE vs HLS
Introduction
Design Overview
Directory Structure
Before You Begin
Installing the Tools
Platform
Setting up the Environment
Confirming Tool Installation
Design Implementations
AI Engine and HLS Implementation Comparison
References
AI Engine Documentation
Vitis DSP Libraries
Xilinx Runtime (XRT) Architecture
Vitis Unified Software Development Platform Documentation (https://docs.amd.com/v/u/en-US/ug1416-vitis-documentation)
Known Issues
Support
FIR Filter AIE vs HLS
Introduction
Overview
Directory Structure
Before You Begin
Tools: Installing the Tools
Environment: Setting Up the Shell Environment
Validation: Confirming Tool Installation
Design Implementations
Choosing between AI Engine and HLS Implementations
Resource Utilization
Power Utilization
Computational Efficiency
AI Engine Specific Design Considerations
Window Size
N Body Simulator
Introduction
Before You Begin
Documentation: Explore AI Engine Architecture
Tools: Installing the Tools
Environment: Setting Up Your Shell Environment
Validation: Confirming Tool Installation
Goals of this Tutorial
HPC Applications
PL Data-Mover Kernels
The N-Body Problem
12,800 Particles simulated on a 400 tile AI Engine accelerator for 300 timesteps
Newton’s Second Law of Motion
Gravity Equations - Two Bodies
Gravity Equations - N Bodies
System Design Overview
Dataflow
Where We’re Headed …
Module 01 - Python Simulations on x86
Module 02 - AI Engine Design
Module 03 - PL Kernels
Module 04 - Full System Design
Module 05 - Host Software
Module 06 - SD Card and Hardware Run
Module 07 - Results
(Optional) x1_design and x10_design
Build Flows
For more advanced users
For less experienced users
A Word about Makefiles
Building for VCK190 ES1 Board
References
Next Steps
Support
DDC Chain
Table of Contents
Introduction
Upgrading Tools, Device Speed Grade, and Makefile
Upgrading the Code
Converting Kernel Functions to Kernel Classes
Migrating from Windows to Buffers
Replacing Intrinsics with APIs
Relocating Global Variables to Kernel Class Data Members
Handling State Variables to Enable x86sim
Updating Older Pragmas
Supporting x86 Compilation and Simulation
Building and Running the Design
Setup and Initialization
x86 Functional Simulation
Hardware Simulation
Summary
Support
License
GeMM AIE vs DSP
Versal GeMM Implementation Using Vitis Acceleration Library and DSP58 Tutorial
Introduction
Design Overview
AIE
DSP
Directory Structure
Before You Begin
Installing the Tools
Platform
Setting up the Environment
Confirming Tool Installation
Design Implementations
AI Engine and DSP Implementation Comparison
References
Vitis Unified Software Development Platform Documentation
Vitis DSP Libraries
Xilinx Runtime (XRT) Architecture
Vitis Unified Software Development Platform 2025.1 Documentation
Known Issues
Support
Bilinear Interpolation
Introduction
Computing Interpolated Values
Design Assumptions
AI Engine Code Vectorization
Data Interface
Programmable Logic Component
PLIO Interface
AI Engine Test Vectors
AI Engine Kernel Processing
Kernel Data Interface
Kernel Code
Running the Example
Generating Test Vectors
Running x86 Simulation
Running AI Engine Simulation
Analyzing Results
Vitis Analyzer
Test Vector Comparison
Customizing the Example
Specifying a Test Image and Output Resolution
Multicore Processing
References
Support
2D IFFT 64K
Introduction
MATLAB Model
Design Overview
Design Approach
IFFT-256 Prototyping
Front-End IFFT-256 AI Engine Kernel
Memory Transpose PL Kernel
Back-End IFFT-256 AI Engine Kernel
Design Resources
Build and Run Design
Setup & Initialization
Hardware Emulation
Hardware
References
Support
FFT DFT on AIE
Abstract
References
Support
Bitonic Sorting
Introduction
Small Bitonic Sorting Example
Stage 0
Stage 1
Stage 2
Stage 3
Profiling of N = 16 Bitonic Sort vs. std::sort()
Large Bitonic Sorting Example
Profiling of N = 1024 Bitonic Sort vs. std::sort()
References
Support
Farrow Filter
Introduction
Requirements and System Partitioning
Compute Analysis
Bandwidth Analysis
Storage Analysis
AI Engine Implementation and Optimization
Initial Farrow Design
First Farrow Optimization
Second Farrow Optimization
Final Farrow Optimization
Build and Run Design
Setup and Initialization
Hardware Emulation
Hardware
Summary and Conclusion
References
Support
1M Point FFT 32Gsps
Introduction
MATLAB Models
Design Overview
AI Engine Graph View
AI Engine Array View
VC1902 Floorplan View
AI Engine Design Validation
VC1902 Timing Closure
Design Resources
Build and Run Design
Setup & Initialization
Hardware
References
Support
Hough Transform
Introduction
What is the Hough Transform?
What is System Partitioning?
System Partitioning Methodology
Hough Transform MATLAB Model
System Partitioning
Goals
Parallelizing Over “Image Tiles”
Parallelizing Over “Theta”
Analyzing Storage Requirements
Analyzing Compute Requirements
Analyzing I/O Bandwidth Requirements
SIMD / Vectorization
Solution Synthesis
Partitioning Validation
Iterating to System Feasibility
Conclusions
References
Support
MUSIC Algorithm
Introduction
System Model
Subspace Algorithm
MUSIC Spectrum Estimation
MATLAB Model
AI Engine Subgraph Designs
IO Adapter Subgraph
QRD Subgraph
SVD Subgraph
DOA Subgraph
Scanner Subgraph
Finder Subgraph
Top-Level Design
Building the Design
Setup and Initialization
Hardware Emulation
Hardware
Hardware-in-the-Loop Demo
Architecture
System Operation
Performance Estimation
Software Version
MATLAB Folder Structure
Steps to Generate and Run HIL Demo Data
Archiving Demo Data
Playback Videos
Client and Server on MATLAB
Conclusions
References
Appendix
Deploying the SD Card Image
Booting the VCK190 Board
Simple Ethernet Configuration
Using a VPN
Running the PS Application
Testing with MATLAB
Support
Softmax Function
Introduction
Softmax Function Definition
Computing the Exponential Function
IEEE 754 Format Trick
Improving Accuracy
Adapting for Single-Precision Floating-Point
AI Engine Implementation
AI Engine Kernel Processing
Kernel Data Interface
Kernel Code
Running the Example
Generating Test Vectors
Running x86 Simulation
Running AI Engine Simulation
Analyzing Results
Vitis Analyzer
Test Vector Comparison
References
Support
TDM Mixer
Introduction
Corner-Turning using Tile DMA
Vectorization of the Mixer
Corner-Turning Concept
Local Tile DMA Tiling Parameters
TDM Mixer Graph Design
Input Buffer Tiling Parameters
Output Buffer Tiling Parameters
Baseline Mixer Design
Vitis Functional Simulation
Optimized Mixer Design
Conclusions
References
Support
Back-Projection for Synthetic Aperture Radar on AI Engines
Back-Projection Engine
Back-Projection Engine
Design Approach
DDR Buffers and PL URAM Buffers
BP Engine Graph and Kernel Scheduling
Graph View
Floorplan View
Resource Utilization
Throughput and Latency
Hardware Emulation
Hardware
Block Design: ifft2k_async()
Block Design: range_gen()
Block Design: diff3dsq()
Block Design: sqrt()
Block Design: dR_comp()
Block Design: fmod_floor()
Block Design: expjx()
Block Design: interp1()
Block Design: image_buffer()
Final SAR BP Engine Performance
References
Conclusion
Conclusion
Design Builds
Design Builds
Setup and Initialization
Single Engine Design Build
Multiple Engine Design Build
Introduction
Introduction
Goals
GOTCHA Volumetric SAR Data Set
References
Multiple Engines
Multiple Engines
Overview
Placement Constraints
Device-Level Details
Hardware Throughput
Opportunities for Optimization
System Model
System Model
Introduction and Approach
Structural Similarity Index Measure
MATLAB System Model
Inner Loop Analysis
SAR Back-Projection Compute Workloads
Algorithm Adaptations for AI Engine
System Parameter Adaptations
ifft() Adaptations
Vectorized Functional Approximation
interp1() Adaptations
fmod_floor() Adaptations
Final AI Engine Algorithm Performance
SAR BP Engine Block Diagram
References
System Partitioning
System Partitioning
System Parameters & Performance Targets
AI Engine Prototyping
SAR BP Engine Design Proposal
Projected System Throughput
Projected System Resources
Next Steps
References
Feature Tutorials
Feature Tutorials README
AIE A to Z
Custom Base Platform Creation
Platforms
Step 1: Build the AMD Versal™ Extensible Embedded Platform Example Design in Vivado
Step 2: Build the Platform in the Vitis Software Platform
AIE Application Creation
Step 1: Create a new AI Engine Application Project
Step 2: Build the Project and Run Through Emulation-AIE
PL Application Creation
Step 1: Modify the Graph for Use in Hardware Build
Step 2: Add PL Kernels
Step 3: Configure the Hardware Linking Project
Step 4: Build the System
PS Application Creation Run All
Step 1: Create a New Platform in the Bare-metal Domain
Step 2: Build the Bare-metal AI Engine Control Application
Step 3: Package the Full System
Step 4: Run the System in Hardware Emulation
Step 5: Build the System targeting the Hardware
Step 6A: Run the System in Hardware via SD Boot
Step 6B: Run or Debug the System in Hardware using JTAG
Summary
Using GMIO
Introduction
Objectives
Steps
RTP Reconfiguration
Introduction
Overview
Steps
Asynchronous Scalar RTP
Asynchronous Array RTP
Asynchronous RTP Read
Synchronous RTP
Summary
Support
Packet Switching
Objectives
Steps
Support
AI Engine Versal Integration
Introduction
Objectives
Tutorial Overview
Section 1: Compile AI Engine Code for the AIE Simulator and View Compilation Results in Vitis Analyzer
Compiling an AI Engine ADF Graph for V++ Flow
Vitis Analyzer Compile Summary
Section 2: Simulate the AI Engine Graph Using the aiesimulator and View Trace and Profile Results in Vitis Analyzer
Section 3: Run Hardware Emulation and View the Run Summary in Vitis Analyzer
1. Compiling HLS Kernels Using v++
2. Use v++ to Link the AI Engine and HLS Kernels with the Platform
3. Compile the A72 Host Application
4. Package the Design
5. Run Hardware Emulation
Section 4: Build and Run on Hardware
Summary
Support
Versal System Design Clocking Tutorial
Introduction
Objectives
Step 1 - Building the ADF Graph
Step 2 - Clocking the PL Kernels
Step 3 - Building the System with the v++ Linker
Step 4 - Compiling Host Code
Step 5 - Packaging Design and Running on Board
Challenge (Optional)
Build the design for Hardware Emulation
Summary
AI Engine Floating Point
Introduction
AI Engine Architecture Details
Fixed-Point Pipeline
Floating-Point Pipeline
Floating-Point Intrinsics
Start, offset
fpneg, fpabs, fpadd, fpsub
fpneg
fpabs
fpneg_abs
fpadd, fpsub
fpadd_abs, fpsub_abs
fpmul
fpabs_mul
fpneg_mul
fpneg_abs_mul
fpmac, fpmsc, fpmac_abs, fpmsc_abs
fpmul_conf, fpmac_conf
Floating-Point Examples
FIR Filter
Real Floating-Point Filter
Complex Floating-Point Filter
Matrix Multiply
Support
DSP Library
Introduction
Part 1: Creating a Single Kernel Graph
Understanding the Source Files
Compiling the Application
Running the Design through Simulation
Using Vitis Analyzer to look at the Simulation Results
Part 2: Creating a Multi Kernel Graph
Changes to the Filter Graph from Part 1
Build AI Engine Emulation
Running the Design through Simulation
Using Vitis Analyzer to look at the Compilation and Simulation Results
Part 3: Optimizing Filter Performance
Changes to the Filter Graph from Part 1
Build AI Engine Emulation
Running the Design through Simulation
Using Vitis Analyzer to look at the Compilation and Simulation Results
Conclusion
Debug Walkthrough
Introduction
Example Design: Peak Detector
Vitis IDE Project
Methods
Methods
Debug Methodologies
Best Practices
Support
AIE DSP Lib Model Composer
Introduction
Before You Begin
Overview
Stage 1: Create and Simulate the Design
Stage 2: Further Analysis of the Design
Stage 3: Generate the Code and Perform Emulation-AI Engine
Stage 4: Increasing the PLIO Bitwidth and Re-generate
Conclusion
AI Engine Emulation Waveform Analysis
Introduction
Objectives
Tutorial Overview
Design Overview
Transaction Level Modeling
Steps
Step 1: Build Design
Step 2: Launching Emulation with XSIM Waveform GUI
Step 3: Using XSIM Waveform GUI and QEMU
Exploring the Waveforms
Checking Proper Boot-up Using PMC
Transactions Generated by PS (QEMU) to PL/AIE
PL to AI Engine
AI Engine RTP Signals
AI Engine to PL to DDR Memory
Limitations
Step 4: Using Vitis Analyzer
Summary
AIE Performance Analysis
Introduction
Before You Begin
Objectives
Steps
Support
Implementing IIR Filter
Support
Post Link Recompile
Introduction
AI Engine Application Post-Link Recompile
Objectives
License
Support
RTL IP with AIE Engines
Introduction
Objectives
Tutorial Overview
Step 1 - Creating Custom RTL Kernels with the Vivado Design Suite
Step 2 - Creating HLS Kernels with the Vitis Compiler
Step 3 - Interfacing the ADF Graph to Programmable Logic
Step 4 - Building the XCLBIN
Step 5 - Building the Host Application
Step 6 - Packaging the Design
Step 7 - Running Emulation
To View Emulation Waveforms
Summary
AIE A to Z Custom Linux Platform
Introduction
Methodology Overview
Objectives
Tutorial
Step 1: Run the system design targeting the base platform
Step 2: Create a custom platform
Step 3: Run the system design targeting the custom platform
Support
AIE Compiler Features
Introduction
Objectives
Tutorial Sections
Support
Two Tone Filter
Introduction
Before You Begin
Overview
AIE Independent Graphs
Introduction
Overview
Step 1: Compile and Verify Each Partition with the AIE Simulator
Partition pr0 in folder pr0_gmio
Partition pr1 in folder pr1_rtp and partition pr2 in folder pr2_perf
Step 2: Use the v++ Linker to Integrate the Partitions
Step 3: Compile the Host Code
Step 4: Package for Hardware
Step 5: Run the Applications in Hardware
Summary
Support
AIE PL Interface
Introduction
Part 1 - Connecting RTL AXI4-Stream Interfaces (included in Block Design) to the AI Engine
Platform
Hardware Platform creation
Vitis V++ Link
Hardware Emulation
Part 2 - Connecting RTL AXI4-Stream interfaces (NOT included in Block Design) to the AI Engine
Hardware Platform
Vitis V++ Link
Hardware Emulation
README
Part 3 - Connecting Monitored RTL Interfaces to the AI Engine
Creating the design
Running the Design in Hardware
Part 4 - Broadcasting Data to the AI Engine and the Programmable Logic
Creating the design
Hardware Emulation
AIE Kernel Optimization
Introduction
Understanding Microcode
Instruction Formats and Decoding
Load and Store Instructions
Scalar Instructions
Move Instructions
Vector Instructions
Microcode Review
Using the Vitis Analyzer
Locating Microcode
Measuring Performance
Common Optimization Opportunities
Bilinear Interpolation Example
Register Spilling
Pipelining Delay
Restrict Pointers
API Limitations
Mapping to SIMD
Hands-On Application
References
Support
AIE-ML
AIE-ML README
Introduction
Feature Tutorials
Design Tutorials
Feature Tutorials
A to Z Bare-metal Flow
Introduction
Support
Using GMIO with AIE-ML
Introduction
Objectives
Steps
Runtime Parameter Reconfiguration
Introduction
Objectives
Steps
Support
Packet Switching
Objectives
Steps
Support
Versal Integration for Hardware Emulation and Hardware
Introduction
Objectives
Tutorial Overview
Section 1: Compile AI Engine Code for the AIE Simulator and View Compilation Results in Vitis Analyzer
Compiling an AI Engine ADF Graph for V++ Flow
Vitis Analyzer Compile Summary
Section 2: Simulate the AI Engine Graph Using the aiesimulator and View Trace and Profile Results in Vitis Analyzer
Section 3: Run Hardware Emulation and View the Run Summary in Vitis Analyzer
1. Compiling HLS Kernels Using v++
2. Use v++ to Link the AI Engine and HLS Kernels with the Platform
3. Compile the A72 Host Application
4. Package the Design
5. Run Hardware Emulation
Section 4: Build and Run on Hardware
Summary
Support
Matrix Compute with Vitis Libraries on AIE and AIE-ML
Matrix Compute with Vitis Libraries on AIE and AIE-ML
Introduction
AMD Versal Devices with AI Engine Variants
Tiling Parameter Programming
Tiling Parameter Programming
Introduction
Objectives
Prerequisite knowledge
Tutorial Overview
Basics of Tiling Parameter Programming
Introduction
Tiling parameter structure
Buffer Descriptors
Utilities
CreateNDData.py
GetTiles.py
CompilerReport.py
Conclusion
Support
AI Engine-ML Performance Analysis Tutorial
Objectives
Target Application Introduction
Steps - Version 1
Steps - Version 2
Steps - Version 3
Steps - Version 4
Conclusion
Support
AIE Compiler Features
Introduction
Objectives
Tutorial Sections
Conditional Objects
Case 1
Case 2
Case 3
Case 4
Multirate
UpConv then DownConv (Buffer)
DownConv then UpConv (Buffer)
Split and Merge (Buffer)
UpConv then DownConv (Stream)
DownConv then UpConv (Stream)
Split and Merge (Stream)
Multicast
Case 1: Stream and Buffer Multicasting
Case 2: Multirate Buffer Multicasting
Design Tutorials
AIE-ML Programming
Introduction
Objectives
Prerequisite knowledge
Matrix Multiplication
Taking advantage of AI Engine-ML architecture
Matrix Multiplication modes for real types
Matrix Multiplication modes for complex types
AI Engine-ML code analysis
Running the tutorial
Performance analysis
Conclusion
Support
Prime Factor FFT-1008 on AIE-ML
Prime Factor FFT-1008 on AIE-ML
Introduction
MATLAB Models
Design Overview
INPUT PERMUTE Kernel
FFT-7 Kernel
TRANSPOSE-0 Kernel
FFT-9 Kernel
TRANSPOSE-1 Kernel
FFT-16 Kernel
OUTPUT PERMUTE Kernel
Design Resources
Build and Run Design
Setup & Initialization
Hardware Emulation
Hardware
References
Support
License
AIE-ML LeNet Tutorial
Introduction
Tutorial Overview
Before You Begin
Tools: Installing the Tools
Environment: Setting Up the Shell Environment
AIE API based FFT for Many Instances Applications
Introduction
Table of Contents
Objectives
Required Background Knowledge
Considered Case Study
Design Strategy
Designing the FFT Application with the AI Engine ML
Designing the Kernel with the AI Engine API
Understanding the APIs
Coding the Kernel
Twiddles Header File
Kernel Header File
Kernel Source Code
Designing the Graph
Data Movement Design
Designing Data Movement with the Memory Tiles
Using the Shared Buffers
Coding the Graph
Graph Header File
Graph Source Code
Implementing and Evaluating the AIE-ML Design with Vitis Unified IDE
Creating the AI Engine ML Project in Vitis
x86 Simulation and Functional Validation
AI Engine Simulation, Array and Trace Analysis
Optimizing the AIE-ML Design
Graph Optimizations
x86 Simulation and Functional Validation
AI Engine Simulation, Array and Trace Analysis
Support
Softmax Function on AIE-ML
Introduction
Softmax Function Definition
Computing the Exponential Function
IEEE 754 Format Trick
Improving Accuracy
Adapting for bfloat16 Floating-Point
AI Engine Implementation
AI Engine Kernel Processing
Kernel Data Interface
Kernel Code
Running the Example
Generating Test Vectors
Running x86 Simulation
Running AI Engine Simulation
Analyzing Results
Vitis Analyzer
Test Vector Comparison
References
Support
Migrating Farrow Filter from AIE to AIE-ML
Introduction
Comparison of AIE vs AIE-ML Farrow Filter Design Implementation
Conclusion
Polyphase Channelizer on AIE-ML using Vitis Libraries
Introduction
Channelizer Requirements
System Partitioning
Filterbank System Partitioning
Filterbank Compute Requirements
Filterbank Storage Requirements
Filterbank I/O Bandwidth Requirements
Filterbank Library Characterization
Filterbank Library Optimization
IFFT-2D System Partitioning
Available Workflows for IFFT-2D IP
IFFT-2D Library Characterization
IFFT-2D Library Optimization
Design Summary
Design Resources
Build and Run Design
Setup & Initialization
Hardware Emulation
Hardware
References
Support
License
MNIST ConvNet on AIE-ML
MNIST ConvNet on AIE-ML
Introduction
Virtual Python Environment Setup
Jupyter Notebook Model
Import the MNIST Image Database
Training & Testing the MNIST ConvNet Model
Using the MNIST ConvNet for Inference
Extracting Weights & Biases for AIE-ML Inference Solution
AIE-ML Inference Solution
Design Approach
Vitis Functional Simulation
MNIST ConvNet: AI Engine Graph View
MNIST ConvNet: AI Engine Floorplan View
MNIST ConvNet: AI Engine Resource Utilization
Vectorization of 3x3 Conv2D Layer Processing
MNIST ConvNet: Profiling & Vector Load
MNIST ConvNet: Throughput
Individual Layer Designs
Layer Design Details: conv2d_w1()
Layer Design Details: max_pooling2d_w2()
Layer Design Details: conv2d_w3()
Layer Design Details: max_pooling2d_w4()
Layer Design Details: conv2d_w5()
Layer Design Details: dense_w7()
Summary
References
Support
License