AIE-ML LeNet Tutorial - 2024.1 English

Vitis Tutorials: AI Engine

Document ID
XD100
Release Date
2024-06-19
Version
2024.1 English

Version: Vitis 2024.1

Table of Contents

Introduction

Before You Begin

Building the LeNet Design

Hardware Design Details

Software Design Details

Throughput Measurement Details

References

Introduction

The AMD Versal™ adaptive SoC AI Engine-Machine Learning (AIE-ML) series addresses the growing demand for machine learning and other compute-intensive applications. The architecture integrates the AIE-ML array, the dual-core Arm® Cortex®-A72 and Cortex-R5F processors (PS), and programmable logic (PL), interconnected through a high-bandwidth network on chip (NoC). The AIE-ML and PL each handle the functions that best suit their capabilities. With a customized memory hierarchy, multicast stream capability on the AI interconnect, and extensive support for AI-optimized vector instructions, the Versal adaptive SoC AIE-ML devices are optimized for a range of compute-intensive applications. They are particularly well suited to machine learning inference acceleration in data center applications, where they provide deterministic, low neural network latency and high performance per watt.

This tutorial uses the LeNet algorithm to implement a system-level design that performs image classification using the AI Engine-ML and PL, including block RAM. The design demonstrates functional partitioning between the AI Engine-ML and PL. It also highlights memory partitioning and hierarchy among the memory tiles, DDR memory, PL (block RAM), and AI Engine-ML memory.

The tutorial takes you through the hardware emulation and hardware flows in the context of a complete Versal adaptive SoC system integration. A Makefile is provided, which you can modify to suit a different context.

Objectives

After completing the tutorial, you should be able to:

  • Build a complete system design by going through the steps in the AMD Vitis™ unified software platform flow: creating the AI Engine-ML Adaptive Data Flow (ADF) API graph, compiling the A72 host application, compiling the PL kernels, using the Vitis compiler (V++) to link the AI Engine-ML and HLS kernels with the platform, and packaging the design. You will also be able to run the design through hardware emulation, which uses a mixed SystemC/RTL cycle-accurate/QEMU-based simulator, and through the hardware flow.

  • Develop an understanding of Convolutional Neural Network (CNN) layer details using the LeNet algorithm and how the layers are mapped into data processing and compute blocks.

  • Develop an understanding of the kernels developed in the design: AI Engine-ML kernels to process the convolutional and fully connected layers, and PL kernels to process the input rearrange function and the max pool and rearrange functions.

  • Develop an understanding of memory tiles and configuration of the shared buffer.

  • Develop an understanding of the AI Engine-ML IP interface using the AXI4-Stream interface.

  • Develop an understanding of memory hierarchy in a system-level design involving DDR memory, PL block RAM, and AI Engine-ML memory.

  • Develop an understanding of graph control APIs to enable run-time updates using the run-time parameter (RTP) interface.

  • Develop an understanding of performance measurement and functional/throughput debug at the application level.

Tutorial Overview

In this application tutorial, the LeNet algorithm is used to perform image classification on an input image using five AI Engine-ML tiles and PL resources, including block RAM. A top-level block diagram is shown in the following figure. An image is loaded from DDR memory through the Network on Chip (NoC) to block RAM and then to the AI Engine-ML. The PL input pre-processing unit receives the input image and sends its output to the first AI Engine-ML tile, which performs matrix multiplication. The output from the first AI Engine-ML tile goes to a PL unit that performs the first level of max pooling and data rearrangement (M1R1). That output is fed to the second AI Engine-ML tile, and the output from that tile is sent to the PL to perform the second level of max pooling and data rearrangement (M2R2). The result is then sent to a fully connected layer (FC1), which is implemented in two AI Engine-ML tiles and uses the rectified linear unit (ReLU) layer as its activation function. The outputs from the two AI Engine-ML tiles are then fed into a second fully connected layer implemented in the core04 AI Engine-ML tile. The output is sent to a data conversion unit in the PL and then to DDR memory through the NoC.

Between the AI Engine-ML and PL units is a datamover module (refer to the LeNet Controller in the following figure) that contains the following kernels:

  • mm2s: a memory-mapped-to-stream kernel that feeds data from DDR memory through the NoC to the AI Engine-ML array.

  • s2mm: a stream-to-memory-mapped kernel that writes data from the AI Engine-ML array through the NoC to DDR memory.

Image of LeNet Block Diagram
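
The max pooling and ReLU steps in the flow described above can be illustrated with a small host-side reference model. This is a minimal sketch in plain Python, not the PL or AI Engine-ML implementation used in the design. The function names `relu` and `max_pool_2x2`, as well as the 2x2 window with stride 2, are assumptions made for illustration.

```python
def relu(values):
    """Rectified linear unit: negative entries become zero."""
    return [max(0, v) for v in values]


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 over a 2-D list (rows x cols).

    The window size and stride are illustrative assumptions, not values
    taken from the tutorial. Height and width must be even.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    if rows % 2 or cols % 2:
        raise ValueError("feature map height and width must be even")
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]
```

A model like this can serve as a golden reference when comparing the output of an M1R1-style or M2R2-style stage against expected values, for example `max_pool_2x2([[1, 2], [3, 4]])` reduces the four inputs to the single value 4.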

The design has two major PL kernels. The input pre-processing unit, M1R1, and M2R2 are contained in the lenet_kernel RTL kernel, which has already been packaged as a Xilinx object (XO) file (.xo). The datamover kernel, dma_hls, provides the interface between the AI Engine-ML and DDR memory. The five AI Engine-ML kernels all implement matrix multiplication. The matrix dimensions depend on the image dimension, the weight dimension, and the number of features.
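
To make the dependence of the matrix dimensions on image size, weight size, and feature count concrete, the following sketch shows one common way to express a convolution as a matrix multiplication (often called im2col). It is a host-side illustration in plain Python under stated assumptions: a square single-channel image, a square kernel, stride 1, and no padding. The helper names `conv_matmul_dims`, `im2col`, and `matmul` are hypothetical, and the actual dimensions, padding, and data layout used by the tutorial's kernels may differ.

```python
def conv_matmul_dims(image_size, kernel_size, in_channels, num_features):
    """Return (M, K, N) so the convolution becomes A[M x K] @ B[K x N].

    Assumes a square image, square kernel, stride 1, and no padding.
    """
    out_size = image_size - kernel_size + 1
    m = out_size * out_size                      # one row per output pixel
    k = kernel_size * kernel_size * in_channels  # one column per weight
    n = num_features                             # one column per output feature
    return m, k, n


def im2col(image, kernel_size):
    """Unroll every kernel-sized patch of a square single-channel image into a row."""
    out_size = len(image) - kernel_size + 1
    return [
        [image[r + i][c + j] for i in range(kernel_size) for j in range(kernel_size)]
        for r in range(out_size)
        for c in range(out_size)
    ]


def matmul(a, b):
    """Plain matrix multiplication on lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]
```

As an illustrative case (values assumed, not taken from this section), a 28x28 single-channel image with a 5x5 kernel and 6 output features gives `conv_matmul_dims(28, 5, 1, 6) == (576, 25, 6)`: the rearranged input is a 576x25 matrix and the weights form a 25x6 matrix. Hardware kernels often pad such dimensions to multiples that suit the vector width.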

Directory Structure

lenet
|____design......................contains AI Engine-ML kernel, HLS kernel source files, and input data files
|    |___aie_src.................contains all the aie source files
|    |___pl_src..................contains all the data mover source files
|    |___host_app_src............contains host application source files
|    |___directives..............contains directives for various Vitis compilation stages
|    |___exec_scripts............contains run commands
|    |___profiling_configs.......contains xrt.ini file
|    |___system_configs..........contains all system configuration files
|    |___vivado_metrics_scripts..contains scripts to get Vivado metric reports
|____images......................contains images that appear in the README.md
|____Makefile....................with recipes for each step of the design compilation
|____description.json............for XOAH
|____sample_env_setup.sh.........required to set up Vitis environment variables and libraries