AIE - 2024.1 English

Vitis Tutorials: AI Engine

Document ID
XD100
Release Date
2024-10-30
Version
2024.1 English

Table of Contents

Building the Design

Hardware Design Details

Software Design Details

Performance Details

Building the Design

Design Build

Design Build

In this section, you build and run the GeMM design using the AI Engine implementation. You compile the AI Engine design and integrate it into a larger system design (including the PL kernels and PS host application). Review the Integrating the Application section in the AI Engine Documentation for the general flow.

At the end of this section, the design flow will generate a new directory (called build/). Underneath are sub-directories named (gemm_$(MAT_DIMS)/ (for example, gemm_32x32x32/) depending on the Mat A and Mat B dimensions and the number of instances x$(GEMM_INSTS) chosen in the build. Each sub-directory contains the hw_emu/ and/or hw/ subfolders. The respective subfolders contain Work/ and libadf.a, outputs from the AI Engine compiler, the host app executable and the builds, targeted to hw or hw_emu respectively. The hw_emu/ subfolder contains the build for hardware emulation. The hw/ subfolder contains the build for hardware run on a VCK190 board.

Make Steps

Make Steps

To run the following make steps (that is, make kernels, make graph, and so on), you must be in the AIE/ folder. The options that can be specified in the make steps are as follows.

TARGET: This can be set to hw or hw_emu to build the design in the hardware or hardware emulation flow respectively. The default option is hw_emu.

GEMM_INSTS: This is set to 1 by default and is not allowed to be changed right.

GEMM_SIZE: Matrix Dimensions Involved. 32 means Mat A (input matrix 1), B (input matrix 2) and C (output matrix) are square matrices of dimension 32. Permissible values are 32, 64, 128, 256, 512, and 1024.

ITER_CNT: The number of iterations the design is run. The default is 1.

EN_TRACE: Flag to enable trace profiling. 0 is disabled and 1 is enabled. The default is 0 (disabled).

The Makefile uses the following directory references:

## Relative gemm directory
RELATIVE_PROJECT_DIR := ./

## Absolute gemm directory = <user path>/Tutorials/AI_Engine/gemm
PROJECT_REPO := $(shell readlink -f $(RELATIVE_PROJECT_DIR))

DESIGN_REPO  := $(PROJECT_REPO)/design
AIE_SRC_REPO := $(DESIGN_REPO)/aie_src
PL_SRC_REPO  := $(DESIGN_REPO)/pl_src
HOST_APP_SRC_REPO := $(DESIGN_REPO)/host_app_src

SYSTEM_CONFIGS_REPO    := $(DESIGN_REPO)/system_configs
PROFILING_CONFIGS_REPO := $(DESIGN_REPO)/profiling_configs
EXEC_SCRIPTS_REPO      := $(DESIGN_REPO)/exec_scripts
VIVADO_METRICS_SCRIPTS_REPO := $(DESIGN_REPO)/vivado_metrics_scripts
DIRECTIVES_REPO        := $(DESIGN_REPO)/directives


BASE_BLD_DIR     := $(PROJECT_REPO)/build
GEMM_BLD_DIR     := $(BASE_BLD_DIR)/gemm_$(MAT_DIMS)
INSTS_BLD_DIR    := $(GEMM_BLD_DIR)/x$(GEMM_INSTS)
BUILD_TARGET_DIR := $(INSTS_BLD_DIR)/$(TARGET)

REPORTS_REPO := $(PROJECT_REPO)/reports_dir
BLD_REPORTS_DIR := $(REPORTS_REPO)/gemm_$(MAT_DIMS)/x$(GEMM_INSTS)

XPE_REPO := $(PROJECT_REPO)/xpe_dir
BLD_XPE_DIR := $(XPE_REPO)/gemm_$(MAT_DIMS)/x$(GEMM_INSTS)/$(TARGET)
VCD_FILE_NAME := gemm_$(MAT_DIMS)_x$(GEMM_INSTS)
BLD_TGT_VCD_FILE := $(BUILD_TARGET_DIR)/$(VCD_FILE_NAME).vcd
XPE_FILE := $(BLD_XPE_DIR)/graph_$(VCD_FILE_NAME).xpe

EMBEDDED_PACKAGE_OUT := $(BUILD_TARGET_DIR)/package
EMBEDDED_EXEC_SCRIPT := run_script.sh

WORK_DIR := Work
AIE_SIM_IO_BASE_DIR := $(AIE_SRC_REPO)/aiesim_data
AIE_SIM_IO_DIR := $(AIE_SIM_IO_BASE_DIR)/gemm_$(MAT_DIMS)_ioFiles
Build the Entire Design with a Single Command

Build the Entire Design with a Single Command

If you are already familiar with the AI Engine and Vitis kernel compilation flows, you can build the entire design for each case of gemm_$(MAT_DIMS) with one command:

make run (default target is hardware emulation, 1 instance, gemm_$(MAT_DIMS) matrix dimensions, iterations=1 and no trace-profiling )

or

make run TARGET=hw ITER_CNT=16 EN_TRACE=1 GEMM_SIZE=64 (hardware, 16 iterations, , matrix dimentions 64 for Mat A, B and C and enable trace profiling )

This command runs the make kernels,make graph,make xsa,make application,make package, and make run_emu for hardware emulation or to run on hardware (VCK190 board) depending on the TARGET you specify. The settings also apply to the individual make steps listed below.

The generated files for each gemm_$(MAT_DIMS) are placed under an individual directory: $(BUILD_TARGET_DIR)/. Each make step to build the design is specified in the following sections. These sections also detail the options used and the location of input and output files in each case.

make kernels: Compiling PL Kernels

make kernels: Compiling PL Kernels

In this step, the Vitis compiler takes any Vitis compiler kernels (RTL or HLS C) in the PL region of the target platform (xilinx_vck190_base_202410_1) and the AI Engine kernels and graph and compiles them into their respective XO files. The following commands compile the kernels (default TARGET=hw_emu, GEMM_INSTS=1, GEMM_SIZE=32, ITER_CNT=1 and EN_TRACE=0).

make kernels

The command alongwith the options used is as follows (for dma_hls):

$(BUILD_TARGET_DIR)/$(DATAMOVER_KERNEL_XO).xo: 
	mkdir -p $(BUILD_TARGET_DIR); \
	cd $(BUILD_TARGET_DIR); \
	v++ --target $(TARGET) $(DATAMOVER_KERNEL_VPP_FLAGS) \
		$(VPP_FLAGS) -c -k $(DATAMOVER_KERNEL_TOP) \
		$(DATAMOVER_KERNEL_SRC) -o $@

See this page for a detailed description of all Vitis compiler switches. The following table provides a summary of the switches used.

Switch Description
--target | -t [hw|hw_emu] Specifies the build target.
--platform | -f Specifies the name of a supported acceleration platform as specified by the $PLATFORM_REPO_PATHS environment variable or the full path to the platform XPFM file.
--save-temps | -s Directs the Vitis compiler command to save intermediate files/directories created during the compilation and link process. Use the --temp_dir option to specify a location to write the intermediate files to.
--temp_dir This allows you to manage the location where the tool writes temporary files created during the build process. The temporary results are written by the Vitis compiler, and then removed, unless the --save-temps option is also specified.
--verbose Display verbose/debug information.
--compile | -c Required for compilation to generate XO files from kernel source files.
--kernel \<arg>|-k \<arg> Compile only the specified kernel from the input file. Only one -k option is allowed per Vitis compiler command.
--output | -o Specifies the name of the output file generated by the V++ command. The DMA HLS kernels output should be XO.
Input Description
$(PL_SRC_REPO)/dma_hls.cpp Defines the data mover PL kernel.
Output Description
$(BUILD_TARGET_DIR)/dma_hls.hw_emu.xo The data mover kernel object file.
make graph: Creating the AI Engine ADF Graph for the Vitis Compiler Flow

make graph: Creating the AI Engine ADF Graph for Vitis Compiler Flow

An ADF graph can be connected to an extensible Vitis platform (the graph I/Os can be connected either to platform ports or to ports on Vitis kernels through Vitis compiler connectivity directives).

  • The AI Engine ADF C++ graph of the design contains AI Engine kernels and PL kernels.

  • All interconnects between kernels are defined in the C++ graph

  • All interconnections to external I/O are fully specified in the C++ simulation testbench (graph.cpp) that instantiates the C++ ADF graph object.

To compile the graph using the Makefile flow type (default TARGET=hw_emu, GEMM_INSTS=1, GEMM_SIZE=32, ITER_CNT=1 and EN_TRACE=0):

make graph

The following AI Engine compiler command, alongwith the options used, compiles the AI Engine design graph:

...
AIE_FLAGS := -include=$(AIE_SRC_REPO)
AIE_FLAGS += -include=$(DSPLIB_ROOT)/L1/include/aie
AIE_FLAGS += -include=$(DSPLIB_ROOT)/L1/src/aie
AIE_FLAGS += -include=$(DSPLIB_ROOT)/L1/tests/aie/inc
AIE_FLAGS += -include=$(DSPLIB_ROOT)/L1/tests/aie/src
AIE_FLAGS += -include=$(DSPLIB_ROOT)/L2/include/aie
AIE_FLAGS += -include=$(DSPLIB_ROOT)/L2/tests/aie/common/inc
AIE_FLAGS += --verbose
AIE_FLAGS += --Xpreproc="-DITER_CNT=$(ITER_CNT)"
AIE_FLAGS += --Xpreproc="-DGRAPH_ITER_CNT=$(GRAPH_ITER_CNT)"
AIE_FLAGS += --Xpreproc="-DGEMM_SIZE=$(GEMM_SIZE)"
AIE_FLAGS += --Xpreproc="-DGEMM_INSTS=$(GEMM_INSTS)"
AIE_FLAGS += --platform=$(PLATFORM)
#AIE_FLAGS += --target=$(TARGET)

AIE_FLAGS += --log-level=5
#AIE_FLAGS += --test-iterations=2
AIE_FLAGS += --pl-freq=$(PL_FREQ)
#AIE_FLAGS += --dataflow

#AIE_FLAGS += --constraints=$(AIE_SRC_REPO)/constraints.aiecst

AIE_FLAGS += --Xmapper=BufferOptLevel9
AIE_FLAGS += --Xrouter=DMAFIFOsInFreeBankOnly


AIE_FLAGS += --workdir=$(WORK_DIR)

AIE_SIM_FLAGS := --pkg-dir $(WORK_DIR)/
AIE_SIM_FLAGS += -i=$(AIE_SIM_IO_DIR)

...
graph: $(LIBADF_A)

$(LIBADF_A):  $(AIE_SRC_REPO)/graph.*
	mkdir -p $(BUILD_TARGET_DIR); \
	cd $(BUILD_TARGET_DIR); \
	aiecompiler $(AIE_FLAGS) $(GRAPH_SRC_CPP) 2>&1 | tee -a aiecompiler.log

See this page for full AI Engine programming environment documentation.

The following table provides a summary of the switches used.

Switch Description
--include=\<string> Specify compile-time include directory (zero or more).
--verbose|-v Verbose output of the AI Engine compiler emits compiler messages at various stages of compilation. These debug and tracing logs provide useful messages on the compilation process.
--Xpreproc="-D\<Pre-processor Macro String>" Specify compile time macro.
--Xchess="\<Chess Make Options>" Specify compile time chess make options; "main:bridge.llibs=softfloat m" enables floating point operations.
--heapsize=\<int> Heap size in bytes.
--log-level=\<int> Log level for verbose logging (default=1).
--workdir=\<string> By default, the compiler writes all outputs to a sub-directory of the current directory, called Work. Use this option to specify a different output directory.

The following is a description of the output objects that results from executing the AI Engine compiler (aiecompiler) command.

Inputs Sources Description
$(AIE_SRC_REPO)/graph.cpp Defines the GeMM graph objects.
Output Objects Description
$(BUILD_TARGET_DIR)/libadf.a Compiled AI Engine design graph.
$(BUILD_TARGET_DIR)/Work/ Directory that contains all outputs of the AI Engine compiler.
make xsa: Using the Vitis Tools to Link AI Engine and HLS Kernels with the Platform

make application: Compiling the Host Application

You can compile the host application by following the typical cross-compilation flow for the Cortex A72. To build the application, run the following command (default TARGET=hw_emu, GEMM_INSTS=1, GEMM_SIZE=32, ITER_CNT=1 and EN_TRACE=0):

make application

or

application: graph $(BUILD_TARGET_DIR)/$(APP_ELF)

REG_GCC_FLAGS := $(GCC_FLAGS)
REG_GCC_FLAGS += -DITER_CNT=$(ITER_CNT)
REG_GCC_FLAGS += -DGRAPH_ITER_CNT=$(GRAPH_ITER_CNT)

$(BUILD_TARGET_DIR)/$(APP_ELF): $(HOST_APP_SRC)/* $(LIBADF_A)
	@rm -rf $(BUILD_TARGET_DIR)/app_control.o $(BUILD_TARGET_DIR)/gemm_aie_app.o $(BUILD_TARGET_DIR)/$(APP_ELF)
	$(CXX) $(REG_GCC_FLAGS) $(GCC_INC_FLAGS) $(AIE_CONTROL_CPP) -o $(BUILD_TARGET_DIR)/app_control.o
	$(CXX) $(REG_GCC_FLAGS) $(GCC_INC_FLAGS) $(APP_SRC_CPP) -o $(BUILD_TARGET_DIR)/gemm_aie_app.o $(GCC_INC_LIB) $(GCC_LIB)
	$(CXX) $(BUILD_TARGET_DIR)/app_control.o $(BUILD_TARGET_DIR)/gemm_aie_app.o $(GCC_INC_LIB) $(GCC_LIB) -o $(BUILD_TARGET_DIR)/$(APP_ELF)

See this page for XRT documentation. See this page for details of host application programming.

Switch Description
-O | Optimize. Optimizing compilation takes more time and a lot more memory for a large function. With -O, the compiler tries to reduce code size and execution time, without performing any of the optimizations that can take a great deal of compilation time.
-D__linux__
-DXAIE_DEBUG Enable debug interface capabilities where certain core status, event status, or stack trace can be dumped out.
-D\<Pre-processor Macro String>=\<value> Pass pre-processor macro definitions to the cross-compiler.
-I \<dir> Add the directory dir to the list of directories to be searched for header files.
-o \<file> Place output in file <file>. This applies regardless of the output being produced, whether it be an executable file, an object file, an assembler file, or preprocessed C code.
--sysroot=\<dir> Use dir as the logical root directory for headers and libraries. For example, if the compiler normally searches for headers in /usr/include and libraries in /usr/lib, it instead searches dir/usr/include and dir/usr/lib. This is automatically set by the env_setup.sh script.
-l\<library> Search the library named library when linking. The GeMM tutorial requires the adf_api_xrt and xrt_coreutil libraries.
-L \<dir> Add directory <dir> to the list of directories to be searched for -l.

The following is a description of the input sources compiled by the AI Engine compiler command.

Inputs Sources Description
$(HOST_APP_SRC_REPO)/gemm_aie_app.cpp Source application file for the gemm_aie_xrt.elf that will run on an A72 processor.
$(BUILD_TARGET_DIR)/Work/ps/c_rts/aie_control_xrt.cpp This is the AI Engine control code generated implementing the graph APIs for the GeMM graph.

The following is a description of the output objects that results from executing the AI Engine compiler command with the above inputs and options.

Output Objects Description
$(BUILD_TARGET_DIR)/gemm_aie_xrt.elf The executable that will run on an A72 processor.
make package: Packaging the Design

make package: Packaging the Design

With the AI Engine outputs created, as well as the new platform, you can now generate the programmable device image (PDI) and a package to be used on an SD card. The PDI contains all the executables, bitstreams, and configurations of the device. The packaged SD card directory contains everything to boot Linux, the generated applications, and the XCLBIN.

The command to run this step is as follows (default TARGET=hw_emu, GEMM_INSTS=1, GEMM_SIZE=32, ITER_CNT=1 and EN_TRACE=0):

make package

or

...
PKG_FLAGS := -t $(TARGET)
PKG_FLAGS += --save-temps
PKG_FLAGS += --temp_dir $(BUILD_TARGET_DIR)/_x
PKG_FLAGS += -f $(PLATFORM)
PKG_FLAGS += --package.rootfs $(COMMON_IMAGE_VERSAL)/rootfs.ext4
PKG_FLAGS += --package.kernel_image $(COMMON_IMAGE_VERSAL)/Image
PKG_FLAGS += --package.boot_mode=sd
PKG_FLAGS += --package.out_dir $(EMBEDDED_PACKAGE_OUT)
PKG_FLAGS += --package.image_format=ext4
PKG_FLAGS += --package.sd_file $(BUILD_TARGET_DIR)/$(APP_ELF) $(BUILD_TARGET_DIR)/$(XSA) $(LIBADF_A)
PKG_FLAGS += --package.sd_file $(BUILD_TARGET_DIR)/$(APP_ELF_INF_RUN) 
PKG_FLAGS += --package.sd_file $(EXEC_SCRIPTS_REPO)/$(EMBEDDED_EXEC_SCRIPT)

## If Profiling for Performance Measurement is enabled..
ifeq ($(EN_TRACE),1)
   ifeq ($(TARGET),hw)
      PKG_FLAGS += --package.sd_file $(PROFILING_CONFIGS_REPO)/xrt.ini
   
   endif
endif

## If XRT_ROOT is set...
ifdef XRT_ROOT
   PKG_FLAGS += --package.sd_dir $(XRT_ROOT)

endif

PKG_FLAGS += --package.defer_aie_run
...
package: application application_inf_run xsa $(EMBEDDED_PACKAGE_OUT)

$(EMBEDDED_PACKAGE_OUT): $(PROFILING_CONFIGS_REPO)/* $(EXEC_SCRIPTS_REPO)/* $(BUILD_TARGET_DIR)/$(APP_ELF) $(BUILD_TARGET_DIR)/$(XSA) $(BUILD_TARGET_DIR)/$(APP_ELF_INF_RUN)
	rm -rf $(EMBEDDED_PACKAGE_OUT)
	cd $(BUILD_TARGET_DIR);	\
	v++ -p $(PKG_FLAGS)

See this page for more details about packaging the system.

Switch Description
--target | -t [hw|hw_emu] Specifies the build target.
--package | -p Packages the final product at the end of the Vitis compile and link build process.
--package.rootfs \<arg> Where \<arg> specifies the absolute or relative path to a processed Linux root file system file. The platform RootFS file is available for download from xilinx.com. Refer to the Vitis Software Platform Installation for more information.
--package.kernel_image \<arg> Where \<arg> specifies the absolute or relative path to a Linux kernel image file. Overrides the existing image available in the platform. The platform image file is available for download from xilinx.com. Refer to the Vitis Software Platform Installation for more information.
--package.boot_mode \<arg> Where \<arg> specifies . Boot mode used for running the application in emulation or on hardware.
--package.image_format Where \<arg> specifies the \<ext4|fat32> output image file format. ext4 is the Linux file system and fat32 is the Windows file system.
--package.sd_file Where \<arg> specifies an ELF or other data file to package into the sd_card directory/image. This option can be used repeatedly to specify multiple files to add to the sd_card directory.
--package.defer_aie_run Load the AI Engine application with the ELF file, but wait to run it until graph run directs it. This is required in the PS based AI Engine flow.
Inputs Sources Description
$(PLATFORM_REPO_PATHS)/sw/versal/xrt The PS host application needs the XRT headers in this folder to execute.
$(PLATFORM_REPO_PATHS)/sw/versal/xilinx-versal/rootfs.ext4 The root filesystem file for PetaLinux.
$(PLATFORM_REPO_PATHS)/sw/versal/xilinx-versal/Image The pre-built PetaLinux image that the processor boots from.
$(BUILD_TARGET_DIR)/gemm_aie_xrt.elf The PS host application executable created in the make application step.
$(BUILD_TARGET_DIR)/vck190_aie_gemm.hw_emu.xsa The XCLBIN file created in the make xsa step.
$(BUILD_TARGET_DIR)/libadf.a The compiled AI Engine design graph created in the make graph step.

The output of the Vitis compiler package step is the package directory that contains the contents to run hardware emulation.

Output Objects Description
$(BUILD_TARGET_DIR)/package The hardware emulation package that contains the boot file, hardware emulation launch script, PLM and PMC boot files, PMC and QEMU command argument specification files, and Vivado simulation folder.
make run_emu: Running Hardware Emulation

make run_emu: Running Hardware Emulation

After packaging, everything is set to run hardware emulation. To run emulation, use the following command (default TARGET=hw_emu):

make run_emu 

or

###########################################################################
Hardware Emulation Goto:
$(BUILD_TARGET_DIR)/package

and do:
./launch_hw_emu.sh or ./launch_hw_emu.sh -g (for waveform viewer) or ./launch_hw_emu.sh -run-app $(EMBEDDED_EXEC_SCRIPT) (to check results without opening waveform viewer) ...

When hardware emulation is launched, you see the QEMU simulator load. Wait for the autoboot countdown to go to zero. After a few minutes, the root Linux prompt comes up:

root@versal-rootfs-common-2024_1:~#

After the root prompt comes up, run the following commands to run the design:

mount /dev/mmcblk0p1 /mnt
cd /mnt
./gemm_aie_xrt.elf a.xclbin

The gemm_aie_xrt.elf executes. After a few minutes, you should see the output with TEST PASSED on the console. When this is shown, run the following keyboard command to exit the QEMU instance:

#To exit QEMU Simulation
Press CtrlA, let go of the keyboard, and then press x 

To run with waveform, do the following:

cd $(BUILD_TARGET_DIR)/package
./launch_hw_emu.sh -g

The XSIM Waveform Viewer is launched. Drag and drop the signals into the viewer and click Play to start the emulation. Go back to the terminal and wait for the Linux prompt to show up. In the XSIM Waveform Viewer, you see the signals you added to the waveform adjusting over the execution of the design. When this is done, hit the pause button and close the window to end the emulation.

The following figure shows a waveform view of the gemm_32x32x32 - 1x design.

Image of GeMM AIE HW_EMU Run Waveform View For 32x32x32 Design

TARGET=hw: Running on Hardware

Running on Hardware

To run the design in hardware, rerun the following make steps with TARGET=hw and other applicable options (see the preceding make steps specified above).

make kernels xsa package TARGET=hw 

These commands create a $(BUILD_TARGET_DIR) folder with the kernels, XSA, and package for a hardware run.

Run the following step to set up the execution file, generated images, and base images ($(BUILD_TARGET_DIR)/package/sd_card and $(BUILD_TARGET_DIR)/package/sd_card.img).

make run_emu TARGET=hw 

These commands create a build/hw folder with the kernels, XSA, and package for a hardware run. Follow steps 1-9 to run the gemm_aie_xrt.elf executable on your VCK190 board.

Step 1. Ensure your board is powered off.

Step 2. Use an SD card writer (such as balenaEtcher) to flash the sd_card.img file to an SD card.

Step 3. Plug the flashed SD card into the top slot of the VCK190 board.

Step 4. Set the switch (SW1 Mode\[3:0\]=1110 = OFF OFF OFF ON).

Step 5. Connect your computer to the VCK190 board using the USB cable included with the board.

Step 6. Open a TeraTerm terminal and select the correct COM port. Set the port settings to the following:

Port: <COMMXX>
Speed: 115200
Data: 8 bit
Parity: none
Stop Bits: 1 bit
Flow control: none
Transmit delay: 0 msec/char 0 msec/line

Step 7. Power on the board.

Step 8. Wait until you see the root@versal-rootfs-common-2024.1 Linux command prompt. Press Enter a few times to get past any xinit errors.

Step 9. Run the following commands in the TeraTerm terminal:

cd /mnt/sd-mmcblk0p1

./gemm_aie_xrt.elf a.xclbin

Hardware Design Details

GeMM AI Engine Implementation Architecture and AI Engine/PL Function Partitioning

GeMM AI Engine Implementation Architecture and AI Engine/PL Function Partitioning

The following figure shows a high-level block diagram of the design. The test harness consists of the AI Engine and data mover HLS kernels (dma_hls). In this setup, there is an AXI4-Stream interface between the data mover kernels and AI Engines, with a data width of 128 bits. The data mover kernels and the AI Engine array interface are running at 312.5 MHz.

The data mover is a PL-based data generator and checker. It generates constant matrices as inputs and checks the output of the gemm core for its output.

GeMM Block Diagram for Matrices 32x32x32 to 1024x1024x1024

Image of GeMM AIE Implementation Architecture GeMM 32x32x32 to 1024x1024x1024

Design Details