N-Body Simulator - 2023.2 English

Vitis Tutorials: AI Engine (XD100)

Version: Vitis 2023.1


This tutorial implements an N-Body Simulator on the AI Engine. It is a system-level design that uses AI Engine, PL, and PS resources to showcase the following features:

  • A Python model of an N-Body Simulator running on an x86 machine

  • A scalable AI Engine design that can utilize up to 400 AI Engine tiles

  • AI Engine packet switching

  • AI Engine single-precision floating point calculations

  • AI Engine 1:400 broadcast streams

  • Codeless PL HLS datamover kernels from the AMD Vitis™ Utility Library

  • PL HLS packet switching kernels

  • PS Host Application that validates the data coming out of the AI Engine design

  • C++ model of an N-Body Simulator

  • Performance comparisons between Python x86, C++ Arm A72, and AI Engine N-Body Simulators

  • Effective throughput calculation (GFLOPS) vs. theoretical peak throughput of the AI Engine
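The 1:400 broadcast and packet-switching features listed above fit together roughly as follows: every tile needs the data of all particles, but each tile only updates its own slice. The following is a minimal sketch of that partitioning arithmetic. It assumes an even split of 12,800 particles over 400 tiles (32 per tile), a broadcast that carries every particle's data to every tile, and packet IDs equal to the tile index. The `tile_slices` helper is hypothetical and is not the tutorial's kernel code.

```python
# Hypothetical sketch: split N particles evenly across AI Engine tiles.
# Assumptions: N is divisible by the tile count, each tile owns a
# contiguous slice, and the packet ID is simply the tile index.


def tile_slices(num_particles: int, num_tiles: int) -> list[tuple[int, int, int]]:
    """Return (packet_id, start, stop) for each tile's slice of particles."""
    if num_particles % num_tiles != 0:
        raise ValueError("this sketch assumes an even split")
    per_tile = num_particles // num_tiles
    return [(tile, tile * per_tile, (tile + 1) * per_tile) for tile in range(num_tiles)]


slices = tile_slices(12_800, 400)
# Every tile receives all 12,800 particles over the broadcast stream,
# but computes new values only for its own 32-particle slice.
print(len(slices), slices[0], slices[-1])  # prints: 400 (0, 0, 32) (399, 12768, 12800)
```

Under this assumption, the results coming back from 400 tiles can be merged by packet ID, because each ID maps to exactly one contiguous range of particles.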

Before You Begin

This tutorial can be run on the VCK190 Board (Production or ES). If you have already purchased this board, download the necessary files from the lounge and ensure you have the correct licenses installed. If you do not have a board, get in touch with your AMD sales contact.

Documentation: Explore AI Engine Architecture

Tools: Installing the Tools

  1. Obtain a license to enable beta devices in AMD tools (to use the VCK190 platform).

  2. Obtain licenses for AI Engine tools.

  3. Follow the instructions for the Vitis Software Platform Installation and ensure you have the following tools:

Environment: Setting Up Your Shell Environment

After the Vitis software platform is installed, update the shell environment script: set the necessary environment variables to the system-specific paths for XRT, the platform location, and the AMD tools.

  1. Edit the sample_env_setup.sh script with your file paths:

export PLATFORM_REPO_PATHS=<user-path>
export COMMON_IMAGE_VERSAL=$PLATFORM_REPO_PATHS/sw/versal/xilinx-versal-common-v<ver>
export PLATFORM=xilinx_vck190_base_<ver> # or xilinx_vck190_es1_base_<ver> if using an ES1 board
export DSPLIB_VITIS=<Path to Vitis Libs - Directory>

# Use ${VAR} (variable expansion), not $(VAR) (command substitution).
# XILINX_VITIS must point at your Vitis installation directory.
source "${XILINX_VITIS}/settings64.sh"
source "${COMMON_IMAGE_VERSAL}/environment-setup-cortexa72-cortexa53-xilinx-linux"

  2. Source the environment script:

source sample_env_setup.sh

Validation: Confirming Tool Installation

Ensure you are using the 2023.1 version of the AMD tools.

which vitis
which aiecompiler
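The same check can be scripted, for example in a CI job. This is a minimal sketch and is not part of the tutorial's repository. It assumes that the variables exported by sample_env_setup.sh should be non-empty and that `vitis` and `aiecompiler` should resolve on PATH to a 2023.1 install. The `check_environment` name and the version-in-path heuristic are hypothetical.

```python
import os
import shutil

# Variables exported by sample_env_setup.sh and tools checked with `which`.
REQUIRED_VARS = ("PLATFORM_REPO_PATHS", "COMMON_IMAGE_VERSAL", "PLATFORM", "DSPLIB_VITIS")
REQUIRED_TOOLS = ("vitis", "aiecompiler")


def check_environment(env=None, which=shutil.which):
    """Return a list of human-readable problems (an empty list means OK)."""
    env = os.environ if env is None else env
    problems = [f"{var} is not set" for var in REQUIRED_VARS if not env.get(var)]
    for tool in REQUIRED_TOOLS:
        path = which(tool)
        if path is None:
            problems.append(f"{tool} not found on PATH")
        elif "2023.1" not in path:
            # Heuristic only (assumption): install paths usually contain the version.
            problems.append(f"{tool} resolves to {path}; expected a 2023.1 install")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Passing `env` and `which` explicitly keeps the function testable without a real Vitis installation.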

Goals of this Tutorial

HPC Applications

The goal of this tutorial is to create a general-purpose floating-point accelerator for HPC applications. This tutorial demonstrates a 24,800x performance improvement of the AI Engine accelerator over the naive C++ implementation running on the embedded Arm® Cortex-A72 processor.

A similar accelerator example, using only PL resources, has been implemented on the AMD UltraScale+™-based Ultra96 device.

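Name                         | Hardware                     | Algorithm Complexity | Average Execution Time to Simulate 12,800 Particles for 1 Timestep (seconds)
-----------------------------|------------------------------|----------------------|-----------------------------------------------------------------------------
Python N-Body Simulator      | x86 Linux Machine            | O(N)                 | 14.96
C++ N-Body Simulator         | A72 Embedded Arm Processor   | O(N²)                | 123.299
AI Engine N-Body Simulator   | Versal AI Engine IP          | O(N)                 | 0.007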
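The execution times above can be converted into an effective throughput figure by counting floating-point operations. The following is a minimal sketch with several assumptions. It assumes every particle interacts with all N particles, giving N² pairwise interactions per timestep. It also uses a hypothetical FLOP count per interaction (20 below, a placeholder only). The tutorial's own FLOP accounting may differ. The peak-throughput inputs (tile count, FLOPs per cycle per tile, clock frequency) must be taken from the AI Engine documentation for your device.

```python
def effective_gflops(num_particles, seconds_per_step, flops_per_interaction):
    """Effective GFLOPS for one direct-summation timestep (N * N interactions)."""
    interactions = num_particles * num_particles
    return interactions * flops_per_interaction / seconds_per_step / 1e9


def peak_gflops(num_tiles, flops_per_cycle_per_tile, clock_hz):
    """Theoretical peak: every tile issues its maximum FLOPs on every cycle."""
    return num_tiles * flops_per_cycle_per_tile * clock_hz / 1e9


# Hypothetical FLOP count per interaction; replace with your kernel's actual count.
FLOPS_PER_INTERACTION = 20
aie = effective_gflops(12_800, 0.007, FLOPS_PER_INTERACTION)
a72 = effective_gflops(12_800, 123.299, FLOPS_PER_INTERACTION)
print(f"AI Engine: {aie:.1f} GFLOPS, A72: {a72:.3f} GFLOPS")
```

Dividing the effective figure by `peak_gflops(...)` gives the fraction of the theoretical peak that the design achieves.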

PL Data-Mover Kernels

Another goal of this tutorial is to show how to generate PL data-mover kernels from the AMD Vitis Utility Library. These kernels move arbitrary amounts of data from DDR buffers to AXI4-Stream interfaces.
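Conceptually, a DDR-to-stream data mover reads a contiguous buffer and emits it as fixed-width beats, marking the final beat. The sketch below models that behavior in software only. It is not generated by, or taken from, the Vitis Utility Library. The 128-bit (16-byte) beat width, the zero padding of the last beat, the seven-field particle layout, and the `to_stream_beats` name are all assumptions for illustration.

```python
import struct


def to_stream_beats(values, beat_bytes=16):
    """Pack float32 values and split them into (data, last) beats.

    Assumptions: 16-byte (128-bit) beats; the final beat is zero-padded.
    """
    payload = struct.pack(f"<{len(values)}f", *values)
    beats = []
    for offset in range(0, len(payload), beat_bytes):
        chunk = payload[offset:offset + beat_bytes].ljust(beat_bytes, b"\x00")
        is_last = offset + beat_bytes >= len(payload)  # like a TLAST flag
        beats.append((chunk, is_last))
    return beats


# One particle as seven float32 values, assumed to be x, y, z, vx, vy, vz, mass.
beats = to_stream_beats([1.0, 2.0, 3.0, 0.1, 0.2, 0.3, 5.0])
print(len(beats), [last for _, last in beats])  # prints: 2 [False, True]
```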

The N-Body Problem

The N-Body problem is the problem of predicting the motions of a group of N objects, each of which exerts a gravitational force on every other. For any particle i in the system, the sum of the gravitational forces from all the other particles determines the acceleration of particle i. From this acceleration, we can calculate what the particle's position and velocity (x, y, z, vx, vy, vz) will be in the next timestep. Newtonian physics describes the behavior of very large bodies/particles within our universe. With certain assumptions, these laws can be applied to bodies/particles ranging from astronomical size to a golf ball (and even smaller).
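The description above corresponds to direct summation. Each particle's acceleration is a_i = G * sum over j != i of m_j (r_j - r_i) / (|r_j - r_i|^2 + eps^2)^(3/2), after which its velocity and position are advanced by one timestep dt. The sketch below is a minimal pure-Python, O(N²) illustration and is not the tutorial's Python, C++, or AI Engine code. Several pieces are assumptions: G = 1 in simulation units, the softening term eps (`SOFTENING`), the semi-implicit Euler update, and the `step` name.

```python
import math

G = 1.0           # gravitational constant in simulation units (assumption)
SOFTENING = 1e-3  # eps: avoids division by zero for close pairs (assumption)


def step(pos, vel, mass, dt):
    """Advance all particles one timestep using O(N^2) direct summation.

    pos, vel: lists of [x, y, z] and [vx, vy, vz]; mass: list of floats.
    Uses semi-implicit Euler: update velocity first, then position.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            dz = pos[j][2] - pos[i][2]
            dist_sq = dx * dx + dy * dy + dz * dz + SOFTENING * SOFTENING
            inv_dist3 = 1.0 / (dist_sq * math.sqrt(dist_sq))
            s = G * mass[j] * inv_dist3
            acc[i][0] += s * dx
            acc[i][1] += s * dy
            acc[i][2] += s * dz
    new_vel = [[v[k] + a[k] * dt for k in range(3)] for v, a in zip(vel, acc)]
    new_pos = [[p[k] + v[k] * dt for k in range(3)] for p, v in zip(pos, new_vel)]
    return new_pos, new_vel


# Two equal masses at rest pull toward each other symmetrically.
pos, vel = step([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [[0.0] * 3, [0.0] * 3], [1.0, 1.0], 0.01)
print(pos, vel)
```

Because the inner loop visits every (i, j) pair, the runtime grows as N². This is consistent with the C++ model in the table above taking about two minutes per timestep for 12,800 particles on a single A72 core. A parallel design such as the AI Engine one spreads this pairwise work across many tiles.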

12,800 particles simulated on a 400-tile AI Engine accelerator for 300 timesteps