Latency Performance Comparisons

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2026-03-27
Version: 2025.2 English

The following table compares the execution times to simulate 12,800 particles for one timestep on the various N-Body simulators explored in this tutorial.

| Name | Hardware | Algorithm | Average Execution Time for 1 Timestep (seconds) |
|---|---|---|---|
| Python N-Body Simulator | x86 Linux Machine | O(N) | 14.96 |
| C++ N-Body Simulator | A72 Embedded Arm Processor | O(N²) | 121.295 |
| AI Engine N-Body Simulator | Versal AI Engine IP | O(N) | 0.00888979 |

Based on the times in the table, the N-Body Simulator implemented on the AI Engine is roughly 1,680x faster than the Python O(N) implementation (14.96 s versus 0.00888979 s) and roughly 13,600x faster than the C++ O(N²) implementation (121.295 s versus 0.00888979 s). You can also use pthreads to create a vectorized C++ O(N) N-Body Simulator, but that implementation is not included in this tutorial.
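The speedup factors implied by the table can be checked by dividing each baseline time by the AI Engine time. The following is a minimal sketch of that arithmetic, using only the values listed in the table; the dictionary keys and the `speedup` helper are illustrative names and are not part of the tutorial's code.

```python
# Average execution times (seconds) for one timestep of 12,800 particles,
# copied from the comparison table.
times_s = {
    "Python N-Body Simulator (x86)": 14.96,
    "C++ N-Body Simulator (A72)": 121.295,
    "AI Engine N-Body Simulator": 0.00888979,
}

AI_ENGINE = "AI Engine N-Body Simulator"


def speedup(baseline_s: float, accelerated_s: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    return baseline_s / accelerated_s


# Speedup of the AI Engine run relative to every other simulator in the table.
speedups = {
    name: speedup(t, times_s[AI_ENGINE])
    for name, t in times_s.items()
    if name != AI_ENGINE
}

for name, factor in speedups.items():
    print(f"{name}: AI Engine is about {factor:,.0f}x faster")
```

With the table values, this reports factors of about 1,683 for the Python simulator and about 13,644 for the C++ simulator. These are single-timestep averages on different hardware, so they compare whole platforms and not the algorithms alone.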