100 N-Body Subsystems - 2023.2 English

Vitis Tutorials: AI Engine (XD100)

Document ID: XD100

Release Date: 2024-03-05

Version: 2023.2 English

Review the nbody_x4_x100.h file. It defines the nbodySystem graph, which contains 100 instances of the nbody_subsystem graph. Each nbody_subsystem is mapped to four AI Engine tiles, each of which runs one nbody() kernel. The nbodySystem graph therefore contains 400 nbody() kernels and uses all 400 available AI Engine tiles. Since each nbody() kernel simulates 32 particles, the nbodySystem simulates 12,800 particles (32 particles * 400 kernels).

The graph has 100 input_i ports (input_i0 through input_i99) and a single input_j port. In one iteration, each input_i port receives four packets of w_input_i data, which are distributed to the four nbody() kernels in the corresponding nbody_subsystem graph. The input_j port is a 1:400 broadcast stream that feeds the w_input_j ports of all 400 nbody() kernels.
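The sizing described above can be double-checked with a small compile-time sketch. This is bookkeeping only and uses no ADF graph API. The counts come from the text, while the flat kernel numbering (0 to 399) and the helper names `flat_kernel_index`, `port_of_kernel`, and `packet_of_kernel` are assumptions made for illustration. The actual packet-to-kernel routing is whatever nbody_x4_x100.h defines.

```cpp
// Sizing sketch for the nbodySystem graph (illustrative, not the tutorial's code).
#include <cstddef>

// Counts taken from the description of nbody_x4_x100.h.
constexpr std::size_t kSubsystems          = 100; // nbody_subsystem instances
constexpr std::size_t kKernelsPerSubsystem = 4;   // one nbody() kernel per AI Engine tile
constexpr std::size_t kParticlesPerKernel  = 32;  // particles simulated by each kernel

constexpr std::size_t kTotalKernels   = kSubsystems * kKernelsPerSubsystem;
constexpr std::size_t kTotalParticles = kTotalKernels * kParticlesPerKernel;

// ASSUMPTION: a hypothetical flat numbering in which packet `packet` (0..3)
// arriving on port input_i<port> goes to kernel port * 4 + packet.
constexpr std::size_t flat_kernel_index(std::size_t port, std::size_t packet) {
    return port * kKernelsPerSubsystem + packet;
}

// Inverse of the hypothetical numbering: which input_i port feeds kernel k,
// and which of that port's four packets it consumes.
constexpr std::size_t port_of_kernel(std::size_t k)   { return k / kKernelsPerSubsystem; }
constexpr std::size_t packet_of_kernel(std::size_t k) { return k % kKernelsPerSubsystem; }

// The figures quoted in the text, checked at compile time.
static_assert(kTotalKernels == 400, "100 subsystems x 4 kernels fill all 400 tiles");
static_assert(kTotalParticles == 12800, "32 particles x 400 kernels");
static_assert(flat_kernel_index(99, 3) == kTotalKernels - 1, "last packet reaches the last kernel");
```

Under this numbering every kernel receives exactly one w_input_i packet per iteration from its subsystem's input_i port, while input_j reaches all 400 kernels regardless of index because it is a broadcast.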

Review the nbody_x4_100.cpp file. It instantiates the nbodySystem graph and simulates it for one iteration. Also review the data folder, which contains the input data files used by the nbodySystem graph (input_i0-99.txt and input_j.txt).

The figure below shows the implementation of the 100 compute units across all 400 AI Engine tiles, as viewed in the AMD Vitis Analyzer tool.

The red highlighted region encompasses the four AI Engine tiles that together implement a single compute unit.

The following is the graph visualization of a single compute unit in the Vitis Analyzer tool.