Review the nbody_x4_x100.h file. It contains the definition of the nbodySystem graph, which contains 100 instances of the nbody_subsystem graph. Each nbody_subsystem is mapped to four AI Engine tiles, and each tile contains one nbody() kernel. Therefore, the nbodySystem graph contains 400 nbody() kernels, using all 400 available AI Engine tiles. Since each nbody() kernel simulates 32 particles, the nbodySystem simulates 12,800 particles (32 particles × 400 kernels). There are 100 input_i ports (input_i0 to input_i99) and a single input_j port. For one iteration, each input_i port receives four packets of w_input_i data, which are distributed to the four nbody() kernels in the corresponding nbody_subsystem graph. The input_j port is a 1:400 broadcast stream to the 400 w_input_j ports of the 400 nbody() kernels.
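To make the counts and the port-to-kernel mapping concrete, here is a minimal C++ sketch that models the topology as plain compile-time arithmetic. It is not ADF graph code: the constants come from the text, but the global kernel numbering (kernelFor, subsystemOf, packetOf) is a hypothetical scheme assumed for illustration. The actual packet routing is defined in nbody_x4_x100.h.

```cpp
// Hypothetical model of the nbodySystem topology described above.
// Function names and the kernel numbering are illustrative assumptions,
// not taken from nbody_x4_x100.h.
constexpr int kSubsystems = 100;        // nbody_subsystem instances
constexpr int kKernelsPerSubsystem = 4; // nbody() kernels, one per AI Engine tile
constexpr int kParticlesPerKernel = 32; // particles simulated by each nbody()

constexpr int kTotalKernels = kSubsystems * kKernelsPerSubsystem;    // 400
constexpr int kTotalParticles = kTotalKernels * kParticlesPerKernel; // 12,800

// Global index of the kernel that receives packet `pkt` (0-3) arriving
// on port input_i<subsystem> (assumed numbering scheme).
constexpr int kernelFor(int subsystem, int pkt) {
    return subsystem * kKernelsPerSubsystem + pkt;
}

// Inverse mapping: which input_i port, and which packet on it, feeds a kernel.
constexpr int subsystemOf(int kernel) { return kernel / kKernelsPerSubsystem; }
constexpr int packetOf(int kernel) { return kernel % kKernelsPerSubsystem; }

// True if every (port, packet) pair maps to exactly one kernel and every
// kernel is fed by exactly one pair, i.e. each kernel receives one
// w_input_i packet per iteration. (input_j is broadcast, so every kernel
// also receives the same j data.)
constexpr bool mappingIsBijective() {
    for (int k = 0; k < kTotalKernels; ++k)
        if (kernelFor(subsystemOf(k), packetOf(k)) != k) return false;
    for (int s = 0; s < kSubsystems; ++s)
        for (int p = 0; p < kKernelsPerSubsystem; ++p) {
            int k = kernelFor(s, p);
            if (k < 0 || k >= kTotalKernels) return false;
            if (subsystemOf(k) != s || packetOf(k) != p) return false;
        }
    return true;
}

// Compile-time checks of the numbers quoted in the text.
static_assert(kTotalKernels == 400, "400 nbody() kernels");
static_assert(kTotalParticles == 12800, "12,800 particles");
static_assert(mappingIsBijective(), "each kernel gets exactly one i packet");
```

Because everything is constexpr, the checks run when the file is compiled (C++14 or later); no main() is needed to validate the arithmetic.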
Review the nbody_x4_100.cpp file. It contains an instance of the nbodySystem graph and simulates it for one iteration. Also, review the data folder, which contains the input data files used by the nbodySystem graph (input_i0.txt to input_i99.txt and input_j.txt).
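As a sketch of how the 101 input files line up with the graph ports, the hypothetical helper below builds the file list in port order: one file per input_i port, followed by the single input_j file. The function name, the data/ prefix, and the ordering are assumptions; the actual binding of files to ports is set up in the graph code, not by this helper.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: lists the simulation input files described in the
// text (input_i0-99.txt plus input_j.txt). The "data/" prefix and the
// ordering (all input_i files first, then input_j) are assumptions.
std::vector<std::string> nbodyInputFiles(int numSubsystems = 100,
                                         const std::string& dir = "data/") {
    assert(numSubsystems > 0);
    std::vector<std::string> files;
    files.reserve(numSubsystems + 1);
    // One packetized input_i file per nbody_subsystem graph.
    for (int i = 0; i < numSubsystems; ++i)
        files.push_back(dir + "input_i" + std::to_string(i) + ".txt");
    // Single input_j file, broadcast to all 400 nbody() kernels.
    files.push_back(dir + "input_j.txt");
    return files;
}
```

With the default arguments the helper returns 101 paths, from data/input_i0.txt to data/input_i99.txt, followed by data/input_j.txt.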
Below is the implementation of the 100 compute units on all 400 AI Engine tiles, as viewed in the AMD Vitis Analyzer tool.
The region highlighted in red encompasses the four AI Engine tiles that make up a single compute unit.
The following is the graph visualization of a single compute unit in the Vitis Analyzer tool.