Next, review the src/nbody_subsystem.h graph. This graph creates four N-Body kernels, a packet splitter kernel, and a packet merger kernel. To learn more about packet switching in the AI Engine, see the packet switching feature tutorial: 04-packet-switching.
The nbody_subsystem graph has two inputs: input_i and input_j. The input_i port is a packet stream that connects to the packet splitter, which routes each packet to the w_input_i port of one nbody() kernel. Each input_i packet contains a packet header and 224 32-bit data values, with TLAST asserted on the m31 data value. The input_j port is a data stream that is broadcast to all nbody() kernels, so every kernel receives the same input_j data. The nbody() kernels perform their computations and generate new w_output_i data, which the packet merger combines into a single stream of packets. This merged stream is output_i, the output of the nbody_subsystem graph.
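The routing described above can be modeled in plain C++ to make the data movement concrete. This is only a sketch and not AI Engine graph code: the `Packet` and `KernelInputs` types and the `route_inputs` and `merge_outputs` functions are hypothetical names invented for illustration, and the model assumes that the packet ID in the header selects the destination kernel directly (ID 0 to kernel 0, and so on) and that the merger emits packets in kernel order, which the real hardware does not guarantee.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain C++ model of the nbody_subsystem routing. This is NOT AI Engine code.
constexpr std::size_t kNumKernels = 4;       // four nbody() kernels in the graph
constexpr std::size_t kPacketPayload = 224;  // 32-bit data values per input_i packet

// Hypothetical packet type: `id` stands in for the routing field of the header.
struct Packet {
    std::uint32_t id;                    // assumed to select the destination kernel
    std::vector<std::uint32_t> payload;  // 224 data values; TLAST goes with the last
};

// What one modeled nbody() kernel receives.
struct KernelInputs {
    std::vector<std::uint32_t> w_input_i;  // payload of the packet routed to it
    std::vector<std::uint32_t> w_input_j;  // shared broadcast data
};

// Splitter: deliver each packet to the kernel selected by its ID.
// Broadcast: copy the same input_j data to every kernel.
std::array<KernelInputs, kNumKernels> route_inputs(
    const std::vector<Packet>& input_i,
    const std::vector<std::uint32_t>& input_j) {
    std::array<KernelInputs, kNumKernels> kernels{};
    for (const Packet& p : input_i) {
        assert(p.id < kNumKernels);
        assert(p.payload.size() == kPacketPayload);
        kernels[p.id].w_input_i = p.payload;
    }
    for (KernelInputs& k : kernels) {
        k.w_input_j = input_j;
    }
    return kernels;
}

// Merger: wrap each kernel's w_output_i in a packet on the single output_i stream.
// Emitting in kernel order is a simplification of this model.
std::vector<Packet> merge_outputs(
    const std::array<std::vector<std::uint32_t>, kNumKernels>& w_output_i) {
    std::vector<Packet> output_i;
    for (std::size_t k = 0; k < kNumKernels; ++k) {
        output_i.push_back(Packet{static_cast<std::uint32_t>(k), w_output_i[k]});
    }
    return output_i;
}
```

In this model, calling `route_inputs` with four packets (IDs 0 through 3) gives each kernel its own 224-value `w_input_i` and an identical copy of `input_j`, and `merge_outputs` turns the four result buffers back into one stream of four packets.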
| Name | Number of 32-bit Data Values | Window Size (bytes) |
|---|---|---|
| input_i | 224 * 4 = 896 | 896 * 4 = 3584 |
| input_j | 128 | 128 * 4 = 512 |
| output_i | 224 * 4 = 896 | 896 * 4 = 3584 |
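The arithmetic behind the table can be checked at compile time. The sketch below is an illustration only: the constant names and the `window_bytes` helper are hypothetical and do not come from the tutorial sources. It reads the first `* 4` in the input_i and output_i rows as one 224-value packet per nbody() kernel, and the second `* 4` as four bytes per 32-bit value.

```cpp
#include <cstddef>

// Hypothetical constants mirroring the table; not taken from nbody_subsystem.h.
constexpr std::size_t kBytesPerValue = 4;      // one 32-bit data value
constexpr std::size_t kKernelCount = 4;        // four nbody() kernels
constexpr std::size_t kValuesPerPacket = 224;  // data values in one input_i packet
constexpr std::size_t kInputJValues = 128;     // data values on input_j

// Window size in bytes for a given number of 32-bit values.
constexpr std::size_t window_bytes(std::size_t num_values) {
    return num_values * kBytesPerValue;
}

// input_i and output_i carry one packet per kernel.
constexpr std::size_t kInputIValues = kValuesPerPacket * kKernelCount;

static_assert(kInputIValues == 896, "224 * 4 data values");
static_assert(window_bytes(kInputIValues) == 3584, "input_i / output_i window");
static_assert(window_bytes(kInputJValues) == 512, "input_j window");
```

If any of the table values were inconsistent with each other, the corresponding `static_assert` would fail at compile time.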
A single instance of the nbody_subsystem graph can simulate 128 particles using four AI Engine tiles, one tile for each nbody() kernel.