Next, review the src/nbody_subsystem.h graph. This graph creates four N-Body kernels, a packet splitter kernel, and a packet merger kernel. To learn more about packet switching in the AI Engine, see the packet switching feature tutorial: 04-packet-switching.
The nbody_subsystem graph has two inputs: input_i and input_j. The input_i port is a packet stream that connects to the packet splitter, which redirects each packet of data to the w_input_i port of one of the nbody() kernels. Each input_i packet contains a packet header, 224 32-bit data values, and TLAST asserted with the m31 data value. The input_j port is a data stream that is broadcast to all the nbody() kernels (that is, every nbody() kernel receives the same input_j data). The nbody() kernels perform their computations and generate new w_output_i data, which is merged into a single stream of packets that forms the output of the nbody_subsystem graph, output_i.
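The routing described above can be pictured with a small host-side model in plain C++. This is only a sketch and does not use the AI Engine graph API: the `Packet` struct, `packet_id`, `split_packets`, and `broadcast` are hypothetical names introduced here, and the assumption that the destination ID sits in the low 5 bits of the header word should be checked against the packet switching tutorial.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kNumKernels = 4;         // four nbody() kernels in the graph
constexpr std::size_t kValuesPerPacket = 224;  // 32-bit data values per input_i packet

// Hypothetical model of one input_i packet: a header word plus its payload.
struct Packet {
    std::uint32_t header;
    std::vector<std::uint32_t> payload;
};

// Assumption: the destination ID is carried in the low 5 bits of the header.
inline std::size_t packet_id(const Packet& p) { return p.header & 0x1Fu; }

// Model of the packet splitter: each packet goes to exactly one kernel,
// selected by its ID (standing in for that kernel's w_input_i port).
inline std::array<std::vector<Packet>, kNumKernels>
split_packets(const std::vector<Packet>& input_i) {
    std::array<std::vector<Packet>, kNumKernels> per_kernel;
    for (const Packet& p : input_i) {
        const std::size_t id = packet_id(p);
        assert(id < kNumKernels && "packet ID must address one of the four kernels");
        per_kernel[id].push_back(p);
    }
    return per_kernel;
}

// Model of the input_j broadcast: every kernel receives an identical copy.
inline std::array<std::vector<std::uint32_t>, kNumKernels>
broadcast(const std::vector<std::uint32_t>& input_j) {
    std::array<std::vector<std::uint32_t>, kNumKernels> per_kernel;
    per_kernel.fill(input_j);
    return per_kernel;
}
```

In this model, four packets with IDs 0 through 3 end up one per kernel regardless of arrival order, while a single input_j buffer is duplicated four times. The merger on the output side would do the reverse of `split_packets`, interleaving per-kernel packets back into one stream.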
| Name | Number of 32-bit Data Values | Window Size (bytes) |
|---|---|---|
| input_i | 224 * 4 = 896 | 896 * 4 = 3584 bytes |
| input_j | 128 | 128 * 4 = 512 bytes |
| output_i | 224 * 4 = 896 | 896 * 4 = 3584 bytes |
A single instance of the nbody_subsystem graph can simulate 128 particles using four AI Engine tiles.
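The sizes in the table follow from simple arithmetic, which the sketch below spells out as compile-time constants. The constant names are hypothetical and chosen for this example only. The reading of the first "* 4" as the four nbody() kernels and the second as four bytes per 32-bit value is an interpretation of the table, and the values-per-particle figure is derived here from the numbers above rather than stated in the source.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kNumKernels       = 4;    // nbody() kernels, one per AI Engine tile
constexpr std::size_t kValuesPerPacketI = 224;  // 32-bit values in one input_i packet
constexpr std::size_t kValuesJ          = 128;  // 32-bit values in the input_j window
constexpr std::size_t kParticles        = 128;  // particles per nbody_subsystem instance
constexpr std::size_t kBytesPerValue    = sizeof(std::uint32_t);  // 4 bytes

// input_i and output_i: one 224-value packet per kernel.
constexpr std::size_t kInputIValues = kValuesPerPacketI * kNumKernels;  // 896 values
constexpr std::size_t kInputIBytes  = kInputIValues * kBytesPerValue;   // 3584 bytes

// input_j: the same 128 values are broadcast to every kernel.
constexpr std::size_t kInputJBytes = kValuesJ * kBytesPerValue;         // 512 bytes

// Derived (not stated in the source): particles handled by each kernel,
// and 32-bit values carried per particle in an input_i packet.
constexpr std::size_t kParticlesPerKernel = kParticles / kNumKernels;                // 32
constexpr std::size_t kValuesPerParticleI = kValuesPerPacketI / kParticlesPerKernel; // 7

static_assert(kInputIValues == 896, "input_i carries 896 32-bit values");
static_assert(kInputIBytes == 3584, "input_i window is 3584 bytes");
static_assert(kInputJBytes == 512, "input_j window is 512 bytes");
```

Keeping the sizes as named constants rather than literals makes it easier to see which numbers would change if the kernel count or the per-packet payload were different.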