Workload Distribution and input_j

Vitis Tutorials: AI Engine Development (XD100)

Document ID
XD100
Release Date
2026-03-27
Version
2025.2 English

To calculate the N-Body gravity equations for 128 particles, the work is split across four nbody() kernels, each of which calculates the equations for 32 particles. However, to calculate the acceleration and new velocity of its own particles, an nbody() kernel also needs the data held by the other kernels. For example, if particle 0 is mapped to nbody_kernel[0] and particle 32 is mapped to nbody_kernel[1], then nbody_kernel[0] needs the data in nbody_kernel[1] to accurately evaluate the summation equation for acceleration and then calculate the new velocity of particle 0.
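The mapping described above can be sketched in plain C++ (this is a host-side illustration, not AI Engine kernel code). It assumes particles are assigned to kernels in contiguous blocks of 32, which matches the example of particle 0 and particle 32 but is not confirmed by this section. The helper names owning_kernel and needs_remote_data are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Values taken from the text: 128 particles, 32 particles per nbody() kernel.
constexpr std::size_t kNumParticles = 128;
constexpr std::size_t kParticlesPerKernel = 32;
constexpr std::size_t kNumKernels = kNumParticles / kParticlesPerKernel;

// Hypothetical helper: index of the nbody_kernel[] that owns a particle,
// ASSUMING a contiguous block mapping (0..31 -> kernel 0, 32..63 -> kernel 1, ...).
inline std::size_t owning_kernel(std::size_t particle) {
    assert(particle < kNumParticles);
    return particle / kParticlesPerKernel;
}

// Hypothetical helper: true when the kernel that owns particle i does not
// hold particle j locally, so the data for j must arrive from elsewhere.
inline bool needs_remote_data(std::size_t i, std::size_t j) {
    return owning_kernel(i) != owning_kernel(j);
}
```

Under this assumed mapping, needs_remote_data(0, 32) is true, which is the case the paragraph describes: nbody_kernel[0] cannot complete the summation for particle 0 from its own 32 particles alone.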

This is where the input_j stream plays a vital role in data sharing. Although the input_j data stream has a window size of 32 particles' worth of data, the LOOP_COUNT_J value sets how many of these 32-particle windows each nbody() kernel reads. For a single instance of the nbody_subsystem graph, LOOP_COUNT_J should be set to 4 so that each kernel streams in the data for all four kernels. For the final AI Engine graph, which contains 100 instances of the nbody_subsystem graph, LOOP_COUNT_J is set to 400 so that each nbody() kernel streams in the data for all 400 kernels.
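The arithmetic behind these LOOP_COUNT_J values can be checked with a short plain C++ sketch. The constants come from the text; the helper names loop_count_j and particles_streamed are hypothetical and are not part of the tutorial sources.

```cpp
#include <cassert>
#include <cstddef>

// Values taken from the text.
constexpr std::size_t kParticlesPerWindow = 32;   // one input_j window
constexpr std::size_t kKernelsPerSubsystem = 4;   // nbody() kernels per nbody_subsystem

// Hypothetical helper: the LOOP_COUNT_J value needed so that each kernel
// reads one 32-particle window for every kernel in the design.
inline std::size_t loop_count_j(std::size_t subsystem_instances) {
    assert(subsystem_instances > 0);
    return subsystem_instances * kKernelsPerSubsystem;
}

// Hypothetical helper: number of particles one kernel reads from input_j
// for a given LOOP_COUNT_J value.
inline std::size_t particles_streamed(std::size_t loop_count) {
    return loop_count * kParticlesPerWindow;
}
```

With one subsystem instance this gives a LOOP_COUNT_J of 4 and 4 x 32 = 128 particles per kernel. With 100 instances it gives 400 and 400 x 32 = 12,800 particles per kernel.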


For example, to calculate the new velocity of particle 0, which is mapped to nbody_kernel[0], nbody_kernel[0] retrieves the data for particle 32 from the input_j stream. In this way, every nbody() kernel receives, through input_j, the data for all particles mapped to the other nbody() kernels.
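The data flow can be modeled on the host in plain C++. The sketch below is not the tutorial's nbody() kernel and does not use the AI Engine API. It assumes a hypothetical Particle layout, a softened gravity term (softening2), and units in which the gravitational constant is 1. The stream is modeled as a std::vector holding LOOP_COUNT_J windows of 32 particles.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr std::size_t kWindow = 32;  // particles per input_j window, per the text

// Hypothetical particle layout; the tutorial's actual data format may differ.
struct Particle {
    float x, y, z;  // position
    float m;        // mass
};

struct Accel {
    float x, y, z;
};

// Host-side model of one kernel: it owns kWindow "i" particles and reads
// loop_count_j windows of kWindow "j" particles from the modeled stream.
// ASSUMPTIONS: G = 1, and softening2 > 0 keeps the i == j term finite
// (that term contributes zero because the position difference is zero).
inline std::array<Accel, kWindow> accumulate_accel(
    const std::array<Particle, kWindow>& input_i,
    const std::vector<Particle>& input_j,
    std::size_t loop_count_j,
    float softening2) {
    assert(input_j.size() == loop_count_j * kWindow);
    assert(softening2 > 0.0f);

    std::array<Accel, kWindow> acc{};  // all components start at zero
    for (std::size_t w = 0; w < loop_count_j; ++w) {
        for (std::size_t j = 0; j < kWindow; ++j) {
            const Particle& pj = input_j[w * kWindow + j];
            for (std::size_t i = 0; i < kWindow; ++i) {
                const Particle& pi = input_i[i];
                const float dx = pj.x - pi.x;
                const float dy = pj.y - pi.y;
                const float dz = pj.z - pi.z;
                const float r2 = dx * dx + dy * dy + dz * dz + softening2;
                const float inv_r3 = 1.0f / (r2 * std::sqrt(r2));
                acc[i].x += pj.m * dx * inv_r3;
                acc[i].y += pj.m * dy * inv_r3;
                acc[i].z += pj.m * dz * inv_r3;
            }
        }
    }
    return acc;
}
```

In this model, a kernel that reads only its own window (loop_count_j of 1) never sees particle 32, so the acceleration of particle 0 omits that contribution. Reading all four windows (loop_count_j of 4) includes it, which is the role the section assigns to input_j.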