You might wonder why the design needs the 1:4/4:1 packet switching scheme. It works around an AI Engine architecture limit on the number of simultaneous input and output AXI-Streams allowed per AI Engine column. There are 50 AI Engine columns in the AI Engine array, and each column contains eight AI Engine tiles. Each AI Engine column is allowed a maximum of six 32-bit AXI-Stream inputs and four 32-bit AXI-Stream outputs.
In the design, each nbody() kernel maps to one AI Engine tile. Without packet switching, each column of eight AI Engine tiles would therefore need nine input streams and eight output streams, which exceeds both limits:
8 w_input_i input streams
1 w_input_j input stream
8 w_output_i output streams
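The stream arithmetic above can be checked with a small compile-time sketch. It is not part of the N-Body design sources; the names `StreamCount`, `directStreamCount`, and `fitsColumn` are hypothetical and used only for illustration. The limits (six inputs, four outputs, eight tiles per column) are the ones quoted in the text.

```cpp
// Per-column limits and tile count as stated in the text.
constexpr int kMaxInputStreamsPerColumn  = 6;
constexpr int kMaxOutputStreamsPerColumn = 4;
constexpr int kTilesPerColumn            = 8;

// Hypothetical helper type: streams entering and leaving one column.
struct StreamCount {
    int inputs;
    int outputs;
};

// Without packet switching, every tile has its own w_input_i and
// w_output_i stream, and the column receives one w_input_j stream.
constexpr StreamCount directStreamCount(int tiles) {
    return StreamCount{tiles + 1, tiles};
}

// True only if both the input and the output limit are respected.
constexpr bool fitsColumn(StreamCount c) {
    return c.inputs <= kMaxInputStreamsPerColumn &&
           c.outputs <= kMaxOutputStreamsPerColumn;
}

// Nine inputs and eight outputs do not fit in one column.
static_assert(directStreamCount(kTilesPerColumn).inputs == 9, "8 + 1 inputs");
static_assert(directStreamCount(kTilesPerColumn).outputs == 8, "8 outputs");
static_assert(!fitsColumn(directStreamCount(kTilesPerColumn)),
              "direct connection exceeds the column limits");
```

The file compiles only because the last assertion holds, that is, because the direct one-stream-per-port mapping really does exceed the per-column limits.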
With the 1:4/4:1 packet switching scheme, you can combine four streams into one. Because packet switching is applied on the w_input_i ports, the number of input streams into a single AI Engine column is reduced to three:
1 input_i stream that goes to tiles 0-3 in a column
1 input_i stream that goes to tiles 4-7 in a column
1 input_j stream that is broadcast to all the columns
On the output side, the number of output streams is reduced to two:
1 output_i stream coming from tiles 0-3 in a column
1 output_i stream coming from tiles 4-7 in a column
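As a rough illustration of the reduction described above, the following sketch maps each tile in a column to one of the shared packet streams and recomputes the per-column totals. The grouping of tiles 0-3 and 4-7 comes from the text. The packet ID assignment (tile index modulo four) is an assumption made for illustration and may not match the IDs used in the actual design; `PacketRoute`, `routeForTile`, and `packetStreamsNeeded` are hypothetical names.

```cpp
constexpr int kTilesInColumn        = 8;
constexpr int kTilesPerPacketStream = 4;  // the "4" in 1:4 and 4:1

// Hypothetical description of where one tile's data travels.
struct PacketRoute {
    int stream;    // which shared packet stream in the column (0 or 1)
    int packetId;  // assumed ID that selects the tile within that stream
};

// Tiles 0-3 share stream 0 and tiles 4-7 share stream 1 (from the text).
// The packet ID choice below is an assumption, not taken from the design.
constexpr PacketRoute routeForTile(int tile) {
    return PacketRoute{tile / kTilesPerPacketStream,
                       tile % kTilesPerPacketStream};
}

// Number of shared streams needed for a given tile count (ceiling division).
constexpr int packetStreamsNeeded(int tiles) {
    return (tiles + kTilesPerPacketStream - 1) / kTilesPerPacketStream;
}

// Inputs: two packet-switched input_i streams plus the broadcast input_j.
constexpr int kInputStreams  = packetStreamsNeeded(kTilesInColumn) + 1;
// Outputs: two packet-switched output_i streams.
constexpr int kOutputStreams = packetStreamsNeeded(kTilesInColumn);

static_assert(kInputStreams == 3,  "three input streams per column");
static_assert(kOutputStreams == 2, "two output streams per column");
static_assert(kInputStreams <= 6 && kOutputStreams <= 4,
              "packet-switched column fits within the stated limits");
```

Under these assumptions, tile 5 would use stream 1 with packet ID 1, and the column needs three inputs and two outputs, which fits within the six-input, four-output limit.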