Buffer Port Multicasting

AI Engine Kernel and Graph Programming Guide (UG1079)

Document ID: UG1079
Release Date: 2025-11-26
Version: 2025.2 English

The AI Engine compiler does not limit buffer ports to one-to-one connections. Under certain conditions, multiple kernels can share the same output buffer to perform different tasks, and you can connect one producer to as many consumers as needed. The AI Engine compiler automatically infers one MM2S DMA to read the output buffer and one S2MM DMA per consumer to write that consumer's input buffer.

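private:
  adf::kernel mk;                  // producer (the "maker")
  adf::kernel tk0, tk1, tk2, tk3;  // consumers (the "takers")
...
// Connect the single output buffer of mk to the input buffer of each consumer.
connect net0 ( mk.out[0] , tk0.in[0] );
connect net1 ( mk.out[0] , tk1.in[0] );
connect net2 ( mk.out[0] , tk2.in[0] );
connect net3 ( mk.out[0] , tk3.in[0] );
...
// Each consumer input buffer is sized to 128 samples.
dimensions(tk0.in[0]) = {128};
dimensions(tk1.in[0]) = {128};
dimensions(tk2.in[0]) = {128};
dimensions(tk3.in[0]) = {128};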

Kernel function prototypes:

void tk0(input_buffer<int32, adf::extents<adf::inherited_extent>> & in0,
         output_buffer<int32, adf::extents<OUTPUT_SAMPLE_SIZE>> & out0);
void tk1(input_buffer<int32, adf::extents<adf::inherited_extent>,
                      adf::margin<32>> & in0,
         output_buffer<int32, adf::extents<OUTPUT_SAMPLE_SIZE>> & out0);
void tk2(input_buffer<int32, adf::extents<adf::inherited_extent>,
                      adf::margin<64>> & in0,
         output_buffer<int32, adf::extents<OUTPUT_SAMPLE_SIZE>> & out0);
void tk3(input_buffer<int32, adf::extents<adf::inherited_extent>> & in0,
         output_buffer<int32, adf::extents<OUTPUT_SAMPLE_SIZE>> & out0);

The input buffers of kernels tk0, tk1, tk2, and tk3 are served at nearly the same time, because the output buffer of kernel mk is read only once. Any remaining delay between them is due to the different AXI4-Stream paths routed from the maker (mk) to the takers (tk0 to tk3) in the AI Engine array.

Figure 1. One Kernel Serving Four Kernels

The code connects the same kernel output to four different kernel inputs. The AI Engine compiler adds DMAs between kernels to copy the buf5(d) buffer content to other buffers using the AXI4-Stream interconnect network.
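
The read-once, write-many behavior described above can be illustrated with a small host-side model in plain C++. This is only a sketch under stated assumptions, not ADF code and not the compiler's actual implementation: `ConsumerBuffer`, `multicast`, and `make_block` are hypothetical names introduced for illustration, and the model assumes that a margin holds the last samples of the previous block and starts out zero-filled. The margins (0, 32, 64, 0) and the block size of 128 samples mirror the prototypes and `dimensions` settings shown earlier.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side stand-in for one consumer's input buffer.
// Layout assumption: `margin` overlap samples first, then one block of new samples.
struct ConsumerBuffer {
    std::size_t margin;
    std::vector<std::int32_t> data;

    ConsumerBuffer(std::size_t margin_samples, std::size_t block_samples)
        : margin(margin_samples), data(margin_samples + block_samples, 0) {}
};

// Models one multicast transfer: the producer block is traversed once (the
// MM2S side) and every sample is written to each consumer (the S2MM side).
void multicast(const std::vector<std::int32_t>& block,
               std::vector<ConsumerBuffer>& consumers) {
    for (auto& c : consumers) {
        assert(c.data.size() == c.margin + block.size());
        // Assumed margin behavior: keep the tail of the previous contents.
        std::copy(c.data.end() - static_cast<std::ptrdiff_t>(c.margin),
                  c.data.end(), c.data.begin());
    }
    for (std::size_t i = 0; i < block.size(); ++i) {
        const std::int32_t sample = block[i];  // single read of the source
        for (auto& c : consumers) {
            c.data[c.margin + i] = sample;     // one write per consumer
        }
    }
}

// Hypothetical helper: a block of `count` consecutive values starting at `first`.
std::vector<std::int32_t> make_block(std::int32_t first, std::size_t count) {
    std::vector<std::int32_t> block(count);
    for (std::size_t i = 0; i < count; ++i) {
        block[i] = first + static_cast<std::int32_t>(i);
    }
    return block;
}
```

In this model, multicasting the blocks 0..127 and then 128..255 to four buffers with margins 0, 32, 64, and 0 leaves the stand-in for tk1 holding 96..127 in its margin followed by 128..255, while the stand-ins for tk0 and tk3 hold only 128..255. The source block is traversed once per transfer regardless of the number of consumers, which is the property the text attributes to the single MM2S read.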