Buffer Port Multicasting - 2024.1 English

AI Engine Kernel and Graph Programming Guide (UG1079)

Document ID: UG1079
Release Date: 2024-06-05
Version: 2024.1 English

The AI Engine compiler does not limit buffer ports to one-to-one connections. The same output buffer can feed several kernels that each perform a different task, and you can connect a producer to as many consumers as needed. The AI Engine compiler automatically infers one MM2S DMA to read the output buffer and one S2MM DMA per consumer to write that consumer's input buffer.

private:
// One producer kernel (maker) and four consumer kernels (takers).
adf::kernel mk;
adf::kernel tk0, tk1, tk2, tk3;
...
// The same output port, mk.out[0], drives all four consumer inputs.
connect net0(mk.out[0], tk0.in[0]);
connect net1(mk.out[0], tk1.in[0]);
connect net2(mk.out[0], tk2.in[0]);
connect net3(mk.out[0], tk3.in[0]);
...
// Each consumer input buffer holds 128 samples.
dimensions(tk0.in[0]) = {128};
dimensions(tk1.in[0]) = {128};
dimensions(tk2.in[0]) = {128};
dimensions(tk3.in[0]) = {128};

Kernel function prototypes:

void tk0(input_buffer<int32, adf::extents<adf::inherited_extent>> & in0,
         output_buffer<int32, adf::extents<OUTPUT_SAMPLE_SIZE>> & out0);
void tk1(input_buffer<int32, adf::extents<adf::inherited_extent>,
                      adf::margin<32>> & in0,
         output_buffer<int32, adf::extents<OUTPUT_SAMPLE_SIZE>> & out0);
void tk2(input_buffer<int32, adf::extents<adf::inherited_extent>,
                      adf::margin<64>> & in0,
         output_buffer<int32, adf::extents<OUTPUT_SAMPLE_SIZE>> & out0);
void tk3(input_buffer<int32, adf::extents<adf::inherited_extent>> & in0,
         output_buffer<int32, adf::extents<OUTPUT_SAMPLE_SIZE>> & out0);

The input buffers of kernels tk0, tk1, tk2, and tk3 are filled at the same time because the output buffer of kernel mk is read only once. Any slight variation in delay comes only from the different AXI4-Stream paths routed from the maker (mk) to the takers (tk0 to tk3) in the AI Engine array.
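
The prototypes differ mainly in the adf::margin argument of the input buffer. As a rough illustration, the plain C++ model below (host-side code, not the ADF API) shows one way to picture a consumer buffer with a margin. It assumes that the margin holds the trailing samples of the previous block, that the margin starts zero-filled, and that the 128-sample dimension excludes the margin. ConsumerBuffer and load are hypothetical names introduced only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of one consumer input buffer (not the ADF API).
// Assumed layout: [ margin samples | extent samples ].
struct ConsumerBuffer {
    std::size_t margin;              // e.g. 0, 32, or 64 as in the prototypes
    std::size_t extent;              // e.g. 128, as set by dimensions(...)
    std::vector<std::int32_t> data;  // margin + extent samples

    // Assumption: the whole buffer, including the margin, starts at zero.
    ConsumerBuffer(std::size_t m, std::size_t e)
        : margin(m), extent(e), data(m + e, 0) {}

    // Load the next producer block: the last `margin` samples currently in
    // the buffer move to the front, then the new block fills the extent region.
    void load(const std::vector<std::int32_t>& block) {
        assert(block.size() == extent);
        for (std::size_t i = 0; i < margin; ++i)
            data[i] = data[extent + i];
        for (std::size_t i = 0; i < extent; ++i)
            data[margin + i] = block[i];
    }
};
```

Under these assumptions, tk0 and tk3 would correspond to ConsumerBuffer(0, 128), tk1 to ConsumerBuffer(32, 128), and tk2 to ConsumerBuffer(64, 128).
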
Figure 1. One Kernel Serving Four Kernels

In the code, the same kernel output is connected to four different kernel inputs. The AI Engine compiler inserts DMAs between the kernels so that the content of the buf5(d) buffer is copied into the other buffers over the AXI4-Stream interconnect network.
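
The read-once, write-many behavior described above can be pictured with a small host-side model in plain C++. This is only a sketch of the data movement, not AI Engine code: multicast and MulticastStats are hypothetical names, and the counters stand in for the MM2S read of the producer buffer and the S2MM writes to each consumer buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical counters for this model; these are not AI Engine APIs.
struct MulticastStats {
    std::size_t source_reads = 0;  // samples read from the producer buffer
    std::size_t dest_writes = 0;   // samples written across all consumers
};

// Model of a one-to-many buffer connection: the producer buffer is traversed
// once (the MM2S side) and each sample is written to every consumer buffer
// (one S2MM side per consumer).
MulticastStats multicast(const std::vector<std::int32_t>& producer,
                         std::vector<std::vector<std::int32_t>>& consumers) {
    assert(!consumers.empty());
    MulticastStats stats;
    for (auto& c : consumers) c.assign(producer.size(), 0);
    for (std::size_t i = 0; i < producer.size(); ++i) {
        const std::int32_t sample = producer[i];  // single read of the source
        ++stats.source_reads;
        for (auto& c : consumers) {               // fan-out to every consumer
            c[i] = sample;
            ++stats.dest_writes;
        }
    }
    return stats;
}
```

For a 128-sample producer buffer and four consumers, the model reads 128 samples and writes 512, which mirrors the point that the output buffer of mk is read only once no matter how many kernels consume it.
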