When multiple kernels fit in a single AI Engine-ML, communication between two or more consecutive kernels can be established through a common buffer in the local data memory of the AI Engine-ML, or in any of the three neighboring memory modules to which the AI Engine-ML has direct access. In this case only a single buffer is needed, because the kernels execute one after another in round-robin order.
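As a rough sketch of this single-tile case, the following ADF graph maps two kernels onto the same AI Engine-ML and requests a single intermediate buffer. The kernel names (produce, consume), source files, data files, and runtime ratios are illustrative assumptions, not values from this document; the kernel co-location constraint and single_buffer() are standard ADF graph constructs.

```cpp
#include <adf.h>
#include "kernels.h" // hypothetical header declaring produce() and consume()

using namespace adf;

class SameTileGraph : public graph {
public:
    input_plio  in;
    output_plio out;
    kernel k1, k2;

    SameTileGraph() {
        in  = input_plio::create("DataIn",  plio_32_bits, "data/input.txt");
        out = output_plio::create("DataOut", plio_32_bits, "data/output.txt");

        k1 = kernel::create(produce);   // hypothetical kernel functions
        k2 = kernel::create(consume);
        source(k1) = "produce.cc";
        source(k2) = "consume.cc";
        runtime<ratio>(k1) = 0.4;       // both kernels share one core's cycles
        runtime<ratio>(k2) = 0.4;

        connect(in.out[0],  k1.in[0]);
        connect(k1.out[0],  k2.in[0]);  // intermediate buffer between the kernels
        connect(k2.out[0],  out.in[0]);

        // Map both kernels onto the same AI Engine-ML so they execute
        // one after another; the intermediate connection then needs
        // only a single buffer instead of the default ping-pong pair.
        location<kernel>(k2) = location<kernel>(k1);
        single_buffer(k2.in[0]);
    }
};
```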
For cases where the kernels are in separate but neighboring AI Engine-MLs, communication can be carried out through the data memory module shared between the two neighboring AI Engine-ML tiles, using ping-pong buffers. These buffers can be placed on separate memory banks so that access conflicts are avoided. Synchronization is handled through locks: the locks associated with each buffer ensure that the input and output buffers of the AI Engine kernel are ready before they are accessed. Because neither DMA nor the AXI4-Stream interconnect is needed, this type of communication saves routing resources and eliminates the latency of data transfer.
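Under the same assumptions, a sketch of the neighboring-tile case pins the two kernels to adjacent tiles and constrains the two halves of the ping-pong buffer to separate memory banks; the tile and bank coordinates are purely illustrative.

```cpp
#include <adf.h>
#include "kernels.h" // hypothetical header declaring produce() and consume()

using namespace adf;

class NeighborTileGraph : public graph {
public:
    input_plio  in;
    output_plio out;
    kernel k1, k2;

    NeighborTileGraph() {
        in  = input_plio::create("DataIn",  plio_32_bits, "data/input.txt");
        out = output_plio::create("DataOut", plio_32_bits, "data/output.txt");

        k1 = kernel::create(produce);
        k2 = kernel::create(consume);
        source(k1) = "produce.cc";
        source(k2) = "consume.cc";
        runtime<ratio>(k1) = 0.8;       // each kernel now has its own core
        runtime<ratio>(k2) = 0.8;

        connect(in.out[0],  k1.in[0]);
        connect(k1.out[0],  k2.in[0]);  // ping-pong buffer in shared data memory
        connect(k2.out[0],  out.in[0]);

        // Pin the kernels to horizontally adjacent tiles so the buffer
        // connection maps onto the data memory module they share.
        location<kernel>(k1) = tile(25, 0);
        location<kernel>(k2) = tile(26, 0);

        // Place the ping and pong halves of the double buffer on
        // separate memory banks to avoid producer/consumer conflicts.
        location<buffer>(k2.in[0]) = { bank(25, 0, 0), bank(25, 0, 1) };
    }
};
```

In both sketches, the locks that synchronize producer and consumer access to the buffers are inferred by the compiler from the buffer connections; no explicit lock management appears in the graph code.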