This corner-turning data flow can be implemented directly by the local DMA hardware in each AI Engine tile. The input stream DMA is programmed, as outlined above, to write the local input buffer “row-wise” using a “tiling parameter”, shown below. The AI Engine then reads this input buffer “column-wise” as it performs its compute operation channel by channel, and stores its results “column-wise” in its output buffer. Finally, the output stream DMA is programmed to read the output buffer “row-wise”, restoring the TDM ordering of the output data stream. The corner-turning at the input and output buffers of the mixer consumes no compute resources in the AI Engine core because the addressing is computed by the local tile DMA hardware; it simply becomes part of the data flow.
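
The addressing pattern described above can be illustrated with a small plain-Python model. This is only a sketch of the index arithmetic, not AI Engine or DMA programming code: the function names, the channel and sample counts, and the per-channel gain standing in for the mixer's compute operation are all assumptions made for illustration.

```python
# Plain-Python model of the corner-turning addressing; not AI Engine API code.
# All names and sizes below are hypothetical, chosen only for illustration.
NUM_CHANNELS = 4  # assumed channel count
NUM_SAMPLES = 3   # assumed samples per channel in one buffer


def dma_write_row_wise(stream):
    """Input DMA model: consecutive TDM samples land at consecutive addresses,
    so each row of the buffer holds one time slot across all channels."""
    return list(stream)


def column_addresses(ch, num_channels, num_samples):
    """Addresses visited when walking one channel (one column) of the buffer."""
    return [t * num_channels + ch for t in range(num_samples)]


def process_by_channel(in_buf, gains, num_channels, num_samples):
    """Kernel model: read each channel column-wise, apply a stand-in operation
    (a per-channel gain), and store the result column-wise at the same address."""
    out_buf = [0] * len(in_buf)
    for ch in range(num_channels):
        for addr in column_addresses(ch, num_channels, num_samples):
            out_buf[addr] = gains[ch] * in_buf[addr]
    return out_buf


def dma_read_row_wise(out_buf):
    """Output DMA model: reading consecutive addresses restores TDM order."""
    return list(out_buf)


# TDM input: channels interleaved per time slot; value encodes (channel, time).
tdm_in = [10 * ch + t for t in range(NUM_SAMPLES) for ch in range(NUM_CHANNELS)]
gains = [1, 2, 3, 4]

in_buf = dma_write_row_wise(tdm_in)
out_buf = process_by_channel(in_buf, gains, NUM_CHANNELS, NUM_SAMPLES)
tdm_out = dma_read_row_wise(out_buf)
```

In this model the kernel loops only compute values; the strided column addresses are produced separately by `column_addresses`, which mirrors the point that on the device the address generation belongs to the tile DMA rather than to the core.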