The following figure shows the software scheduling of the polyphase filterbank design. Each tile implements the filtering for two physical channels, in this case “A” and “B”. The stream inputs collect four samples over four cycles, alternately for each channel. Similarly, the compute is performed alternately over two cycles for each channel. The output results are then produced alternately on the output stream over another four cycles. This loop is scheduled with II=8 to achieve the desired throughput.
The compute gaps in the following figure, together with the fact that each AI Engine tile contains two I/O streams rather than one, raise the question of why this design uses eight tiles when perhaps only four are required from a compute-bound perspective. Although the AI Engine supports two input and two output streams, a VLIW hardware restriction limits their use to (i) two inputs and one output, (ii) one input and two outputs, or (iii) one input and one output. As a result, it was not feasible to schedule an II=8 loop supporting four filters in a single tile.