The main idea is to exploit the ability of the AIE-ML memory tiles to transpose a matrix implicitly while data is written and read, using the programmable access pattern set through the tiling parameters. This makes it possible to apply temporal and spatial interleaving to the input samples before sending them to their respective kernels. To do so, consider the input data as a matrix in which each row holds the \(n\)-th sample of every instance and each column holds all the samples of one instance. Such a matrix is shown in the following table, where the instance index \(i\) goes from 0 to \(I\) and the sample index \(n\) goes from 0 to \(N\).
Table 3: Memory tile Buffering Matrix.
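As a rough model of this matrix, the following plain-Python sketch stores it in row-major order, one row per sample cycle, and then reads it back one column at a time with a stride equal to the row length. It is only an illustration of the addressing idea: the sizes `NUM_INSTANCES` and `NUM_SAMPLES` and the helper names are assumptions, and the code does not use the AIE-ML tiling-parameter API.

```python
# Toy model of the buffering matrix: rows are sample indices n,
# columns are instances i. Sizes are assumed for illustration only.
NUM_INSTANCES = 4
NUM_SAMPLES = 8


def write_row_by_row(sample_rows):
    """Store the matrix in row-major order, one row per sample cycle."""
    memory = []
    for row in sample_rows:  # row n holds the n-th sample of every instance
        memory.extend(row)
    return memory


def read_column(memory, instance,
                num_instances=NUM_INSTANCES, num_samples=NUM_SAMPLES):
    """Read one column by stepping through memory with a stride of one row."""
    return [memory[n * num_instances + instance] for n in range(num_samples)]


# Label each sample as (instance, n) so the ordering is easy to inspect.
rows = [[(i, n) for i in range(NUM_INSTANCES)] for n in range(NUM_SAMPLES)]
memory = write_row_by_row(rows)

# Each kernel buffer receives all samples of a single instance, in order.
kernel_buffers = [read_column(memory, i) for i in range(NUM_INSTANCES)]
```

Here `kernel_buffers[i]` holds the samples of instance `i` in increasing `n`, even though the samples of different instances were interleaved when written. This is the effect of the implicit transposition.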
In this way, the memory tile can be written row by row with the inputs coming from every instance at their sample rate of 125 MSa/s, so that one row is filled at every sample cycle. All the columns of the matrix are complete once the last sample of every instance, i.e. the last row of the matrix, has been written at the last sample cycle. The kernels can then read the memory tile in the opposite direction, column by column, so that each kernel's buffer is filled with all the samples of its instance in contiguous order, as needed.

To apply time interleaving to this matrix, Table 2 and Table 3 need to be merged. To do so, observe that the rows of Table 3 are filled at a rate \(r = 125\) MSa/s, while the rows of Table 2 are filled at the PL clock rate \(f_{PL} = 500\) MHz. The desired temporal interleaving makes the PL rate \(\theta\) times larger than \(r\), that is, \(f_{PL} = \theta r\) with \(\theta = 4\) for these values, so each row of Table 3 is still completed once per sample period. This means that Table 3 can be folded \(\theta\) times to form a three-dimensional tensor, where each vertical slice contains all the time-interleaved samples in the order of Table 2, while each horizontal slice is composed of contiguous instances spanning through rows and contiguous samples spanning through columns, as shown in the following animation.
Animation 1: Memory tile 3D buffering mechanism.
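The folding can be sketched in plain Python as a three-dimensional address computation. The sketch below assumes that each PL cycle delivers one sample for a group of `LANES` instances and that \(\theta\) consecutive PL cycles cover all instances within one sample period. This arrival order, the `LANES` and `NUM_SAMPLES` values, and the function names are assumptions made for illustration; the actual order is the one defined by Table 2, and the real design expresses the pattern through tiling parameters rather than explicit index arithmetic.

```python
F_PL_HZ = 500_000_000      # PL clock rate from the text
SAMPLE_RATE = 125_000_000  # per-instance sample rate from the text
THETA = F_PL_HZ // SAMPLE_RATE  # temporal interleaving factor

LANES = 2        # assumed: instances delivered together in one PL cycle
NUM_SAMPLES = 4  # assumed: samples buffered per instance


def pl_stream():
    """Yield (instance, n) labels in the assumed arrival order."""
    for n in range(NUM_SAMPLES):        # one sample period per n
        for phase in range(THETA):      # THETA PL cycles per sample period
            for lane in range(LANES):   # instances sharing one PL cycle
                yield (phase * LANES + lane, n)


# Linear write: data is stored in the order in which it arrives.
memory = list(pl_stream())


def address(n, phase, lane):
    """Flatten the 3-D index (n, phase, lane) to a linear memory address."""
    return (n * THETA + phase) * LANES + lane


def read_instance(instance):
    """Gather all samples of one instance by walking the n dimension."""
    phase, lane = divmod(instance, LANES)
    return [memory[address(n, phase, lane)] for n in range(NUM_SAMPLES)]


buffers = [read_instance(i) for i in range(THETA * LANES)]
```

With the values given in the text, `THETA` evaluates to 4, so each row of the original matrix is folded into four sub-rows of `LANES` entries, and reading along the `n` dimension returns the contiguous samples of a single instance.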