Let’s decompose the tiling parameters for the read access:
| BD Dimension | stride | wrap | Comment |
|---|---|---|---|
| 0 | 1 | 4 | |
| 1 | 8 | 4 | Read a 4x4 tile |
| 2 | 4 | 2 | |
| 3 | 32 | 2 | Read the 2x2 tile structure |
| 4 | 64 | 16 | Read the 16 layers |
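For BD dimension 3, stride × wrap = 32 × 2 = 64, which is exactly the stride of dimension 4. The two dimensions can therefore be merged into a single dimension with stride 32 and wrap 2 × 16 = 32. The compiler applies this optimization automatically, which gives: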
| BD Dimension | stride | wrap | Comment |
|---|---|---|---|
| 0 | 1 | 4 | |
| 1 | 8 | 4 | Read a 4x4 tile |
| 2 | 4 | 2 | Read 2 tiles horizontally |
| 3 | 32 | 32 | Read this pair of horizontal tiles 32 times |
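The merge can be checked with a small host-side model. The following is a minimal sketch, not the compiler's implementation: it assumes dimension 0 is the innermost loop, and the names `Dim`, `enumerate_offsets` and `merge_dims` are hypothetical helpers introduced only for this illustration. It enumerates the element offsets produced by a list of (stride, wrap) pairs and merges two adjacent dimensions whenever stride × wrap of the inner one equals the stride of the outer one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One BD dimension: address step (in elements) and number of iterations.
struct Dim {
    int stride;
    int wrap;
};

// Enumerate the element offsets generated by the dimensions,
// with dims[0] iterating fastest (assumed innermost loop).
std::vector<int> enumerate_offsets(const std::vector<Dim>& dims) {
    std::size_t total = 1;
    for (const Dim& d : dims) {
        assert(d.wrap > 0);  // every dimension must iterate at least once
        total *= static_cast<std::size_t>(d.wrap);
    }

    std::vector<int> offsets;
    offsets.reserve(total);
    std::vector<int> idx(dims.size(), 0);
    for (std::size_t n = 0; n < total; ++n) {
        int off = 0;
        for (std::size_t k = 0; k < dims.size(); ++k) off += idx[k] * dims[k].stride;
        offsets.push_back(off);
        // Increment the mixed-radix counter, innermost dimension first.
        for (std::size_t k = 0; k < dims.size(); ++k) {
            if (++idx[k] < dims[k].wrap) break;
            idx[k] = 0;
        }
    }
    return offsets;
}

// Merge adjacent dimensions when stride * wrap of one equals the stride of the next.
std::vector<Dim> merge_dims(const std::vector<Dim>& dims) {
    std::vector<Dim> out;
    for (const Dim& d : dims) {
        if (!out.empty() && out.back().stride * out.back().wrap == d.stride) {
            out.back().wrap *= d.wrap;  // fold the outer loop into the inner one
        } else {
            out.push_back(d);
        }
    }
    return out;
}
```

Calling `merge_dims` on `{{1,4},{8,4},{4,2},{32,2},{64,16}}` yields the four dimensions of the second table, and `enumerate_offsets` returns the same sequence of offsets for both descriptions.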
In the memory tile, one BD handles 4 stride values and 3 wraps. The last wrap value therefore translates into 32 BDs, which exceeds the 23 BDs (24 minus 1 for the write access) available in the memory tile.
Uncomment the following lines:
D23 = 12;
adf::location<adf::dma>(mtxin.in[0]) = adf::dma_channel(adf::memory_tile,COL,0,0);
adf::location<adf::dma>(mtxin.out[0]) = adf::dma_channel(adf::memory_tile,COL,0,1);
The first line limits the number of layers to 12, which results in 24 BDs being used. This is the maximum available for either the odd or the even DMA channels. For compilation to succeed, the MM2S and S2MM channels must have indices of different parity, which is the purpose of the last two lines.
Recompile; the build should now succeed.
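The BD budget behind these numbers can be written down as simple arithmetic. This is only a sketch built from the figures quoted in the text (24 BDs per channel parity, 2 BDs per layer for the read access); the names `kBdsPerParity`, `kBdsPerLayer`, `read_bds`, `fits` and `max_layers` are hypothetical and do not belong to the ADF API.

```cpp
#include <cassert>

// Figures taken from the text: 24 BDs per DMA channel parity (even or odd).
constexpr int kBdsPerParity = 24;
// Wrap of the original dimension 3: each layer contributes 2 BDs to the read access.
constexpr int kBdsPerLayer = 2;

// Number of BDs needed by the read access (outermost wrap after the merge).
constexpr int read_bds(int layers) { return kBdsPerLayer * layers; }

// True if the read access fits next to the BDs already used on the same parity.
constexpr bool fits(int layers, int other_bds_same_parity) {
    return read_bds(layers) + other_bds_same_parity <= kBdsPerParity;
}

// Largest layer count that still fits for a given number of BDs used by other accesses.
inline int max_layers(int other_bds_same_parity) {
    assert(other_bds_same_parity >= 0 && other_bds_same_parity <= kBdsPerParity);
    return (kBdsPerParity - other_bds_same_parity) / kBdsPerLayer;
}

static_assert(read_bds(16) == 32, "16 layers need 32 BDs");
static_assert(!fits(16, 1), "16 layers do not fit when sharing with the write BD");
static_assert(!fits(12, 1), "12 layers do not fit if the write BD shares the parity");
static_assert(fits(12, 0), "12 layers fit once the write access uses the other parity");
```

Under these assumptions, `max_layers(0)` returns 12, matching `D23 = 12`, while `max_layers(1)` returns 11, which shows why the write access has to be moved to a channel of the other parity.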