Tiling parameters are specified in the graph and associated with an input or output port of an AI Engine-ML memory DMA, a Memory Tile DMA, or an AI Engine-ML Interface DMA. In the following example, kernel k1 writes six tiles of 4x3 samples into a 12x8 samples buffer, and kernel k2 reads six tiles of 7x2 samples out of the same buffer:
kernel k1, k2;
shared_buffer<int> mtx; // Memory Tile Buffer
mygraph()
{
    k1 = kernel::create(func1);
    k2 = kernel::create(func2);
    // 12x8 samples buffer, 1 write-input, 1 read-output
    mtx = shared_buffer<int>::create({12, 8}, 1, 1);

    // k1 writes six 4x3 tiles, advancing along dimension 1 first
    connect<> n1(k1.out[0], mtx.in[0]);
    write_access(mtx.in[0]) = tiling({
        .buffer_dimension = {12, 8},
        .tiling_dimension = {4, 3},
        .offset = {0, 0},
        .tile_traversal = {{.dimension=1, .stride=3, .wrap=2},
                           {.dimension=0, .stride=4, .wrap=3}}});

    // k2 reads six 7x2 tiles, advancing along dimension 0 first
    connect<> n2(mtx.out[0], k2.in[0]);
    read_access(mtx.out[0]) = tiling({
        .buffer_dimension = {12, 8},
        .tiling_dimension = {7, 2},
        .offset = {0, 0},
        .tile_traversal = {{.dimension=0, .stride=5, .wrap=2},
                           {.dimension=1, .stride=2, .wrap=3}}});
}
Within each tile, the data is written with dimension 0 as the inner loop, but the order in which tiles are selected follows the tile_traversal vector. k1 writes its tiles column-wise because its tile_traversal vector starts with dimension 1, followed by dimension 0.
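The tile selection order can be illustrated with a small host-side sketch in plain C++. It does not use the ADF API: the Step struct and the tile_origins function are hypothetical names introduced here, and the sketch assumes, based on the description above, that the first tile_traversal entry is the fastest-varying loop.

```cpp
#include <array>
#include <vector>

// Hypothetical stand-in for one tile_traversal entry; not the ADF type.
struct Step { int dimension; int stride; int wrap; };

// Returns the origin of each tile, as {dimension 0, dimension 1}, in the
// order the tiles are visited. Assumption: the first traversal entry is the
// fastest-varying loop, the last entry the slowest.
std::vector<std::array<int, 2>> tile_origins(std::array<int, 2> offset,
                                             const std::vector<Step>& traversal) {
    int total = 1;
    for (const Step& s : traversal) total *= s.wrap;  // number of tiles

    std::vector<std::array<int, 2>> origins;
    for (int n = 0; n < total; ++n) {
        std::array<int, 2> pos = offset;
        int rest = n;
        // Decompose the tile index n into one loop counter per entry.
        for (const Step& s : traversal) {
            pos[s.dimension] += (rest % s.wrap) * s.stride;
            rest /= s.wrap;
        }
        origins.push_back(pos);
    }
    return origins;
}
```

Under that assumption, calling tile_origins({0, 0}, {{1, 3, 2}, {0, 4, 3}}) with the write-access parameters of k1 gives the origins (0,0), (0,3), (4,0), (4,3), (8,0), (8,3): two tiles down dimension 1, then one step of 4 in dimension 0.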
k2 reads its tiles row-wise because its tile_traversal vector starts with dimension 0. The read tiles overlap in dimension 0 because the stride specified in this dimension (5) is smaller than the tile size (7).
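The overlap can be checked with a host-side sketch that counts how many tiles touch each buffer element. This is plain C++, not ADF code: Step and access_count are hypothetical names, the first tile_traversal entry is assumed to be the fastest-varying loop, and elements outside the buffer are simply ignored as a simplification.

```cpp
#include <array>
#include <vector>

// Hypothetical stand-in for one tile_traversal entry; not the ADF type.
struct Step { int dimension; int stride; int wrap; };

// Counts how many tiles cover each element of a 2D buffer.
// The result is indexed as count[dimension 1 index][dimension 0 index].
std::vector<std::vector<int>> access_count(std::array<int, 2> buffer,
                                           std::array<int, 2> tile,
                                           std::array<int, 2> offset,
                                           const std::vector<Step>& traversal) {
    std::vector<std::vector<int>> count(buffer[1], std::vector<int>(buffer[0], 0));
    int total = 1;
    for (const Step& s : traversal) total *= s.wrap;  // number of tiles

    for (int n = 0; n < total; ++n) {
        // Tile origin for tile index n (first entry varies fastest).
        std::array<int, 2> origin = offset;
        int rest = n;
        for (const Step& s : traversal) {
            origin[s.dimension] += (rest % s.wrap) * s.stride;
            rest /= s.wrap;
        }
        // Mark every in-bounds element covered by this tile.
        for (int y = origin[1]; y < origin[1] + tile[1]; ++y)
            for (int x = origin[0]; x < origin[0] + tile[0]; ++x)
                if (y >= 0 && y < buffer[1] && x >= 0 && x < buffer[0])
                    ++count[y][x];
    }
    return count;
}
```

With the read-access parameters of k2, access_count({12, 8}, {7, 2}, {0, 0}, {{0, 5, 2}, {1, 2, 3}}) reports a count of 2 for dimension 0 indices 5 and 6 in rows 0 to 5, which is the two-sample overlap between tiles starting at 0 and at 5. Rows 6 and 7 have a count of 0, since three tiles of height 2 cover only six of the eight rows.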