Tiling parameters are specified in the graph and associated with an input or
output port of an AI Engine-ML memory DMA, Memory Tile DMA, or AI Engine-ML Interface
DMA. In the following example, kernel k1 writes six tiles (4x3 samples each) to a
12x8 samples buffer, and kernel k2 reads six tiles (7x2 samples each) out of the
same buffer:
Figure 1. Write Scheme
Figure 2. Read Scheme
kernel k1, k2;
shared_buffer<int> mtx; // Memory Tile Buffer

mygraph()
{
    k1 = kernel::create(func1);
    k2 = kernel::create(func2);

    // 12x8 samples buffer, 1 write-input, 1 read-output
    mtx = shared_buffer<int>::create({12, 8}, 1, 1);

    connect<> n1(k1.out[0], mtx.in[0]);
    write_access(mtx.in[0]) = tiling({
        .buffer_dimension = {12, 8},
        .tiling_dimension = {4, 3},
        .offset = {0, 0},
        .tile_traversal = {{.dimension=1, .stride=3, .wrap=2},
                           {.dimension=0, .stride=4, .wrap=3}}});

    connect<> n2(mtx.out[0], k2.in[0]);
    read_access(mtx.out[0]) = tiling({
        .buffer_dimension = {12, 8},
        .tiling_dimension = {7, 2},
        .offset = {0, 0},
        .tile_traversal = {{.dimension=0, .stride=5, .wrap=2},
                           {.dimension=1, .stride=2, .wrap=3}}});
};
Within each tile, the data is written with dimension 0 as the inner loop, but
the tile selection follows the tile_traversal vector specification. Kernel k1
writes tiles column-wise because its tile_traversal vector starts with
dimension 1, followed by dimension 0. Kernel k2 reads tiles row-wise because its
tile_traversal vector starts with dimension 0. Read tiles overlap in dimension 0
because the specified stride in this dimension (5) is less than the tile size (7).
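
The traversal order described above can be checked off-target with a small host-side model. The following is a minimal sketch in plain C++, not ADF code: TraversalStep, Tiling2D, tile_origins, access_counts, write_scheme, and read_scheme are hypothetical names introduced for illustration only. It assumes, consistent with the description above, that the first tile_traversal entry is the inner loop, and it only handles 2D tilings whose tiles lie fully inside the buffer.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not the ADF types.
struct TraversalStep {
    int dimension;  // buffer dimension this loop walks
    int stride;     // distance in samples between consecutive tile origins
    int wrap;       // number of tiles visited along this dimension
};

struct Tiling2D {
    std::array<int, 2> buffer_dimension;
    std::array<int, 2> tiling_dimension;
    std::array<int, 2> offset;
    // Assumption: entry 0 is the inner loop, entry 1 the outer loop.
    std::array<TraversalStep, 2> tile_traversal;
};

// Returns the origin {dim 0, dim 1} of each tile in the order it is visited.
std::vector<std::array<int, 2>> tile_origins(const Tiling2D& t) {
    const TraversalStep& inner = t.tile_traversal[0];
    const TraversalStep& outer = t.tile_traversal[1];
    std::vector<std::array<int, 2>> origins;
    for (int o = 0; o < outer.wrap; ++o) {
        for (int i = 0; i < inner.wrap; ++i) {
            std::array<int, 2> origin = t.offset;
            origin[inner.dimension] += i * inner.stride;
            origin[outer.dimension] += o * outer.stride;
            // This sketch only models tiles that fit inside the buffer.
            for (int d = 0; d < 2; ++d) {
                assert(origin[d] >= 0 &&
                       origin[d] + t.tiling_dimension[d] <= t.buffer_dimension[d]);
            }
            origins.push_back(origin);
        }
    }
    return origins;
}

// Counts how many tiles touch each sample; a count above 1 marks an overlap.
// Samples are indexed as dim1 * buffer_dimension[0] + dim0.
std::vector<int> access_counts(const Tiling2D& t) {
    std::vector<int> counts(t.buffer_dimension[0] * t.buffer_dimension[1], 0);
    for (const auto& origin : tile_origins(t)) {
        for (int d1 = 0; d1 < t.tiling_dimension[1]; ++d1) {
            for (int d0 = 0; d0 < t.tiling_dimension[0]; ++d0) {
                ++counts[(origin[1] + d1) * t.buffer_dimension[0] + origin[0] + d0];
            }
        }
    }
    return counts;
}

// Write scheme of k1, copied from the graph example.
Tiling2D write_scheme() {
    Tiling2D t;
    t.buffer_dimension = {12, 8};
    t.tiling_dimension = {4, 3};
    t.offset = {0, 0};
    t.tile_traversal = {TraversalStep{1, 3, 2}, TraversalStep{0, 4, 3}};
    return t;
}

// Read scheme of k2, copied from the graph example.
Tiling2D read_scheme() {
    Tiling2D t;
    t.buffer_dimension = {12, 8};
    t.tiling_dimension = {7, 2};
    t.offset = {0, 0};
    t.tile_traversal = {TraversalStep{0, 5, 2}, TraversalStep{1, 2, 3}};
    return t;
}
```

Under these assumptions, tile_origins(write_scheme()) visits the origins (0,0), (0,3), (4,0), (4,3), (8,0), (8,3), and tile_origins(read_scheme()) visits (0,0), (5,0), (0,2), (5,2), (0,4), (5,4). In access_counts(read_scheme()), the samples at dimension 0 positions 5 and 6 have a count of 2, which corresponds to the overlap caused by a stride of 5 with a tile width of 7.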
Note: DMA data access is based on buffer descriptors (BDs). All tiling
parameters that you specify in the graph are translated into one or more buffer
descriptor parameter sets. The combination of the tile shape and the tile
traversal parameters can require so many buffer descriptors that the compiler runs
out of hardware resources (not enough BDs) and issues an error.