Tiling parameters are specified in the graph and associated with an input or output port of an AI Engine memory DMA. In the following example, kernel k1 must write six tiles (4x3 samples) to a 12x8 sample buffer, and kernel k2 must read six tiles (7x2 samples) out of the same buffer:
kernel k1, k2;
mygraph()
{
    k1 = kernel::create(func1);
    k2 = kernel::create(func2);
    connect n1(k1.out[0], k2.in[0]);
    // k1 side: the MM2S DMA reads 4x3 tiles out of the k1 output buffer.
    read_access(k1.out[0]) = tiling({
        .buffer_dimension={12,8},
        .tiling_dimension={4,3},
        .offset={0,0},
        .tile_traversal = {{.dimension=1, .stride=3, .wrap=2},
                           {.dimension=0, .stride=4, .wrap=3}}});
    // k2 side: the S2MM DMA writes 7x2 tiles into the k2 input buffer.
    write_access(k2.in[0]) = tiling({
        .buffer_dimension={12,8},
        .tiling_dimension={7,2},
        .offset={0,0},
        .tile_traversal = {{.dimension=0, .stride=5, .wrap=2},
                           {.dimension=1, .stride=2, .wrap=3}}});
};
With this construct, the two kernels communicate through two ping-pong buffers. The AI Engine compiler automatically adds a DMA transfer between the ping-pong output buffer on the k1 side and the ping-pong input buffer on the k2 side. Within each tile, the data are read or written with dimension 0 as the inner loop, but the order in which tiles are selected follows the tile_traversal vector.
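The within-tile ordering can be illustrated with a small standalone C++ sketch. It is not ADF API code: tile_sample_offsets is a hypothetical helper, and it assumes the buffer is stored linearly with dimension 0 contiguous, so that sample (d0, d1) sits at offset d0 + d1 * (buffer size in dimension 0).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the ADF API): lists the linear buffer
// offsets visited inside one 2D tile, with dimension 0 as the inner loop.
// Assumption: sample (d0, d1) is stored at offset d0 + d1 * buffer_dim0.
std::vector<std::size_t> tile_sample_offsets(std::size_t buffer_dim0,
                                             std::size_t origin_d0,
                                             std::size_t origin_d1,
                                             std::size_t tile_d0,
                                             std::size_t tile_d1)
{
    assert(origin_d0 + tile_d0 <= buffer_dim0); // tile must fit in dimension 0
    std::vector<std::size_t> offsets;
    offsets.reserve(tile_d0 * tile_d1);
    for (std::size_t d1 = 0; d1 < tile_d1; ++d1) {      // outer loop: dimension 1
        for (std::size_t d0 = 0; d0 < tile_d0; ++d0) {  // inner loop: dimension 0
            offsets.push_back((origin_d0 + d0) + (origin_d1 + d1) * buffer_dim0);
        }
    }
    return offsets;
}
```

Under these assumptions, tile_sample_offsets(12, 0, 0, 4, 3) for the first 4x3 tile of the 12x8 buffer gives offsets 0 to 3, then 12 to 15, then 24 to 27.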
On the k1 side, the MM2S DMA accesses the tiles column-wise because the tile_traversal vector starts with dimension 1, which is followed by dimension 0.
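To make the tile order concrete, the following standalone C++ sketch enumerates tile origins for a two-entry traversal. TraversalStep and tile_origins are illustrative names, not ADF types, and the sketch assumes that the first tile_traversal entry is the fastest-varying one, as the description above implies.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Plain stand-in for one tile_traversal entry; not the ADF type.
struct TraversalStep {
    int dimension; // buffer dimension this step moves along (0 or 1)
    int stride;    // distance in samples between consecutive tile origins
    int wrap;      // number of tiles visited along this dimension
};

// Returns the (dim0, dim1) origin of every tile in visiting order.
// Assumption: `first` varies fastest, `second` varies slowest.
std::vector<std::pair<int, int>> tile_origins(const TraversalStep& first,
                                              const TraversalStep& second,
                                              int offset_d0 = 0,
                                              int offset_d1 = 0)
{
    assert(first.dimension == 0 || first.dimension == 1);
    assert(second.dimension == 0 || second.dimension == 1);
    assert(first.dimension != second.dimension);
    std::vector<std::pair<int, int>> origins;
    for (int outer = 0; outer < second.wrap; ++outer) {
        for (int inner = 0; inner < first.wrap; ++inner) {
            int pos[2] = {offset_d0, offset_d1};
            pos[first.dimension] += inner * first.stride;
            pos[second.dimension] += outer * second.stride;
            origins.emplace_back(pos[0], pos[1]);
        }
    }
    return origins;
}
```

With the k1 parameters, tile_origins({1, 3, 2}, {0, 4, 3}) visits (0,0), (0,3), (4,0), (4,3), (8,0), (8,3): both tiles along dimension 1 are handled before the origin advances by 4 samples in dimension 0.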
On the k2 side, the S2MM DMA accesses the tiles row-wise because the tile_traversal vector starts with dimension 0. Consecutive tiles overlap in dimension 0 because the specified stride in this dimension (5) is less than the tile size (7).
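The overlap can be checked numerically with a short standalone C++ sketch. coverage_along_dimension is a hypothetical helper, not an ADF function; it only counts how many tiles touch each index along one dimension and makes no claim about how the hardware handles the overlapping samples.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not ADF API): counts how many tiles touch each index
// along one buffer dimension, given the tile size, stride and wrap used there.
std::vector<int> coverage_along_dimension(int buffer_size, int tile_size,
                                          int stride, int wrap)
{
    assert(buffer_size > 0 && tile_size > 0 && stride > 0 && wrap > 0);
    std::vector<int> hits(buffer_size, 0);
    for (int t = 0; t < wrap; ++t) {
        for (int i = 0; i < tile_size; ++i) {
            int index = t * stride + i;
            if (index < buffer_size) { // this sketch skips indices past the buffer
                ++hits[index];
            }
        }
    }
    return hits;
}
```

For the k2 parameters in dimension 0, coverage_along_dimension(12, 7, 5, 2) reports a count of 2 at indices 5 and 6 and a count of 1 elsewhere, which matches an overlap of 7 - 5 = 2 samples. For the k1 parameters, coverage_along_dimension(12, 4, 4, 3) reports 1 everywhere because the stride equals the tile size.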
Failed to allocate buffer descriptors for TG.G1.mtxin due to insufficient number of available buffer descriptors.