Tiling Parameters Specification - 2025.1 English - UG1079

AI Engine Kernel and Graph Programming Guide (UG1079)

Document ID: UG1079
Release Date: 2025-06-04
Version: 2025.1 English

Tiling parameters are specified in the graph and associated with an input or output port of an AI Engine memory DMA. In the following example, kernel k1 must write six tiles (4x3 samples each) to a 12x8 sample buffer, and kernel k2 must read six tiles (7x2 samples each) out of the same buffer:

Figure 1. Write Scheme
Figure 2. Read Scheme
kernel k1, k2;

mygraph()
{
  k1 = kernel::create(func1);
  k2 = kernel::create(func2);

  connect n1(k1.out[0], k2.in[0]);

  // k1 output buffer: six 4x3 tiles, traversed along dimension 1 first.
  write_access(k1.out[0]) = tiling({
    .buffer_dimension={12,8},
    .tiling_dimension={4,3},
    .offset={0,0},
    .tile_traversal = {{.dimension=1, .stride=3, .wrap=2},
      {.dimension=0, .stride=4, .wrap=3}}});

  // k2 input buffer: six 7x2 tiles, traversed along dimension 0 first.
  read_access(k2.in[0]) = tiling({
    .buffer_dimension={12,8},
    .tiling_dimension={7,2},
    .offset={0,0},
    .tile_traversal = {{.dimension=0, .stride=5, .wrap=2},
      {.dimension=1, .stride=2, .wrap=3}}});
};
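
As a rough cross-check of the parameters in this example, the number of samples spanned along one dimension can be computed from the offset, tile size, stride, and wrap. The sketch below is plain C++, independent of the ADF headers. The helper name covered_extent is hypothetical and not part of the ADF API, and the formula assumes a non-negative stride.

```cpp
#include <cassert>

// Hypothetical helper, not part of the ADF API.
// Returns one past the highest sample index touched along a dimension
// when `wrap` tiles of `tile_size` samples are placed `stride` samples
// apart, starting at `offset`. Assumes stride >= 0.
int covered_extent(int offset, int tile_size, int stride, int wrap) {
    assert(tile_size > 0 && wrap > 0 && stride >= 0);
    return offset + stride * (wrap - 1) + tile_size;
}
```

With the values from the example, both the write scheme (tile 4, stride 4, wrap 3) and the read scheme (tile 7, stride 5, wrap 2) span all 12 samples in dimension 0. In dimension 1, both schemes span 6 of the 8 available samples.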

With this construct, the two kernels communicate through two ping-pong buffers. The AI Engine compiler automatically adds a DMA transfer between the ping-pong output buffer on the k1 side and the ping-pong input buffer on the k2 side. Within each tile, the data is read or written with dimension 0 as the inner loop, but the order in which tiles are selected follows the tile_traversal vector specification.
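
The access order inside a single tile can be illustrated with a small stand-alone C++ sketch. It lists the linear buffer indices of one tile's samples with dimension 0 as the inner loop. The helper name tile_sample_indices is hypothetical, and the sketch assumes that dimension 0 is the contiguous dimension of the buffer in memory.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not part of the ADF API.
// Returns the linear buffer indices of one tile's samples in access
// order, with dimension 0 as the inner loop. Assumes dimension 0 is
// contiguous in memory, so index = d1 * buffer_dim0 + d0.
std::vector<int> tile_sample_indices(int buffer_dim0,
                                     int origin0, int origin1,
                                     int tile_dim0, int tile_dim1) {
    assert(buffer_dim0 > 0 && tile_dim0 > 0 && tile_dim1 > 0);
    std::vector<int> indices;
    for (int d1 = 0; d1 < tile_dim1; ++d1) {        // outer loop: dimension 1
        for (int d0 = 0; d0 < tile_dim0; ++d0) {    // inner loop: dimension 0
            indices.push_back((origin1 + d1) * buffer_dim0 + (origin0 + d0));
        }
    }
    return indices;
}
```

Under these assumptions, a 4x3 tile at origin (0,0) in a buffer that is 12 samples wide touches indices 0 to 3, then 12 to 15, then 24 to 27.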

On the k1 side, the MM2S DMA accesses the tiles column-wise because the tile_traversal vector starts with dimension 1, followed by dimension 0.

On the k2 side, the S2MM DMA accesses the tiles row-wise because the tile_traversal vector starts with dimension 0. The read access overlaps in dimension 0 because the specified stride in this dimension (5) is less than the tile size (7).
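
The tile selection order described for k1 and k2 can be sketched in stand-alone C++. The Traversal struct below is a hypothetical mirror of one tile_traversal entry, not the ADF type, and the helpers tile_origins and tile_overlap are illustrative names. The sketch assumes that the first entry of the traversal vector is the fastest-varying loop, which matches the column-wise and row-wise behavior described above.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mirror of one tile_traversal entry (not the ADF type).
struct Traversal {
    int dimension;  // buffer dimension this loop walks along (0 or 1)
    int stride;     // distance in samples between consecutive tile origins
    int wrap;       // number of tiles visited along this dimension
};

using Origin = std::array<int, 2>;  // {dimension 0, dimension 1}

// Lists tile origins in visiting order, assuming the first traversal
// entry is the innermost (fastest-varying) loop.
std::vector<Origin> tile_origins(Origin offset,
                                 const std::vector<Traversal>& traversal) {
    std::size_t total = 1;
    for (const Traversal& t : traversal) {
        assert(t.dimension >= 0 && t.dimension < 2 && t.wrap > 0);
        total *= static_cast<std::size_t>(t.wrap);
    }
    std::vector<Origin> origins;
    for (std::size_t n = 0; n < total; ++n) {
        Origin o = offset;
        std::size_t rest = n;
        for (const Traversal& t : traversal) {
            const std::size_t wrap = static_cast<std::size_t>(t.wrap);
            o[t.dimension] += static_cast<int>(rest % wrap) * t.stride;
            rest /= wrap;
        }
        origins.push_back(o);
    }
    return origins;
}

// Samples shared by two consecutive tiles along one dimension.
int tile_overlap(int tile_size, int stride) {
    return std::max(0, tile_size - stride);
}
```

Under that assumption, the k1 parameters give the origins (0,0), (0,3), (4,0), (4,3), (8,0), (8,3), and the k2 parameters give (0,0), (5,0), (0,2), (5,2), (0,4), (5,4). For k2, tile_overlap(7, 5) returns 2, so adjacent tiles in dimension 0 share two samples.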

Note: DMA data access is based on buffer descriptors. All tiling parameters that you specify in the graph are translated into one or more buffer descriptor parameter sets. The tile shape combined with the tile traversal parameters can require many buffer descriptors. In that case, the compiler can run out of hardware resources and issue an error such as:
Failed to allocate buffer descriptors for TG.G1.mtxin due to insufficient number of available buffer descriptors.