Tiling Parameters Specification - 2024.2 English - UG1603

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID: UG1603
Release Date: 2024-11-28
Version: 2024.2 English
Tiling parameters are specified in the graph and associated with an input or output port of an AI Engine-ML memory DMA, a Memory Tile DMA, or an AI Engine-ML Interface DMA. In the following example, kernel k1 writes six tiles of 4x3 samples into a 12x8-sample buffer, and kernel k2 reads six tiles of 7x2 samples out of the same buffer:
Figure 1. Write Scheme
Figure 2. Read Scheme
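#include <adf.h>
using namespace adf;

// func1 and func2 are assumed to be declared in a kernel header (not shown).
class mygraph : public graph
{
  kernel k1, k2;
  shared_buffer<int> mtx; // Memory Tile buffer

public:
  mygraph()
  {
    k1 = kernel::create(func1);
    k2 = kernel::create(func2);
    // source() and runtime<ratio>() must also be set for k1 and k2 (omitted here).

    // 12x8-sample buffer with 1 write input and 1 read output
    mtx = shared_buffer<int>::create({12, 8}, 1, 1);

    // k1 writes six 4x3 tiles: tiles advance along dimension 1 first
    // (stride 3, wrap 2), then along dimension 0 (stride 4, wrap 3).
    connect<> n1(k1.out[0], mtx.in[0]);
    write_access(mtx.in[0]) = tiling({
      .buffer_dimension = {12, 8},
      .tiling_dimension = {4, 3},
      .offset = {0, 0},
      .tile_traversal = {{.dimension = 1, .stride = 3, .wrap = 2},
                         {.dimension = 0, .stride = 4, .wrap = 3}}});

    // k2 reads six 7x2 tiles: tiles advance along dimension 0 first
    // (stride 5, wrap 2), then along dimension 1 (stride 2, wrap 3).
    connect<> n2(mtx.out[0], k2.in[0]);
    read_access(mtx.out[0]) = tiling({
      .buffer_dimension = {12, 8},
      .tiling_dimension = {7, 2},
      .offset = {0, 0},
      .tile_traversal = {{.dimension = 0, .stride = 5, .wrap = 2},
                         {.dimension = 1, .stride = 2, .wrap = 3}}});
  }
};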

Within each tile, data is written with dimension 0 as the inner loop, but the order in which tiles are selected follows the tile_traversal vector, whose first entry is the innermost tile loop. k1 writes tiles column-wise because its tile_traversal vector starts with dimension 1, followed by dimension 0.

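k2 reads tiles row-wise because its tile_traversal vector starts with dimension 0. The read tiles overlap in dimension 0 because the stride in that dimension (5) is smaller than the tile size (7): the two tiles in each row cover indices 0 to 6 and 5 to 11, so samples at indices 5 and 6 are read twice.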

Note: DMA data access is based on buffer descriptors (BDs). All tiling parameters that you specify in the graph are translated into one or more BD parameter sets. Depending on the tile shape combined with the tile traversal parameters, the number of BDs required can exceed the available hardware resources, in which case the compiler issues an error.