As described above, a tiling parameterization can be associated with each port that uses a DMA. If the access scheme is linear over the whole buffer, the tiling specification is optional. Tiling parameterizations on the two ends of a connection are subject to a single constraint: the overall data transfer volume must be the same on both ends.
In the following example, tiling parameterization is required at the following levels:
- External Memory (DDR), known as external_buffer in the graph
- AI Engine-ML Memory Tile (MEM Tile), known as shared_buffer in the graph
The application is parameterized with the following values:
- NITERATIONS - Number of iterations handled by the main host code.
- NPARTS - The global data set in the DDR is divided into NPARTS sections, which are sent to the MEM Tiles.
- NFRAMES - Each section is divided into NFRAMES sub-sections, which are sent to the AI Engine memory.
- NVECTORS - Each sub-section is split into NVECTORS vectors, which are handled by the kernel in a single run.
- VECTOR_LENGTH - Size of the basic vector processed by the kernels.
For this application, NFRAMES, NVECTORS, and VECTOR_LENGTH are specific to each kernel (K1, K2). The file tiling_parameters.h groups all tiling parameters:
// Complete Dataset sizes
const uint32_t totalSize1 = NITERATIONS*NPARTS*NFRAMES_1*NVECTORS_1*VECTOR_LENGTH_1;
const uint32_t totalSize2 = NITERATIONS*NPARTS*NFRAMES_2*NVECTORS_2*VECTOR_LENGTH_2;
// Input and Output DDR size for all dimensions
const std::vector<uint32_t> ddr_size1 = { VECTOR_LENGTH_1*NVECTORS_1, NFRAMES_1, NPARTS};
const std::vector<uint32_t> ddr_size2 = { VECTOR_LENGTH_2*NVECTORS_2, NFRAMES_2, NPARTS};
// MEM Tile data size
const std::vector<uint32_t> shared_mem_size1 = {VECTOR_LENGTH_1,NVECTORS_1, NFRAMES_1};
const std::vector<uint32_t> shared_mem_size2 = {VECTOR_LENGTH_2,NVECTORS_2, NFRAMES_2};
// AI Engine-ML buffer size
const std::vector<uint32_t> buffer_size1 = {VECTOR_LENGTH_1,NVECTORS_1};
const std::vector<uint32_t> buffer_size2 = {VECTOR_LENGTH_2,NVECTORS_2};
// Parameter given to the kernels
const int LoopSize_1 = VECTOR_LENGTH_1*NVECTORS_1;
const int LoopSize_2 = VECTOR_LENGTH_2*NVECTORS_2;
// Number of times a kernel should be run for each iteration
const uint32_t bufferRepetition_1 = NFRAMES_1*NPARTS;
const uint32_t bufferRepetition_2 = NFRAMES_2*NPARTS;
// Tiling Parameter for the input and output DDR
adf::tiling_parameters DDR_pattern1 = {
.buffer_dimension=ddr_size1,
.tiling_dimension={VECTOR_LENGTH_1*NVECTORS_1, NFRAMES_1,1},
.offset={0,0,0},
.tile_traversal={{.dimension=2, .stride=1, .wrap=NPARTS}}
};
adf::tiling_parameters DDR_pattern2 = {
.buffer_dimension=ddr_size2,
.tiling_dimension={VECTOR_LENGTH_2*NVECTORS_2, NFRAMES_2,1},
.offset={0,0,0},
.tile_traversal={{.dimension=2, .stride=1, .wrap=NPARTS}}
};
// Tiling Parameter for MEM Tiles
adf::tiling_parameters MEM_pattern1 = {
.buffer_dimension=shared_mem_size1,
.tiling_dimension={VECTOR_LENGTH_1,NVECTORS_1, 1},
.offset={0,0,0},
.tile_traversal={{.dimension=2, .stride=1, .wrap=NFRAMES_1}}
};
adf::tiling_parameters MEM_pattern2 = {
.buffer_dimension=shared_mem_size2,
.tiling_dimension={VECTOR_LENGTH_2,NVECTORS_2, 1},
.offset={0,0,0},
.tile_traversal={{.dimension=2, .stride=1, .wrap=NFRAMES_2}}
};
The graph construction will use all these tiling parameters to parameterize input and output DMAs. When defining the kernels K1 and K2, the number of runs of these kernels for each iteration must be specified:
class transfer_control : public graph {
public:
    kernel K1, K2;
    external_buffer<uint32> ddrin, ddrout;
    shared_buffer<uint32> mtx1, mtx2, mtx3;

    transfer_control() {
        // Kernels
        K1 = kernel::create_object<PassThrough>(1, LoopSize_1);
        source(K1) = "src/passthrough.cpp";
        headers(K1) = {"src/passthrough.h"};
        runtime<ratio>(K1) = 0.9;
        repetition_count(K1) = bufferRepetition_1;

        K2 = kernel::create_object<PassThrough>(2, LoopSize_2);
        source(K2) = "src/passthrough.cpp";
        headers(K2) = {"src/passthrough.h"};
        runtime<ratio>(K2) = 0.9;
        repetition_count(K2) = bufferRepetition_2;
        …
    }
};
The external buffers and shared buffers are then created. The repetition_count method is used again, this time to define the number of frames read/written per iteration:
// External Buffers (in External Memory)
// Size, number of input ports, number of output ports
ddrin = external_buffer<uint32>::create(ddr_size1, 0, 1);
ddrout = external_buffer<uint32>::create(ddr_size2, 1, 0);
// Shared Buffers (in Memory Tiles)
// Size, number of input ports, number of output ports
mtx1 = shared_buffer<uint32_t>::create(shared_mem_size1,1,1);
repetition_count(mtx1) = NPARTS;
mtx2 = shared_buffer<uint32_t>::create(shared_mem_size2,1,1);
repetition_count(mtx2) = NPARTS;
mtx3 = shared_buffer<uint32_t>::create(shared_mem_size2,1,1);
repetition_count(mtx3) = NPARTS;
// Shared buffers support ping-pong buffering
num_buffers(mtx1) = 2;
num_buffers(mtx2) = 2;
num_buffers(mtx3) = 2;
The AI Engine-ML data memory buffers are created automatically when the kernels are connected to the other elements:
// Connect Input DDR to Input MEM Tile
connect(ddrin.out[0], mtx1.in[0]);
// Specify read access pattern for the data source and the write access pattern for the destination for the DDR --> MEM Tile connection
read_access(ddrin.out[0]) = DDR_pattern1;
write_access(mtx1.in[0]) = MEM_pattern1;
// Kernel 1 connection
connect(mtx1.out[0], K1.in[0]);
// Access pattern can be defined only on the Shared buffer (Memory Tile) within the graph
read_access(mtx1.out[0]) = MEM_pattern1;
dimensions(K1.in[0]) = buffer_size1;
connect(K1.out[0], mtx2.in[0]);
dimensions(K1.out[0]) = buffer_size1;
// Access pattern can be defined only on the Shared buffer (Memory Tile) within the graph
write_access(mtx2.in[0]) = MEM_pattern1;
// Kernel 2 connection
connect(mtx2.out[0], K2.in[0]);
// Access pattern can be defined only on the Shared buffer (Memory Tile) within the graph
read_access(mtx2.out[0]) = MEM_pattern2;
dimensions(K2.in[0]) = buffer_size2;
connect(K2.out[0], mtx3.in[0]);
dimensions(K2.out[0]) = buffer_size2;
// Access pattern can be defined only on the Shared buffer (Memory Tile) within the graph
write_access(mtx3.in[0]) = MEM_pattern2;
// Connect Output MEM Tile to Output DDR
connect(mtx3.out[0], ddrout.in[0]);
// Specify the read access pattern for the data source and the write access pattern for the destination for the MEM Tile --> DDR connection
read_access(mtx3.out[0]) = MEM_pattern2;
write_access(ddrout.in[0]) = DDR_pattern2;
In addition to the tiling parameters for all inputs and outputs of the shared buffers and external buffers, the buffer sizes of the kernels are also specified within the graph.