The buffers used by the various kernels are inferred automatically at the graph level and implemented as ping-pong buffers in the AI Engine-ML data memory. From the kernel's point of view, a buffer is accessed through the processor's address generator units and is managed by the kernel itself.
Input/output buffers can be filled and flushed by other parts of the device through streams that are handled internally by a DMA. To simplify kernel addressing, you may want the data to be stored in data memory in a specific layout, for example a transposed tensor that simplifies matrix multiplication at the kernel level. For such cases, you can define specific access patterns on these memories by declaring tiling parameters for the local buffers, as sketched after the graph example below.
#include <adf.h>
using namespace adf;

// The matrix sizes (NColsA, NRowsA, NRowsB, NColsB, NColsC, NRowsC) and the
// ClassicMatMult kernel declaration are assumed to come from included headers.
class MatrixMultiply : public graph {
public:
    input_port inA, inB;
    output_port outC;
    kernel MatMult;

    MatrixMultiply() {
        // Kernel creation
        MatMult = kernel::create(ClassicMatMult);
        source(MatMult) = "src/matmult.cpp";
        runtime<ratio>(MatMult) = 0.9;

        // Connect input A to the MatMult kernel
        connect(inA, MatMult.in[0]);
        dimensions(MatMult.in[0]) = {NColsA, NRowsA};

        // Connect input B to the MatMult kernel
        connect(inB, MatMult.in[1]);
        dimensions(MatMult.in[1]) = {NRowsB, NColsB};

        // Tiling parameters to transpose the matrix on write
        write_access(MatMult.in[1]) = adf::tiling(...);

        // Connect the MatMult kernel to the output
        connect(MatMult.out[0], outC);
        dimensions(MatMult.out[0]) = {NColsC, NRowsC};
    }
};
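To give a concrete idea of what the adf::tiling(...) placeholder might contain, the following is a minimal sketch of a write pattern that stores the incoming stream for matrix B in transposed order. It assumes the adf::tiling_parameters and adf::traversing_parameters structures from the AI Engine tools; the field values and the traversal-loop ordering shown here are illustrative and must be checked against the tools documentation for your design.

// Hypothetical write pattern: traverse dimension 1 in the inner loop so that
// the row-major input stream lands in the buffer in transposed order.
adf::tiling_parameters transposeB = {
    .buffer_dimension = {NRowsB, NColsB},   // must match dimensions(MatMult.in[1])
    .tiling_dimension = {1, 1},             // one element per tile
    .offset           = {0, 0},             // start at the buffer origin
    .tile_traversal   = {
        {.dimension = 1, .stride = 1, .wrap = NColsB},  // assumed inner loop: along dim 1
        {.dimension = 0, .stride = 1, .wrap = NRowsB}   // assumed outer loop: along dim 0
    }
};
write_access(MatMult.in[1]) = adf::tiling(transposeB);

Because dimension 0 is the fastest-varying (contiguous) dimension of the buffer, walking dimension 1 in the inner traversal loop scatters consecutive stream elements with a stride of NRowsB, which corresponds to a transposition of the incoming row-major data.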
As an alternative to the dimensions directive used in the graph, the buffer size can also be defined directly inside the kernel.
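As a sketch of this kernel-side alternative, the size can be encoded in the buffer-port types of the kernel function signature using adf::extents. The function name and size constants below follow the graph example; the int32 element type is an assumption for illustration.

#include <adf.h>

// NRowsA, NColsA, NRowsB, NColsB, NRowsC, NColsC are assumed to be
// defined in a header shared with the graph.
void ClassicMatMult(
    adf::input_buffer<int32, adf::extents<NRowsA * NColsA>>  &inA,   // matrix A
    adf::input_buffer<int32, adf::extents<NRowsB * NColsB>>  &inB,   // matrix B
    adf::output_buffer<int32, adf::extents<NRowsC * NColsC>> &outC)  // result C
{
    // ... matrix-multiply computation over the buffers ...
}

When the extents are declared in the kernel signature this way, they should be consistent with any dimensions directive applied to the same ports in the graph.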