The buffers used in the various kernels are automatically inferred at the graph level as ping-pong buffers located in the AI Engine data memory. From the kernel's point of view, the buffers are accessed through the processor's address generator units and are managed by the kernel itself.
Input and output buffers can be filled and flushed by other components of the device using streams that are handled internally by a DMA. To simplify addressing within the kernel, you may require the data to be stored in data memory in a specific layout; for instance, transposing a tensor can simplify the matrix multiplication performed at the kernel level. You can define specific access patterns for this type of memory by specifying tiling parameters, which enable flexible addressing of local memories.
The following code snippet demonstrates the API for using the AI Engine data memory to perform a matrix multiplication in which matrix B is stored transposed:
class MatrixMultiply : public graph {
public:
    input_port inA, inB;
    output_port outC;
    kernel MatMult;

    MatrixMultiply() {
        // Kernel
        MatMult = kernel::create(ClassicMatMult);
        source(MatMult) = "src/matmult.cpp";
        runtime<ratio>(MatMult) = 0.9;

        // Connect input A to the MatMult kernel
        connect(inA, MatMult.in[0]);
        dimensions(MatMult.in[0]) = {NColsA, NRowsA};

        // Connect input B to the MatMult kernel
        connect(inB, MatMult.in[1]);
        dimensions(MatMult.in[1]) = {NRowsB, NColsB};

        /* Tiling parameters to transpose matrix B */
        write_access(MatMult.in[1]) = adf::tiling(...);

        // Connect the MatMult kernel to the output
        connect(MatMult.out[0], outC);
        dimensions(MatMult.out[0]) = {NColsC, NRowsC};
    }
};
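The elided argument to adf::tiling(...) is a set of tiling parameters describing how the incoming stream is written into the local buffer. The following is a sketch only, assuming tiling parameters with buffer_dimension, tiling_dimension, offset, and tile_traversal fields (each traversal step given a dimension, stride, and wrap); the concrete values that realize the transposition depend on the matrix shapes and should be checked against the tools documentation:

// Hypothetical tiling specification: write the incoming row-major data of
// matrix B into the local buffer in transposed (column-major) order. The
// field names assume the adf::tiling_parameters structure; verify them
// against your tools version before use.
write_access(MatMult.in[1]) = adf::tiling({
    .buffer_dimension = {NRowsB, NColsB},  // shape of the local buffer
    .tiling_dimension = {1, 1},            // transfer one element at a time
    .offset           = {0, 0},            // start at the buffer origin
    .tile_traversal   = {
        {.dimension = 1, .stride = 1, .wrap = NColsB},  // step through dimension 1 first
        {.dimension = 0, .stride = 1, .wrap = NRowsB}   // then advance along dimension 0
    }
});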
dimensions(MatMult.in[0]) = {NColsA, NRowsA};

This size definition can also be made directly within the kernel instead of in the graph.
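As a sketch of that kernel-side alternative, assuming the buffer-port API with adf::extents and an int32 element type (neither is specified by the snippet above), the declaration of ClassicMatMult might carry the sizes itself:

#include <adf.h>

// Sketch only: declaring the buffer sizes on the kernel's ports via
// adf::extents instead of calling dimensions() in the graph. The int32
// element type and the flattened (1-D) extents are assumptions made for
// illustration; NColsA, NRowsA, etc. must be compile-time constants here.
void ClassicMatMult(adf::input_buffer<int32, adf::extents<NColsA * NRowsA>>& inA,
                    adf::input_buffer<int32, adf::extents<NRowsB * NColsB>>& inB,
                    adf::output_buffer<int32, adf::extents<NColsC * NRowsC>>& outC);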