AI Engine-ML Local Memory Access - 2024.2 English

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID: UG1603
Release Date: 2024-11-28
Version: 2024.2 English

The buffers used in the various kernels are inferred automatically at the graph level as ping-pong buffers located in the AI Engine-ML data memory. From the kernel point of view, the buffers are accessed through the address generator units of the processor and are managed by the kernel.
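For illustration, the following sketch shows how a kernel might read and write such buffers through its buffer ports; the kernel name, data type, and buffer length are assumptions and not part of this example.

#include <adf.h>

// Minimal kernel-side sketch (hypothetical kernel): the pointers returned by
// data() walk the ping-pong buffers that the graph allocated in the tile's
// data memory; the 256-element length is an assumption for illustration.
void scale_by_two(adf::input_buffer<int32> &in, adf::output_buffer<int32> &out)
{
    const int32 *pIn  = in.data();   // linear view of the input buffer
    int32       *pOut = out.data();  // linear view of the output buffer
    for (unsigned i = 0; i < 256; ++i)
        pOut[i] = pIn[i] * 2;        // element-wise access from local data memory
}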

Input/output buffers can be filled and flushed by other parts of the device through streams that are handled internally by a DMA. To simplify kernel addressing, the user might require the data to be stored in a specific way in the data memory, for example a transposed tensor layout that simplifies matrix multiplication at the kernel level. You cannot currently define specific access patterns for this kind of memory: no tiling parameters can be defined, so local memories use simple linear addressing. If a specific ordering is needed in the local memory, the transmitter must take it into account. Similarly, when the local memory is flushed, the destination must take into account the specific data ordering of the local memory.
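As one possible illustration of transmitter-side reordering, the hypothetical producer kernel below writes its output buffer in transposed (column-major) order so that a downstream consumer can read the data linearly; the kernel name, tile shape, and data type are assumptions.

#include <adf.h>

// Assumed tile shape, for illustration only.
static constexpr unsigned ROWS = 16;
static constexpr unsigned COLS = 16;

// Hypothetical producer: stores the tile transposed so that the receiving
// kernel sees column-major data with plain linear addressing.
void transpose_producer(adf::input_buffer<int32> &in, adf::output_buffer<int32> &out)
{
    const int32 *src = in.data();
    int32       *dst = out.data();
    for (unsigned r = 0; r < ROWS; ++r)
        for (unsigned c = 0; c < COLS; ++c)
            dst[c * ROWS + r] = src[r * COLS + c];  // write in transposed order
}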

The following code snippet shows the graph-level API used to declare buffers in the AI Engine-ML data memory:
#include <adf.h>
using namespace adf;

// ClassicMatMult and the NCols*/NRows* constants are assumed to be declared
// in a kernel header included by the graph.
class MatrixMultiply : public graph {
public:
    input_port  inA, inB;
    output_port outC;

    kernel MatMult;

    MatrixMultiply() {

        // Kernel creation
        MatMult = kernel::create(ClassicMatMult);
        source(MatMult) = "src/matmult.cpp";
        runtime<ratio>(MatMult) = 0.9;

        // Connect input A to the MatMult kernel
        connect(inA, MatMult.in[0]);
        dimensions(MatMult.in[0]) = {NColsA, NRowsA};

        // Connect input B to the MatMult kernel
        connect(inB, MatMult.in[1]);
        dimensions(MatMult.in[1]) = {NColsB, NRowsB};

        // Connect the MatMult kernel to the output
        connect(MatMult.out[0], outC);
        dimensions(MatMult.out[0]) = {NColsC, NRowsC};
    }
};
The AI Engine compiler reads the connections in the constructor to infer the required number of buffers at the input and output ports of the kernel. The size of these buffers is parameterized by:
dimensions(MatMult.in[0]) = {NColsA,NRowsA};

The buffer size can also be defined directly inside the kernel, as sketched below.
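As a sketch of that kernel-side alternative, assuming the adf::extents<> template on the buffer ports (the NCols*/NRows* constants are those used in the graph above), the sizes can be fixed in the kernel declaration instead of through dimensions() in the graph:

#include <adf.h>
using namespace adf;

// Sketch only: expressing the size as a one-dimensional extent equal to the
// total element count is an assumption; the buffer sizes are then carried by
// the kernel signature itself.
void ClassicMatMult(
    input_buffer<int32, extents<NColsA * NRowsA>>  &inA,
    input_buffer<int32, extents<NColsB * NRowsB>>  &inB,
    output_buffer<int32, extents<NColsC * NRowsC>> &outC);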