AI Engine-ML Local Memory Access

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID: UG1603
Release Date: 2025-05-29
Version: 2025.1 English

The buffers used in the various kernels are inferred automatically at the graph level as ping-pong buffers located in the AI Engine-ML data memory. From the kernel's point of view, these buffers are accessed through the processor's address generator units and are managed by the kernel itself.

Input/output buffers can be filled and flushed by other parts of the device through streams handled internally by a DMA. To simplify kernel addressing, you may want the data to be stored in a specific layout in data memory, for example, a transposition of a tensor that simplifies matrix multiplication at the kernel level. You can define specific access patterns for this purpose: tiling parameters can be specified for local memories.
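As a plain C++ illustration (not the ADF API), the benefit of a transposed layout can be seen in a matrix multiply: when B is stored transposed, both operands are read with unit stride in the inner loop, which is exactly what a transposing write access pattern provides to the kernel.

```cpp
#include <array>
#include <cstddef>

// Illustrative only: C = A * B with B supplied already transposed
// (N x K instead of K x N), so both operands are read with unit stride.
constexpr std::size_t M = 2, K = 3, N = 2;

std::array<int, M * N> matmul_bt(const std::array<int, M * K>& a,
                                 const std::array<int, N * K>& bt) {
    std::array<int, M * N> c{};
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j) {
            int acc = 0;
            for (std::size_t k = 0; k < K; ++k)
                acc += a[i * K + k] * bt[j * K + k]; // both unit-stride reads
            c[i * N + j] = acc;
        }
    return c;
}
```

For example, with A = [[1,2,3],[4,5,6]] and B = [[1,0],[0,1],[1,1]] passed as its transpose [[1,0,1],[0,1,1]], the result is [[4,5],[10,11]].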

The following code snippet shows this API for the AI Engine-ML:
#include <adf.h>
using namespace adf;

class MatrixMultiply : public graph {
public:
    input_port inA, inB;
    output_port outC;

    kernel MatMult;

    MatrixMultiply() {
        // Kernel
        MatMult = kernel::create(ClassicMatMult);
        source(MatMult) = "src/matmult.cpp";
        runtime<ratio>(MatMult) = 0.9;

        // Connect input A to the MatMult kernel
        connect(inA, MatMult.in[0]);
        dimensions(MatMult.in[0]) = {NColsA, NRowsA};

        // Connect input B to the MatMult kernel
        connect(inB, MatMult.in[1]);
        dimensions(MatMult.in[1]) = {NRowsB, NColsB};
        // Tiling parameters to transpose the matrix
        write_access(MatMult.in[1]) = adf::tiling(...);

        // Connect the MatMult kernel to the output
        connect(MatMult.out[0], outC);
        dimensions(MatMult.out[0]) = {NColsC, NRowsC};
    }
};
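The `adf::tiling(...)` call takes an `adf::tiling_parameters` structure. As a sketch, a write pattern that stores an incoming row-major stream transposed into a local buffer could look like the following; the 8x8 dimensions and all field values are illustrative assumptions, not taken from the example above.

```cpp
// Hedged sketch: write the incoming row-major stream transposed into an
// 8x8 local buffer. Element n of the stream lands at address
// (n / 8) + (n % 8) * 8, i.e., the transpose of a row-major fill.
adf::tiling_parameters transpose_pattern = {
    .buffer_dimension = {8, 8},  // local buffer is 8x8 elements
    .tiling_dimension = {1, 1},  // one element per tile
    .offset = {0, 0},
    .tile_traversal = {
        {.dimension = 1, .stride = 1, .wrap = 8},  // innermost: step along dim 1
        {.dimension = 0, .stride = 1, .wrap = 8}   // then advance along dim 0
    }
};
write_access(MatMult.in[1]) = adf::tiling(transpose_pattern);
```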
The AI Engine compiler reads the connections in the constructor to infer the correct number of buffers at the input and output ports of the kernel. The size of these buffers is parameterized by:
dimensions(MatMult.in[0]) = {NColsA,NRowsA};

The buffer sizes can also be defined directly inside the kernel source.
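For instance, buffer sizes can be carried in the kernel signature through `adf::extents` instead of `dimensions()` in the graph. The sketch below assumes `int32` data and compile-time matrix dimensions; the concrete values are illustrative.

```cpp
// Hedged sketch: sizes declared in the kernel signature via adf::extents.
// The extents must be compile-time constants; int32 and all dimension
// values here are assumptions for illustration.
#include <adf.h>

constexpr unsigned NRowsA = 8,  NColsA = 16;
constexpr unsigned NRowsB = 16, NColsB = 8;   // NColsA == NRowsB
constexpr unsigned NRowsC = 8,  NColsC = 8;

void ClassicMatMult(adf::input_buffer<int32, adf::extents<NRowsA * NColsA>>& inA,
                    adf::input_buffer<int32, adf::extents<NRowsB * NColsB>>& inB,
                    adf::output_buffer<int32, adf::extents<NRowsC * NColsC>>& outC);
```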