Multi-dimensional Addressing in AI Engine Kernels - 2025.2 English - UG1603

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID: UG1603
Release Date: 2025-11-26
Version: 2025.2 English
Note: In this section, a tensor is a collection of vectors of a specific type, called the 'base type'. A 1-D tensor here is therefore 2-D in the traditional sense, because each tensor component is itself a vector. The following examples illustrate this.

The AI Engine APIs support linear addressing in AI Engine kernels. The addresses can be adjusted by using a pointer or by using arithmetic operations on iterators.

Multi-dimensional addressing is supported in AIE-ML and AIE-ML v2 devices. An aie::tensor_descriptor object is used to map a multidimensional tensor to a 1-D memory space. It is created from a vector base type and a list of aie::tensor_dim pairs formed from the tensor size (the number of vectors) in each dimension, and a step parameter indicating how the next component of the tensor is obtained. This representation allows reiteration over a sub-volume of the tensor by adding an extra aie::tensor_dim pair with a step of zero and the size set to the desired number of iterations.

The AI Engine API introduces tensor buffer streams to support multi-dimensional addressing inside a kernel. The tensor buffer streams are created using aie::make_tensor_buffer_stream, and can be advanced by the >> operator or the pop() member function.

aie::make_tensor_buffer_stream accepts two parameters: a pointer to the raw data and a tensor descriptor, which describes how data is transferred to the stream. Each time the stream advances, it reads and returns a vector of the base type.

The tensor descriptors and associated buffer streams can be composed to arbitrary dimensions, although the underlying mechanisms are built on three-dimensional abstractions. The tensor buffer streams are recursively defined, decomposing an N-dimensional tensor into (N-1)/3 nested streams, with a final N % 3 leaf stream. Accessing an inner stream requires reading the containing outer stream with a pop() call, which advances the outer stream and returns the inner stream.

For example, a tensor descriptor and a tensor buffer stream are created as follows:

#include <cstdio>
#include "aie_api/aie.hpp"
#include "aie_api/aie_adf.hpp"
#include "aie_api/utils.hpp"

constexpr unsigned vlen = 4;                    // vector length of base type
constexpr unsigned tsize = 8;                   // no. of tensor components for this dimension
constexpr unsigned tstep = 2;                   // no. of tensor components to step over
constexpr unsigned N = vlen * tsize * tstep;    // size of data buffer

using dtype = int32;    // data type of base type and buffer

void tbuff() {

    // declare data buffer
    alignas(aie::vector_decl_align) dtype buff[N];

    // initialize buffer contents
    for (unsigned i = 0u; i < N; i++) {
        buff[i] = i;
    }

    // tensor descriptor with base type: <dtype, vlen>
    // tensor dimensions: (tsize, tstep)
    auto desc = aie::make_tensor_descriptor<dtype, vlen>(aie::tensor_dim(tsize, tstep));

    // create tensor buffer stream associating buff with desc
    auto tbs = aie::make_tensor_buffer_stream(buff, desc);

    // show the contents of the tensor buffer stream
    aie::vector<dtype, vlen> v; // vector same as base type
    for (unsigned i = 0u; i < N / (vlen * tstep); i++) { // N / (vlen * tstep) == tsize
        v = tbs.pop(); // "tbs >> v" may also be used
        printf("i = %u:\n  ", i); // %u because i is unsigned
        aie::print(v, true, "v = ");
    }

} // end tbuff()

The preceding example code shows that the base type of the stream is aie::vector<int32, 4>. The addressing is specified as aie::tensor_dim(8, 2). This implies that there are eight vectors of the base type for this dimension, and the step value of 2 specifies that vectors with even indices are selected. The starting point for the step is the beginning of the associated data buffer.
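The even-index selection can be checked on a host with a few lines of plain C++. The following is only a sketch of the addressing arithmetic, assuming (as the printed results for this example indicate) that the step counts base vectors rather than individual elements. The helper name selected_offsets is hypothetical and is not part of the AI Engine API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model, not AI Engine API code.
// Returns the offset (in elements) of the first element of each base vector
// visited by a single (size, step) dimension that starts at the buffer origin.
std::vector<std::size_t> selected_offsets(std::size_t vlen,
                                          std::size_t size,
                                          std::size_t step) {
    assert(vlen > 0); // a base vector holds at least one element
    std::vector<std::size_t> offsets;
    offsets.reserve(size);
    for (std::size_t i = 0; i < size; ++i) {
        const std::size_t vector_index = i * step; // assumption: step counts base vectors
        offsets.push_back(vector_index * vlen);    // convert to an element offset
    }
    return offsets;
}
```

Calling selected_offsets(4, 8, 2) models aie::tensor_dim(8, 2) with a base type of four elements and yields the offsets 0, 8, 16, and so on up to 56, which are the first elements of the even-indexed vectors.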

The addressing of the data buffer goes from the lower to the higher dimensions of aie::tensor_dim. The first pair denotes the lowest dimension.

Note: Each element in the following examples is a vector specified in aie::make_tensor_descriptor.

Dimension 0

Addressing starts from the first component in the data buffer (at index = 0), and advances by the step value (step0). After it advances by the specified size, addressing moves to the next dimension, when available.

Note: The tools do not check for out-of-bounds access.

Figure 1. Dimension 0

For the above code sample, the data buffer contents can be visualized as follows. An asterisk marks each vector selected by the step of 2.

    index   buffer contents
 *    0   :  0,  1,  2,  3,
      1   :  4,  5,  6,  7,
 *    2   :  8,  9, 10, 11,
      3   : 12, 13, 14, 15,
 *    4   : 16, 17, 18, 19,
      5   : 20, 21, 22, 23,
 *    6   : 24, 25, 26, 27,
      7   : 28, 29, 30, 31,
 *    8   : 32, 33, 34, 35,
      9   : 36, 37, 38, 39,
 *   10   : 40, 41, 42, 43,
     11   : 44, 45, 46, 47,
 *   12   : 48, 49, 50, 51,
     13   : 52, 53, 54, 55,
 *   14   : 56, 57, 58, 59,
     15   : 60, 61, 62, 63

The buffer contents are shown with four columns because the number of elements in the base type is vlen = 4.

Running the previous code block shows the following result.

i = 0:
  v = 0 1 2 3 
i = 1:
  v = 8 9 10 11 
i = 2:
  v = 16 17 18 19 
i = 3:
  v = 24 25 26 27 
i = 4:
  v = 32 33 34 35 
i = 5:
  v = 40 41 42 43 
i = 6:
  v = 48 49 50 51 
i = 7:
  v = 56 57 58 59 

Dimension 1

The step specified for dimension 1 (step1) is the distance from index 0. Thus, after the length specified for dimension 0 is obtained, the first component for dimension 1 is the base vector at the index defined by step1.

Figure 2. Dimension 1

Steps within a dimension are inherited from step0.

Dimension 2

The step specified for dimension 2 (step2) is the distance from index 0. Thus, after the length specified for dimension 1 is obtained, the first component for dimension 2 is the base vector at the index defined by step2.

Figure 3. Dimension 2

Steps within a dimension are inherited from step0.
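One way to read the three dimension descriptions together is that the base vector reached at a given position is the sum of each coordinate multiplied by the step of its dimension. The following plain C++ sketch expresses that reading. It is an interpretation of the text above, not a statement about the AI Engine API internals, and the helper name vector_index and the step values used below are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Host-side model, not AI Engine API code.
// coord[k] is the position along dimension k, step[k] is the step of dimension k.
// Every step is measured in base vectors from index 0 of the data buffer.
std::size_t vector_index(const std::array<std::size_t, 3>& coord,
                         const std::array<std::size_t, 3>& step) {
    return coord[0] * step[0] + coord[1] * step[1] + coord[2] * step[2];
}
```

With illustrative steps of {1, 4, 16}, vector_index({0, 1, 0}, {1, 4, 16}) returns 4, so the first component of dimension 1 sits at the index given by step1. Likewise, vector_index({0, 0, 1}, {1, 4, 16}) returns 16 for the first component of dimension 2, and vector_index({2, 1, 1}, {1, 4, 16}) returns 22.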

Reiterating Over a Sub-Volume

If each vector in the buffer needs to be read four times in a row, that is (0, 1, 2, 3) four times, then (4, 5, 6, 7) four times, and so on until (60, 61, 62, 63) four times, the following code fragment accomplishes this.

#include <cstdio>
#include "aie_api/aie.hpp"
#include "aie_api/aie_adf.hpp"
#include "aie_api/utils.hpp"

constexpr unsigned vlen = 4;        // vector length of base type
constexpr unsigned nvec = 16;       // no. of base vectors in buffer
constexpr unsigned N = vlen * nvec; // size of data buffer

using dtype = int32;    // data type of base type and data buffer

void tbuff_subvol() {

    // declare data buffer
    alignas(aie::vector_decl_align) dtype buff[N];

    // initialize buffer contents
    for (unsigned i = 0u; i < N; i++) {
        buff[i] = i;
    }

    // tensor descriptor with base type: <dtype, vlen>
    // tensor dimensions: (size, step)
    auto desc = aie::make_tensor_descriptor<dtype, vlen>(aie::tensor_dim(4, 1), // 1st set
                                                         aie::tensor_dim(4, 0)  // repeat 4x
    );

    // create tensor buffer stream associating buff with desc
    auto tbs = aie::make_tensor_buffer_stream(buff, desc);

    // show the contents of the tensor buffer stream at each step increment
    for (unsigned i = 0u; i < N; i++) {
        aie::vector<dtype, vlen> v = tbs.pop();
        printf("i = %u:\n  ", i); // %u because i is unsigned
        aie::print(v, true, "v = ");
    }

} // end tbuff_subvol()

Running this produces the following result.

i = 0:
  v = 0 1 2 3 
i = 1:
  v = 0 1 2 3 
i = 2:
  v = 0 1 2 3 
i = 3:
  v = 0 1 2 3 
i = 4:
  v = 4 5 6 7 
i = 5:
  v = 4 5 6 7 
i = 6:
  v = 4 5 6 7 
i = 7:
  v = 4 5 6 7 

8< --- snip --- >8

i = 56:
  v = 56 57 58 59 
i = 57:
  v = 56 57 58 59 
i = 58:
  v = 56 57 58 59 
i = 59:
  v = 56 57 58 59 
i = 60:
  v = 60 61 62 63 
i = 61:
  v = 60 61 62 63 
i = 62:
  v = 60 61 62 63 
i = 63:
  v = 60 61 62 63
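
The sequence printed above can be reproduced on a host with a short plain C++ model. This is only a sketch that mirrors the listed results, in which each of the 16 base vectors is returned four times before the next one is read. It does not call the AI Engine API, and the helper name repeated_offsets is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model, not AI Engine API code.
// Returns, for each pop, the offset (in elements) of the first element of the
// vector that is read when every base vector is repeated `repeat` times.
std::vector<std::size_t> repeated_offsets(std::size_t vlen,
                                          std::size_t nvec,
                                          std::size_t repeat) {
    assert(vlen > 0); // a base vector holds at least one element
    std::vector<std::size_t> offsets;
    offsets.reserve(nvec * repeat);
    for (std::size_t v = 0; v < nvec; ++v) {       // advance one base vector at a time
        for (std::size_t r = 0; r < repeat; ++r) { // a step of zero re-reads the same vector
            offsets.push_back(v * vlen);
        }
    }
    return offsets;
}
```

Calling repeated_offsets(4, 16, 4) yields 64 entries: 0 four times, 4 four times, and so on up to 60 four times, which matches the first element of each printed vector.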