A Contrived Task to Illustrate How to Access AIE Kernel I/O Ports - 2025.1 English - XD100

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2025-08-25
Version: 2025.1 English

The contrived task in Fig. 11 illustrates how a kernel program accesses the input and output ports available to an AIE tile.

Fig. 11: A Contrived Task to Illustrate the Different I/O Ports on the AI Engine

The left side of the figure shows the mathematical description. Bold uppercase variables denote matrices, with the subscripts denoting the matrix sizes. Bold lowercase variables denote vectors, with the subscripts denoting the vector sizes. Italicized variables denote scalars.
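As a compact restatement, the task can be written in equation form. This is a sketch under assumptions rather than a copy of the figure: the arrangement of the four matrix products inside the 8x8 matrix, the element-wise reading of the squared magnitude, and the output name z are not stated in the text and are chosen here for illustration. The scalar w is the input scalar that selects between sum and difference.

```latex
% Assumed block layout: AC and DF in the top row, BC and EF in the bottom row.
% |x|^2 is taken element-wise; z is a hypothetical name for the output vector.
\mathbf{u}_{8} =
\begin{bmatrix}
\mathbf{A}_{4\times4}\mathbf{C}_{4\times4} & \mathbf{D}_{4\times4}\mathbf{F}_{4\times4} \\
\mathbf{B}_{4\times4}\mathbf{C}_{4\times4} & \mathbf{E}_{4\times4}\mathbf{F}_{4\times4}
\end{bmatrix}
\lvert \mathbf{x}_{8} \rvert^{2},
\qquad
\mathbf{z}_{8} =
\begin{cases}
\mathbf{u}_{8} + \mathbf{y}_{8}, & w = 0 \\
\mathbf{u}_{8} - \mathbf{y}_{8}, & \text{otherwise}
\end{cases}
```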

Calculation steps:

  1. Calculate the squared magnitude of the input complex vector x

  2. Calculate the matrix products AC, DF, BC, and EF of the 4x4 input matrices A through F

  3. Concatenate the resulting matrix products into an 8x8 matrix

  4. Calculate the vector u as the product of the 8x8 matrix and the squared magnitude of x

  5. If the input scalar w is zero, calculate the output vector as the sum u + y of u and the input vector y; otherwise, the output vector is the difference u - y
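The five steps above can be sketched as a plain C++ reference model that runs on the host. This is not AIE kernel code and uses no AIE API calls; it only pins down the arithmetic so that a kernel implementation can be checked against it. Several details are assumptions: the matrices and y are taken to be real `float` values, the squared magnitude is applied element-wise, and the 8x8 matrix is assembled as AC and DF in the top row and BC and EF in the bottom row. All type and function names (`Mat4`, `contrived_task`, and so on) are hypothetical.

```cpp
#include <array>
#include <complex>
#include <cstddef>

// Hypothetical host-side types for this sketch (not AIE API types).
using Mat4  = std::array<std::array<float, 4>, 4>;
using Mat8  = std::array<std::array<float, 8>, 8>;
using Vec8  = std::array<float, 8>;
using CVec8 = std::array<std::complex<float>, 8>;

// Step 1: element-wise squared magnitude |x_i|^2 of the complex input vector.
Vec8 squared_magnitude(const CVec8& x) {
    Vec8 m{};
    for (std::size_t i = 0; i < 8; ++i) {
        m[i] = std::norm(x[i]);  // std::norm returns re^2 + im^2
    }
    return m;
}

// Step 2: product of two 4x4 matrices.
Mat4 matmul4(const Mat4& p, const Mat4& q) {
    Mat4 r{};  // value-initialized to zero
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            for (std::size_t k = 0; k < 4; ++k)
                r[i][j] += p[i][k] * q[k][j];
    return r;
}

// Step 3 helper: copy a 4x4 block into the 8x8 matrix at (row0, col0).
void place_block(Mat8& m, const Mat4& b, std::size_t row0, std::size_t col0) {
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            m[row0 + i][col0 + j] = b[i][j];
}

// Steps 1 to 5 combined. The block layout [AC DF; BC EF] is an assumption.
Vec8 contrived_task(const Mat4& A, const Mat4& B, const Mat4& C,
                    const Mat4& D, const Mat4& E, const Mat4& F,
                    const CVec8& x, const Vec8& y, int w) {
    Mat8 m{};
    place_block(m, matmul4(A, C), 0, 0);
    place_block(m, matmul4(D, F), 0, 4);
    place_block(m, matmul4(B, C), 4, 0);
    place_block(m, matmul4(E, F), 4, 4);

    const Vec8 mag = squared_magnitude(x);

    Vec8 z{};
    for (std::size_t i = 0; i < 8; ++i) {
        float u = 0.0f;  // Step 4: row i of the 8x8 matrix times |x|^2
        for (std::size_t j = 0; j < 8; ++j) u += m[i][j] * mag[j];
        z[i] = (w == 0) ? u + y[i] : u - y[i];  // Step 5: sum or difference
    }
    return z;
}
```

The four calls to `matmul4` do not depend on one another, and neither does `squared_magnitude`, so they are the natural candidates for parallel execution. A model like this is mainly useful as a golden reference when comparing simulation output.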

The block diagram on the right shows the required calculations more clearly. It also shows the dependencies between calculations, which in turn reveals which calculations can run in parallel.

In this contrived task, the input matrices are provided through buffers and the input vectors through streams. The resultant vector u is handled as an accumulator cascade, and the scalar w as an input runtime parameter (RTP).

Note that two simulation modes are available when developing AIE kernels:

  • Functional: Source code is compiled to run on the x86 host development platform. This allows fast simulations to check the functional correctness of the code.

  • Emulation: Source code is compiled to run on the AI Engine. It is slower than functional simulation but provides cycle-approximate information for estimating throughput and latency on real hardware.