Revision History - 2025.2 English - UG1603

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID
UG1603
Release Date
2025-11-26
Version
2025.2 English

The following table shows the revision history for this document.

Section Revision Summary
11/26/2025 Version 2025.2
Floating-Point Accuracy Edited for clarity.
Floating-Point Operations Updated section.
Data Types Updated figures and table.
Multi-dimensional Addressing in AI Engine Kernels Rewrote section.
Read Only Memory Tile New section.
DMA FIFO New section.
Packet Switching Graph Constructs Updated for MLv2.
Kernel Location Constraints Added physical port name information.
Shared Graph-Scoped Tables Updated code block.
Read Only Memory Tile
  • Added note.
  • Changed title.
Tiling Parameters and Buffer Descriptors Added description for repetition tiling parameter.
05/29/2025 Version 2025.1
Vector Registers Added Mask and MX registers.
Rounding and Saturation Modes Added note for Rounding mode and MX data types.
Load and Store Using Buffer Streams Added a note for input/output buffer size.
Floating-Point Operations Added default value for Precision.
Floating-Point Accuracy Updated to provide details on compute precision.
Loop Flattening and Unrolling Added 'chess_unroll_loop_assuming_multiple(N)' usage.
Vectorized Matrix Multiplication Updated results for 2025.1 release.
Re-arranging Data in AI Engine-ML Memory Tile Added tutorial information.
Creating a Data Flow Graph (Including Kernels) Cleaned code.
AI Engine-ML External Memory Access Added information about configuring external memory as ping-pong buffers.
Asynchronous Buffer Port Access Clarified asynchronous buffer operation.
Buffer Port Data Types
  • Added precision for mx9 elements.
  • Added new data types.
  • Added 64-bit support.
Stream and Cascade Data Types Added acc64, 128, and cacc64, 64 support.
Reading and Advancing an Input Stream Added int64 and uint64 examples.
Writing and Advancing an Output Stream Added a note for AIE-ML v2.
Area Location Constraints Clarified bounding box constraint use.
Graph Programming Comparison between AI Engine, AI Engine-ML, and AI Engine-ML v2 Clarified support for DMA FIFOs.
Mapper/Router Methodology Clarified support for DMA FIFOs.
Mapping Constraints Removed DMA FIFO note.
FIFO Constraint Removed DMA FIFO note.
11/28/2024 Version 2024.2
Data Types Updated supported data types.
Load and Store From Buffer Interface Added note to clarify that buffer streams can be used in place of iterators, if iterator usage is not suitable due to semantic or performance issues.
Load and Store Using Buffer Streams Added topic to explain how to perform load and store operations using buffer streams.
Vector Arithmetic Operations Added note with general instruction to increase the stack size to avoid stack overflow.
Bitwise Operations Added note to clarify that bitwise operators are only supported for integer data types.
AI Engine-ML External Memory Access Added description and examples of the dma_channel constraint, which is used to constrain the location of external buffers.

Updated the example code using XRT API in the host code for Linux.

Synchronous Buffer Port Access Clarified the behavior of locks on a synchronous output buffer, during multiple iterations of a kernel.
Circular Output Buffer Clarified buffer addressing limitations when multiple kernels are mapped to the same tile.
Explicit Packet Switching Clarified that a packet stream can be created from one or more AI Engine kernels to one or more destination AI Engine kernels.
Packet Switching Graph Constructs Clarified that data read from a packet stream is read as an integer value by default, but can then be cast to a different data type if desired.
Example of Packet Switching between PL and AI Engine, Example of Packet Switching from an AI Engine to Multiple AI Engines, and Example on Packet Switching from Multiple AI Engines to an AI Engine Added examples of how to set up packet switching connections between PL and AI Engine.
Logical I/O Ports Added topic that explains the usage of the graph port objects.
06/06/2024 Version 2024.1
General Changed output_stream to output_cascade, and changed input_stream to input_cascade.
Introduction to Scalar and Vector Programming Changed int32 to int16, and changed vector<8> to vector<16>.
Floating-Point Accuracy Added new section.
Casting and Datatype Conversion Added details regarding the API support of vector conversions between data types.
Stream and Cascade Data Types Updated the supported stream and data types.
Stream Switch FIFO Added the --swfifo-threshold option to increase the default maximum FIFO depth.
AI Engine-ML External Memory Access Added note to clarify that external buffers are not supported by the XRT API.
Tiling Parameters and Buffer Descriptors Removed repetition and phase from the structure tiling_parameters.
Examples on Memory and DMA Programming Added new section with examples.
Graph Programming Comparison between AI Engine, AI Engine-ML, and AI Engine-ML v2 New section to compare the graph programming constructs supported in different devices.
Graph Objects for Packet Processing Added pktorderedmerge.
Event API Moved section to UG1076.
12/07/2023 Version 2023.2
General Updated to reflect the usage of the new unified AMD Vitis™ IDE.
Inline Keywords Added inline keyword explanation.
Multi-dimensional Addressing in AI Engine Kernels Introduced aie::make_tensor_buffer_stream which supports multi-dimensional addressing inside a kernel.
Sparse Matrix Multiplication Introduced APIs that support sparse matrix multiplication.
Linear Approximation Updated the steps for floating-point-based approximation and updated the example.
Tiling Parameters and Buffer Descriptors Added important note on address alignment and its impact on data access.
Tiling Parameters Examples Updated the example in 2D Matrix Transferring to 4x2 Sub Matrices.
10/18/2023 Version 2023.2
Initial release N/A