Interface Considerations - 2025.2 English - UG1079

AI Engine Kernel and Graph Programming Guide (UG1079)

Document ID: UG1079
Release Date: 2025-11-26
Version: 2025.2 English

Single-kernel programming focuses on vectorizing an algorithm within a single AI Engine. Multiple-kernel programming uses multiple AI Engine kernels with data flowing between them.

An ADF graph can contain a single kernel or multiple kernels that interact with the PS, PL, and global memory. Each AI Engine kernel has a runtime ratio: the number of cycles taken by one kernel invocation (processing one block of data) divided by the cycle budget. The cycle budget for an application is typically fixed by the expected data throughput and the block size being processed. The runtime ratio is specified as a constraint for every AI Engine kernel in the ADF graph.
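The runtime ratio arithmetic described above can be sketched in plain C++. This is only an illustration: the function names, the 1 GHz clock, the 500 MSPS throughput, and the kernel cycle count are assumptions chosen to give round numbers. They are not part of the ADF API or measurements from a real design.

```cpp
#include <cstdint>

// Cycle budget: the number of AI Engine clock cycles available to process
// one block of data at the required throughput (hypothetical helper).
constexpr double cycle_budget(std::uint64_t block_samples,
                              double samples_per_second,
                              double clock_hz) {
    return static_cast<double>(block_samples) * clock_hz / samples_per_second;
}

// Runtime ratio: cycles taken by one kernel invocation divided by the budget.
constexpr double runtime_ratio(std::uint64_t cycles_per_invocation,
                               double budget_cycles) {
    return static_cast<double>(cycles_per_invocation) / budget_cycles;
}

// Assumed example: 1024-sample blocks at 500 MSPS with a 1 GHz clock give a
// budget of 2048 cycles per block.
constexpr double kBudget = cycle_budget(1024, 500e6, 1e9);

// Assumed example: a kernel that needs 1024 cycles per block uses half of
// that budget.
constexpr double kRatio = runtime_ratio(1024, kBudget);
```

With these assumed numbers, the kernel would be given a runtime ratio of 0.5 in the graph, which leaves half of the cycle budget on that AI Engine available to other kernels.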

The AI Engine compiler allocates multiple kernels to a single AI Engine if all of the following conditions are met:

  • Their combined runtime ratio is less than one.
  • The kernels fit together in the AI Engine program memory.
  • Their total resource usage, such as the number of stream interfaces, does not exceed the AI Engine tile limit.

Alternatively, the compiler can allocate them to multiple AI Engines.
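The co-location conditions above can be expressed as a simple feasibility check. The following is a minimal sketch of that reasoning, not the compiler's actual placement algorithm. The `KernelInfo` and `TileLimits` types, the 16 KB program memory figure, and all per-kernel numbers are assumptions for illustration. The sketch also assumes that stream usage of co-located kernels adds up.

```cpp
#include <cstddef>

// Hypothetical per-kernel summary used only for this sketch.
struct KernelInfo {
    double   runtime_ratio;   // runtime ratio constraint of the kernel
    unsigned program_bytes;   // estimated program memory footprint
    unsigned stream_inputs;   // stream inputs used by the kernel
    unsigned stream_outputs;  // stream outputs used by the kernel
};

// Hypothetical per-tile limits. The values are supplied by the caller.
struct TileLimits {
    unsigned program_bytes;
    unsigned stream_inputs;
    unsigned stream_outputs;
};

// Returns true if the kernels satisfy all co-location conditions:
// combined runtime ratio below one, combined program size within the
// program memory, and combined stream usage within the tile limits.
template <std::size_t N>
constexpr bool can_share_tile(const KernelInfo (&kernels)[N],
                              const TileLimits& limits) {
    double ratio = 0.0;
    unsigned program = 0, inputs = 0, outputs = 0;
    for (std::size_t i = 0; i < N; ++i) {
        ratio   += kernels[i].runtime_ratio;
        program += kernels[i].program_bytes;
        inputs  += kernels[i].stream_inputs;
        outputs += kernels[i].stream_outputs;
    }
    return ratio < 1.0 &&
           program <= limits.program_bytes &&
           inputs  <= limits.stream_inputs &&
           outputs <= limits.stream_outputs;
}

// Assumed limits: 16 KB of program memory, two stream inputs, two stream outputs.
constexpr TileLimits kAssumedLimits{16384, 2, 2};

// Combined ratio 0.9, 11000 bytes, two inputs, two outputs: can share a tile.
constexpr KernelInfo kFits[] = {{0.4, 6000, 1, 1}, {0.5, 5000, 1, 1}};

// Combined ratio 1.1: exceeds the cycle budget of a single AI Engine.
constexpr KernelInfo kTooBusy[] = {{0.6, 6000, 1, 1}, {0.5, 5000, 1, 1}};

// Three stream inputs in total: exceeds the assumed stream input limit.
constexpr KernelInfo kTooManyStreams[] = {{0.3, 4000, 2, 1}, {0.3, 4000, 1, 1}};
```

Passing this check only means that the kernels are candidates for sharing an AI Engine. The compiler can still place them on separate AI Engines.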

Important: Each AI Engine can access the following:
  • two 32-bit AXI4-Stream inputs
  • two 32-bit AXI4-Stream outputs
  • one 384-bit cascade stream input
  • one 384-bit cascade stream output
  • two 256-bit data loads
  • one 256-bit data store
However, because of the limited instruction length, not all of these operations can be performed in the same cycle.
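To compare the interfaces listed above, it can help to convert each one to a peak number of bytes per cycle. The sketch below does only that arithmetic (port count times width in bits, divided by eight). The `Port` type and the constant names are hypothetical. The results are per-interface peaks and, as noted above, cannot all be reached in the same cycle.

```cpp
// Number of identical ports and the width of each one, taken from the list above.
struct Port {
    unsigned count;
    unsigned width_bits;
};

// Peak bytes moved per cycle if every port of this kind is active.
constexpr unsigned peak_bytes_per_cycle(Port p) {
    return p.count * p.width_bits / 8;
}

constexpr Port kStreamIn  {2, 32};   // two 32-bit AXI4-Stream inputs
constexpr Port kStreamOut {2, 32};   // two 32-bit AXI4-Stream outputs
constexpr Port kCascadeIn {1, 384};  // one 384-bit cascade stream input
constexpr Port kCascadeOut{1, 384};  // one 384-bit cascade stream output
constexpr Port kLoads     {2, 256};  // two 256-bit data loads
constexpr Port kStore     {1, 256};  // one 256-bit data store
```

By this arithmetic, the two data loads together can deliver up to 64 bytes per cycle, compared with 8 bytes per cycle for the two AXI4-Stream inputs and 48 bytes per cycle for the cascade stream input.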

To use hardware resources optimally, it is critical to understand the methods available to do the following:

  • Transfer data between the ADF graph and PS, PL, and global memory
  • Transfer data between kernels
  • Balance data movement
  • Minimize memory or stream stalls

The following sections cover these methods in detail.