Runtime Ratio

The runtime ratio is a user-specified constraint that gives the AI Engine tools the flexibility to place multiple AI Engine kernels into a single AI Engine if their total runtime ratio is less than one. The runtime ratio of a kernel can be computed using the following equation.

runtime ratio = (cycles for one run of the kernel)/(cycle budget)

The cycle budget is the number of cycles allowed to run one invocation of the kernel, which depends on the system throughput requirement.
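
For example, if one run of a kernel is estimated to take about 2,000 cycles and the throughput requirement gives a cycle budget of 10,000 cycles per invocation, the runtime ratio is 2,000/10,000 = 0.2. The following minimal sketch shows the ratio being set on the kernel with the runtime<ratio> constraint in an ADF graph; the kernel function my_kernel, the 256-sample buffer size, and the cycle numbers are illustrative assumptions rather than values taken from this guide.

    // Minimal sketch: specifying the runtime ratio of one kernel in an ADF graph.
    // my_kernel is a hypothetical kernel that consumes and produces 256 int32 samples.
    #include <adf.h>
    using namespace adf;

    void my_kernel(input_buffer<int32>& in, output_buffer<int32>& out);

    class my_graph : public graph {
    public:
        input_plio  data_in;
        output_plio data_out;
        kernel k;

        my_graph() {
            data_in  = input_plio::create("DataIn", plio_32_bits, "data/input.txt");
            data_out = output_plio::create("DataOut", plio_32_bits, "data/output.txt");

            k = kernel::create(my_kernel);
            source(k) = "my_kernel.cc";
            dimensions(k.in[0])  = {256};
            dimensions(k.out[0]) = {256};

            // Estimated ~2,000 cycles per run against a 10,000-cycle budget.
            runtime<ratio>(k) = 0.2;

            connect(data_in.out[0], k.in[0]);
            connect(k.out[0], data_out.in[0]);
        }
    };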

Cycles for one run of the kernel can be estimated in the initial design stage. For example, if the kernel contains a loop that can be well pipelined so that a known amount of data is processed in each cycle, then the cycles for one run of the kernel can be estimated as follows.

cycles for one run of the kernel ≈ synchronization of synchronous buffers + function initialization + (loop count × cycles per iteration of the loop) + preamble and postamble of the loop
Note: The synchronization of synchronous buffers plus the function initialization takes tens of cycles, depending on the number of interfaces. This overhead needs to be taken into account when targeting high performance.
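
As an illustrative calculation with assumed (not measured) numbers: for a kernel whose pipelined loop runs 256 iterations at one cycle per iteration, with roughly 40 cycles of buffer synchronization and function initialization and about 10 cycles of loop preamble and postamble, the estimate is as follows.

cycles for one run ≈ 40 + (256 × 1) + 10 ≈ 306

With a cycle budget of 1,000 cycles, the runtime ratio is approximately 306/1,000 ≈ 0.31, which might be rounded up (for example, to 0.4) to leave margin for estimation error.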

Cycles for one run of the kernel can also be profiled in the AI Engine simulator (aiesimulator) when vectorized code is available.
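
For example, assuming the compiled design is packaged in the default ./Work directory (an assumption about the project layout), profiling can be enabled when launching the simulator, and the reported cycle counts for the kernel function can then be used as the numerator of the runtime ratio:

    aiesimulator --pkg-dir=./Work --profile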

If multiple AI Engine kernels are placed into a single AI Engine, they run sequentially, one after the other, and they all run once with each iteration of graph::run, unless multi-rate processing is involved. This means the following.

  • If the AI Engine runtime percentage specified by the runtime constraint is allocated to the kernel in each iteration of graph::run (or on an average basis, depending on the system requirement), the kernel performance requirement can be met.
  • In a single iteration of graph::run, the kernel should take no greater percentage of the AI Engine than specified by the runtime constraint. Otherwise, it might affect the performance of other kernels located in the same AI Engine.
  • Even if multiple kernels have a combined runtime ratio of less than one, they are not necessarily placed into a single AI Engine. The mapping of an AI Engine kernel to an AI Engine is also affected by hardware resources. For example, there must be enough program memory, and enough stream interfaces, to allow all the kernels to reside in the same AI Engine. A sketch of two kernels sized to share one AI Engine follows this list.
  • When multiple kernels are put into the same AI Engine, resources might be saved. For example, the buffers between the kernels in the same AI Engine are single buffers instead of ping-pong buffers.
  • Increasing the runtime ratio of a kernel does not necessarily mean that the performance of the kernel or the graph is increased, because the performance is also affected by the data availability to the kernel and the data throughput in and out of the graph. A pessimistically high runtime ratio setting might result in inefficient resource utilization.
  • A low runtime ratio does not necessarily limit the performance of the kernel to the specified percentage of the AI Engine. For example, if there is only one kernel in the AI Engine, the kernel can run as soon as all of its data is available, no matter what runtime ratio is set.
  • Kernels in different top-level graphs cannot be placed into the same AI Engine, because the graph API needs to control each graph independently.
  • Set the runtime ratio as accurately as possible, because it affects not only which AI Engine is used, but also the data communication routes between kernels. It might also affect other design flows, for example, power estimation.
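
The following sketch shows two kernels whose runtime ratios sum to 0.7, which permits, but does not force, the tools to map them to the same AI Engine. The kernel functions filter_a and filter_b, the buffer sizes, and the ratio values are illustrative assumptions. The relative location constraint is optional; it requests co-location, which the mapper can honor only when enough program memory and stream interfaces are available.

    // Sketch: two kernels with a combined runtime ratio of 0.7, optionally
    // constrained to share one AI Engine. filter_a and filter_b are
    // hypothetical kernels that each process 256 int32 samples per invocation.
    #include <adf.h>
    using namespace adf;

    void filter_a(input_buffer<int32>& in, output_buffer<int32>& out);
    void filter_b(input_buffer<int32>& in, output_buffer<int32>& out);

    class shared_tile_graph : public graph {
    public:
        port<input>  gin;
        port<output> gout;
        kernel k1, k2;

        shared_tile_graph() {
            k1 = kernel::create(filter_a);
            k2 = kernel::create(filter_b);
            source(k1) = "filter_a.cc";
            source(k2) = "filter_b.cc";
            dimensions(k1.in[0])  = {256};
            dimensions(k1.out[0]) = {256};
            dimensions(k2.in[0])  = {256};
            dimensions(k2.out[0]) = {256};

            runtime<ratio>(k1) = 0.3;   // estimated cycles / cycle budget
            runtime<ratio>(k2) = 0.4;   // combined ratio is 0.7, leaving headroom

            // Optional relative location constraint: request that k2 share the
            // AI Engine used by k1. The mapper can honor this only if program
            // memory and interface resources suffice.
            location<kernel>(k2) = location<kernel>(k1);

            connect(gin, k1.in[0]);
            connect(k1.out[0], k2.in[0]);
            connect(k2.out[0], gout);
        }
    };

If the two kernels are mapped to the same AI Engine, the buffer between k1 and k2 can be implemented as a single buffer rather than a ping-pong buffer, as noted above.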