The runtime ratio is a user-specified constraint that allows the AI Engine tools the flexibility to put multiple AI Engine kernels into a single AI Engine, if their total runtime ratio is less than one. The runtime ratio of a kernel can be computed using the following equation.

runtime ratio = (cycles for one run of the kernel) / (cycle budget)
The cycle budget is the number of cycles allowed to run one invocation of the kernel, which depends on the system throughput requirement.
Cycles for one run of the kernel can be estimated in the initial design stage. For example, if the kernel must process a known amount of data in each invocation, and it contains a loop that can be well pipelined so that a known amount of data is handled in each cycle, then the cycles for one run of the kernel can be estimated by the following.

cycles for one run of the kernel ≈ (data processed in one invocation) / (data processed per cycle)
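As a hypothetical illustration of these equations (the numbers are assumptions, not taken from a real design): suppose a kernel processes 1024 int16 samples per invocation and its pipelined loop handles 16 samples per cycle, so one run takes roughly 1024 / 16 = 64 cycles. If the required system throughput allows a cycle budget of 160 cycles per invocation, the runtime ratio is approximately 64 / 160 = 0.4.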
Cycles for one run of the kernel can also be profiled in the aiesimulator when vectorized code is available.
If multiple AI Engine kernels are put into a single AI Engine, they run in a sequential manner, one after the other, and they all run once with each iteration of graph::run, unless there is multi-rate processing (see the example graph after the following list). This means the following.
- If the AI Engine runtime percentage (specified by the runtime constraint) is allocated for the kernel in each iteration of graph::run (or on an average basis, depending on the system requirement), the kernel performance requirement can be met.
- For a single iteration of graph::run, the kernel takes no more percentage than that specified by the runtime constraint. Otherwise, it might affect the performance of other kernels located in the same AI Engine.
- Even if the total runtime ratio of multiple kernels is less than one, they are not necessarily put into a single AI Engine. The mapping of an AI Engine kernel into an AI Engine is also affected by hardware resources. For example, there must be enough program memory and enough stream interfaces available to allow all the kernels to be in the same AI Engine.
- When multiple kernels are put into the same AI Engine, resources might be saved. For example, the buffers between the kernels in the same AI Engine are single buffers instead of ping-pong buffers.
- Increasing the runtime ratio of a kernel does not necessarily mean that the performance of the kernel or the graph is increased, because the performance is also affected by the data availability to the kernel and the data throughput in and out of the graph. A pessimistically high runtime ratio setting might result in inefficient resource utilization.
- A low runtime ratio does not necessarily limit the performance of the kernel to the specified percentage of the AI Engine. For example, if there is only one kernel in the AI Engine, the kernel can run immediately when all of its data is available, no matter what runtime ratio is set.
- Kernels in different top-level graphs cannot be put into the same AI Engine, because the graph API needs to control different graphs independently.
- Set the runtime ratio as accurately as possible, because it affects not only the AI Engines to be used, but also the data communication routes between kernels. It might also affect other design flows, for example, power estimation.
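As a minimal sketch of how the runtime ratio constraint is expressed in graph code, the following assumes two hypothetical kernel functions (filter_stage1 and filter_stage2, declared in an assumed kernels.h) and illustrative window sizes and ratio values; it is not taken from a specific design.

```cpp
#include <adf.h>
#include "kernels.h"   // hypothetical header declaring filter_stage1 and filter_stage2

using namespace adf;

class mygraph : public graph {
public:
  input_port  in;
  output_port out;
  kernel k1, k2;

  mygraph() {
    // filter_stage1/filter_stage2 and the 256-byte windows are placeholders.
    k1 = kernel::create(filter_stage1);
    k2 = kernel::create(filter_stage2);
    source(k1) = "filter_stage1.cc";
    source(k2) = "filter_stage2.cc";

    // Total runtime ratio 0.4 + 0.5 < 1, so the tools are allowed
    // (but not required) to map both kernels onto the same AI Engine.
    runtime<ratio>(k1) = 0.4;
    runtime<ratio>(k2) = 0.5;

    connect<window<256>>(in, k1.in[0]);
    connect<window<256>>(k1.out[0], k2.in[0]);
    connect<window<256>>(k2.out[0], out);
  }
};
```

When such a graph is compiled, the AI Engine compiler decides the actual placement; whether k1 and k2 end up sharing an AI Engine also depends on program memory and stream interface availability, as described in the list above.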