Partitioning your application correctly, based on the information in the previous sections, enables the most efficient use of the Versal adaptive SoC and, therefore, the best performance. Best performance means that all engines, and the AI Engine in particular, run at a high level of utilization and deliver high performance at low power (for example, high performance per watt in compute-intensive applications).
Where AI Engine core utilization is low, consider running additional low-utilization kernels on the same AI Engine tile. Kernels that share a tile cannot run simultaneously, but if the application allows it, this method can improve overall AI Engine array efficiency.
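The packing decision can be reasoned about with a small standalone model. The following is a minimal sketch, not tool behavior. It assumes that each kernel's core utilization is known as a fraction of a tile's cycles (for example, from profiling), that kernels sharing a tile run one after another so their fractions add up, and that a simple first-fit-decreasing heuristic is acceptable. The names `KernelLoad`, `TilePlan`, and `pack_kernels` are hypothetical and are not part of any AMD API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one kernel: a name and the fraction of a
// tile's cycles it needs (0 < utilization <= 1).
struct KernelLoad {
    std::string name;
    double utilization;
};

// One AI Engine tile in this model: the kernels assigned to it and the
// sum of their utilization fractions.
struct TilePlan {
    std::vector<std::string> kernels;
    double total;
};

// First-fit-decreasing packing (an assumed heuristic): sort kernels from
// highest to lowest utilization, then place each one on the first tile
// whose summed utilization stays within `budget`.
std::vector<TilePlan> pack_kernels(std::vector<KernelLoad> kernels,
                                   double budget = 1.0) {
    std::sort(kernels.begin(), kernels.end(),
              [](const KernelLoad& a, const KernelLoad& b) {
                  return a.utilization > b.utilization;
              });

    std::vector<TilePlan> tiles;
    for (const KernelLoad& k : kernels) {
        // A kernel that needs more than the budget cannot fit on any tile.
        assert(k.utilization > 0.0 && k.utilization <= budget);

        bool placed = false;
        for (TilePlan& t : tiles) {
            // Small tolerance guards against floating-point rounding.
            if (t.total + k.utilization <= budget + 1e-9) {
                t.kernels.push_back(k.name);
                t.total += k.utilization;
                placed = true;
                break;
            }
        }
        if (!placed) {
            TilePlan t;
            t.kernels.push_back(k.name);
            t.total = k.utilization;
            tiles.push_back(t);
        }
    }
    return tiles;
}
```

With four kernels that need 0.4, 0.3, 0.2, and 0.5 of a tile respectively, this model fits them onto two tiles instead of four. Other limits, such as the memory available on a tile and the latency the application can tolerate, are outside this model and still need to be checked.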
Clock gating of unused tiles is also possible within the AI Engine array and is turned on by default. For a tile to be considered unused, none of its components can be in use: no memory banks, no interconnect (including route-thrus), and no AI Engine core. Where necessary, you can use the bounding box constructor to guide the placement of your kernels so that as many tiles as possible can be clock gated. For more information, see the AI Engine Tools and Flows User Guide (UG1076).
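The effect of placement on clock gating can be illustrated with a small standalone model. This is a sketch under stated assumptions, not a description of how the tools place or route a design: it models a single row of tiles, two connected kernels, and a connection that routes straight along the row. The names `TileUse`, `is_unused`, `place_pair`, and `count_gateable` are hypothetical. The sketch does not use the ADF graph API; see UG1076 for the actual constraint syntax.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-tile usage flags for this model.
struct TileUse {
    bool core = false;          // AI Engine core runs a kernel
    bool memory = false;        // at least one memory bank is in use
    bool interconnect = false;  // any routing, including route-thrus
};

// A tile counts as unused only if none of its components are in use.
bool is_unused(const TileUse& t) {
    return !t.core && !t.memory && !t.interconnect;
}

// Model one row of `columns` tiles with two connected kernels at columns
// `a` and `b`. Assumption: the connection runs straight along the row, so
// every tile between the two kernels carries a route-thru.
std::vector<TileUse> place_pair(int columns, int a, int b) {
    assert(0 <= a && a < columns && 0 <= b && b < columns && a != b);
    std::vector<TileUse> row(columns);
    const int lo = std::min(a, b);
    const int hi = std::max(a, b);
    for (int c = lo; c <= hi; ++c) {
        row[c].interconnect = true;
    }
    row[a].core = row[a].memory = true;
    row[b].core = row[b].memory = true;
    return row;
}

// Count the tiles that could be clock gated in this model.
int count_gateable(const std::vector<TileUse>& tiles) {
    return static_cast<int>(
        std::count_if(tiles.begin(), tiles.end(), is_unused));
}
```

In an eight-tile row, placing the pair at opposite ends (`place_pair(8, 0, 7)`) leaves no tile gateable in this model, because every tile in between carries a route-thru. Placing the pair in adjacent columns (`place_pair(8, 0, 1)`) leaves six tiles gateable, which is the kind of outcome a bounding box placement constraint is meant to encourage.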