AI Engine

Power Design Manager User Guide (UG1556)

Document ID: UG1556
Release Date: 2025-05-29
Version: 2025.1 English

The Versal AI Core Series delivers breakthrough AI inference acceleration with AI Engines. This series is designed for a breadth of applications, including cloud for dynamic workloads and network for massive bandwidth, all while delivering advanced safety and security features. AI and data scientists, as well as software and hardware developers, can all take advantage of the high compute density to accelerate the performance of any application. Given the AI Engine's advanced signal processing compute capability, it is well suited for highly optimized wireless applications such as radio, 5G, backhaul, and other high-performance DSP applications.

AI Engines are an array of very-long instruction word (VLIW) processors with single instruction multiple data (SIMD) vector units that are highly optimized for compute-intensive applications, specifically digital signal processing (DSP), 5G wireless applications, and artificial intelligence (AI) technology such as machine learning (ML).

The AI Engine page in the PDM tool for Versal adaptive SoC is available for the AI Core Series family and for some AI Edge Series and Premium Series devices. The PDM tool estimates the power consumption of AI Engine blocks for a particular configuration. The following figure shows the AI Engine Power interface.

Figure 1. AI Engine Power Interface

For an early power estimation, specify the AI Engine array clock frequency, number of cores, kernel type, and the Vector Load average percentage for the cores. The supported kernel types are Int8, Int16, and Floating Point.
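As a rough illustration of these inputs, the following Python sketch models them as a simple record. The field names and the `KernelType` values are hypothetical and do not correspond to any PDM API; they only mirror the fields described above.

```python
from dataclasses import dataclass
from enum import Enum

class KernelType(Enum):
    """Kernel datatypes supported on the PDM AI Engine page."""
    INT8 = "Int8"
    INT16 = "Int16"
    FLOAT = "Floating Point"

@dataclass
class AieEstimateInputs:
    """Hypothetical record of the user-specified AI Engine power inputs."""
    clock_mhz: float          # AI Engine array clock frequency
    num_cores: int            # number of AI Engine cores used
    kernel_type: KernelType   # datatype used by the kernel's vector unit
    vector_load_pct: float    # average Vector Load (recommended 30% to 70%)

# Example: 100 cores running an Int8 kernel at 1000 MHz, 50% average load
inputs = AieEstimateInputs(1000.0, 100, KernelType.INT8, 50.0)
```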

The kernel type represents the datatype used for vector processing in the kernel function. There can be scenarios where a kernel uses mixed datatypes. In this case, the recommendation is to select the lower-precision datatype, which is the one that has more impact on the power estimate.
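For a mixed-datatype kernel, this selection rule amounts to picking the lowest-precision type among those used. A minimal sketch of that rule; the bit-width mapping is illustrative:

```python
# Per-element bit widths; illustrative mapping of the PDM kernel types.
PRECISION_BITS = {"Int8": 8, "Int16": 16, "Floating Point": 32}

def pick_kernel_type(types_used):
    """Return the lowest-precision datatype among those the kernel uses."""
    return min(types_used, key=PRECISION_BITS.__getitem__)

# A kernel mixing Int16 and Int8 is entered as Int8 in PDM.
assert pick_kernel_type(["Int16", "Int8"]) == "Int8"
```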

Tip: When considering the Vector Load percentage, use the average loading percentage. The kernel could be using 100% of the available core runtime; however, overhead from pre-fetch, memory accesses, NOPs, and stream and lock stalls should be considered. The recommended range is 30% to 70%.
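One hedged way to arrive at an average Vector Load figure is to discount the raw core occupancy by the stall and overhead cycles. The formula below is a back-of-the-envelope sketch, not a calculation PDM performs:

```python
def average_vector_load(active_pct, overhead_pct):
    """Derive an average Vector Load by removing stall/overhead cycles
    (pre-fetch, memory accesses, NOPs, stream and lock stalls) from the
    raw core occupancy. Both arguments are percentages of core runtime."""
    return max(0.0, active_pct - overhead_pct)

# A kernel occupying 100% of core runtime with 45% overhead cycles
# lands at 55%, inside the recommended 30% to 70% band.
print(average_vector_load(100.0, 45.0))  # 55.0
```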

The Data Memory and Interconnect Load fields are auto-populated based on the number of AI Engines used and can be overridden based on the application requirements. There are eight memory banks in an AI Engine tile (each bank is 4 KB in size, totaling 32 KB per tile). By default, PDM uses all of them; this can be overridden if the application requires fewer bank accesses.
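The per-tile memory numbers compose directly from the bank count and bank size. A small sketch of the arithmetic; the function name is made up for illustration:

```python
BANKS_PER_TILE = 8   # memory banks in an AI Engine tile
BANK_SIZE_KB = 4     # size of each bank

def data_memory_kb(tiles, banks_used=BANKS_PER_TILE):
    """Total data memory exercised, given tiles and banks used per tile.
    PDM defaults to all eight banks (8 x 4 KB = 32 KB per tile); banks_used
    can be lowered when the application accesses fewer banks."""
    return tiles * banks_used * BANK_SIZE_KB

print(data_memory_kb(1))       # 32 KB for one tile, all banks
print(data_memory_kb(10, 4))   # 160 KB for ten tiles using 4 banks each
```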

The Memory R/W rate is the average read/write memory access rate for each bank.

Tip: The Memory R/W rate is an average value. PDM uses 20% by default. The recommended value range is 10% to 30%.

The AI Engine array interface provides access to the rest of the AMD Versal™ adaptive SoC. There are interface tiles for both the Programmable Logic (PL) and the Network on Chip (NoC), and these interface tiles are represented as streams. You can overwrite the PL/NoC streams based on the design application. The interconnect fields are read-only and are calculated based on your input. The PL streams field shows the available streams in the first row of AI Engine tiles and lets you specify the number of 64-bit PL streams that are used. It is recommended to keep the default of 14 PL streams per 20 AI Engine tiles used; however, the PL streams can be changed. A DRC appears (the cell turns yellow in the Utilization table) when the PL streams exceed the available streams within the total AI Engine array.
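The default ratio and the DRC condition can be expressed as a quick sanity check. Reading "14 streams per 20 tiles" as rounding up per group of 20 is an assumption, and the available-stream count is device-specific; take it from the Utilization table for your part, as it is not a value defined here:

```python
import math

def default_pl_streams(tiles_used):
    """One reading of the recommended default: 14 PL streams per
    20 AI Engine tiles used, rounding up per group of 20 tiles."""
    return math.ceil(tiles_used / 20) * 14

def pl_stream_drc(streams_requested, streams_available):
    """Mirrors the DRC condition: flag when the requested PL streams
    exceed the streams available in the total AI Engine array.
    streams_available must come from your device's Utilization table."""
    return streams_requested > streams_available

print(default_pl_streams(100))   # 70 streams for 100 tiles
print(pl_stream_drc(80, 70))     # True: this would trigger the DRC
```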

The Interconnect load is averaged to a fixed value of 12% and has minimal impact on power; it can be overridden using the import flow described in the next section. The maximum clock frequency depends on the speed grade of the device, with 1300 MHz for the -3H grade. For more information, see the Versal Adaptive SoC AI Engine Architecture Manual (AM009).
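A final sketch ties these defaults together. The 12% interconnect load and the 1300 MHz ceiling for the -3H speed grade come from the text above; the validation logic itself is illustrative, and limits for other speed grades are deliberately left out because they are not stated here (see AM009):

```python
DEFAULT_INTERCONNECT_LOAD_PCT = 12.0  # fixed average used by PDM
MAX_CLOCK_MHZ = {"-3H": 1300.0}       # only the -3H grade is cited above;
                                      # other grades differ (see AM009)

def check_clock(clock_mhz, speed_grade):
    """Return True when the requested array clock fits the speed grade.
    Grades not listed in MAX_CLOCK_MHZ are unknown to this sketch."""
    limit = MAX_CLOCK_MHZ.get(speed_grade)
    if limit is None:
        raise ValueError(f"no known limit for speed grade {speed_grade}")
    return clock_mhz <= limit

print(check_clock(1250.0, "-3H"))  # True: within the -3H ceiling
```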