The processing system (PS) provides general-purpose, high-performance compute power with familiar operating environments. It includes the multi-core application processing unit (APU) subsystem and the multi-core real-time processing unit (RPU) subsystem. Linux and bare-metal software stacks can execute in the APU and RPU in a homogeneous or a heterogeneous environment.
The PMC subsystem provides device boot and management functions with its ROM code unit (RCU), which executes the BootROM code, and the platform loader and manager (PLM) firmware running in the PMC's PPU processor.
The device also includes a NoC interconnect, multiple DDR memory controllers, the PL, and integrated peripherals.
The following list describes the major hardware components that are available in Versal Prime Series Gen 2 and Versal AI Edge Series Gen 2 devices.
- APU
- The APU is located in the full-power domain (FPD). The APU can be used for
computations, control-plane applications, operating systems, communications
interfaces, and more.
The APU subsystem is based on up to four clusters with two CPU cores in each cluster for up to eight Arm® Cortex®-A78AE cores. Two cores within a cluster can operate in dual or lockstep mode. The APU includes the Arm® generic interrupt controller (GIC-600-AE) to manage shared and system interrupts. The APU is tightly coupled to a coherent mesh network (CMN) that is surrounded by a system memory management unit (SMMU) for other transaction hosts including DMA units and other system processors. The CMN includes the system-level cache memories with ECC to form a tightly-coupled coherent system.
The application processing unit (APU) consists of either dual-core or quad-core clusters (one to four clusters per device) featuring Cortex-A78AE processor cores, L1 cache, L2 cache, L3 cache and related functionality.
For more information, refer to the Application Processing Unit section in the Versal AI Edge Series Gen 2 and Prime Series Gen 2 Technical Reference Manual (AM026).
- RPU
- The RPU provides predictable software execution times for real-time
applications.
The RPU processor is based on up to five clusters with two CPU cores in each cluster for up to 10 Arm Cortex-R52 cores. The two cores within a cluster can operate in dual or lockstep mode. Each CPU includes separate L1 instruction and data caches and TCMs that are dedicated to that core to provide deterministic behavior for real-time data processing applications. The CPUs feature out-of-order execution that is coupled with a single- and double-precision floating-point unit (FPU). The processor also includes a generic interrupt controller (GIC PL-390) to receive system interrupts.
The RPU subsystem includes tightly-coupled memories (TCMs) and is placed close to the on-chip memory (OCM) for deterministic software execution rates in a standard programming environment. System memory space is cacheable, but the TCM and OCM memory spaces are non-cacheable. The RPU, TCMs, and OCM are located in the low-power domain (LPD).
For more information, refer to the Real-time Processing Unit section in the Versal AI Edge Series Gen 2 and Prime Series Gen 2 Technical Reference Manual (AM026).
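The cluster arithmetic for the RPU can be illustrated with a short sketch. It assumes that a cluster in lockstep mode appears to software as a single CPU, while a cluster in dual mode exposes both cores; that reading, the `RpuCluster` class, and the `lockstep` flag are assumptions made for illustration and do not come from the device documentation.

```python
from dataclasses import dataclass
from typing import List

CORES_PER_CLUSTER = 2   # from the text: two CPU cores in each cluster
MAX_CLUSTERS = 5        # from the text: up to five clusters


@dataclass(frozen=True)
class RpuCluster:
    """One RPU cluster; `lockstep` is a hypothetical flag for this sketch."""
    lockstep: bool

    def logical_cpus(self) -> int:
        # Assumption: a lockstep pair runs one instruction stream, so it
        # counts as one CPU; in dual mode both cores are visible.
        return 1 if self.lockstep else CORES_PER_CLUSTER


def total_logical_cpus(clusters: List[RpuCluster]) -> int:
    """Sum the software-visible CPUs across all configured clusters."""
    if len(clusters) > MAX_CLUSTERS:
        raise ValueError(f"at most {MAX_CLUSTERS} clusters are described")
    return sum(c.logical_cpus() for c in clusters)


all_dual = [RpuCluster(lockstep=False)] * MAX_CLUSTERS
mixed = [RpuCluster(lockstep=True)] * 2 + [RpuCluster(lockstep=False)] * 3
print(total_logical_cpus(all_dual))  # 10
print(total_logical_cpus(mixed))     # 8
```

Under these assumptions, each cluster switched to lockstep mode gives up one software-visible CPU.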
- NoC Interconnect
- The NoC is an AXI-interconnecting network used for sharing data between IP
endpoints in the PL, the PSXC, and other integrated blocks (including DDR memory
controllers). The NoC interconnect is pervasive across the device to connect
APU, RPU, and other processing units to the DDR memory controllers and other
integrated blocks. This device-wide infrastructure is a high-speed, integrated
data path with dedicated switching. The NoC can be logically configured to
represent complex topologies using a series of horizontal and vertical paths and
a set of customizable architectural components.
The NoC is designed for scalability. It is composed of a series of interconnected horizontal (HNoC) and vertical (VNoC) paths, supported by a set of customizable, hardware-implemented components that can be configured in different ways to meet design timing, speed, and logic utilization requirements. The HNoC and VNoC are dedicated, high-bandwidth paths connecting integrated blocks between the processor system and the PL without consuming large amounts of PL resources.
The NoC supports end-to-end quality of service (QoS) to effectively manage transactions and balance competing latency and bandwidth requirements of each traffic stream.
The NoC components comprise NoC master units (NMU), NoC slave units (NSU), and NoC packet switches (NPS). The NMU is the traffic ingress point and the NSU is the traffic egress point. All IPs have some number of these master and slave connections. The NPS is the crossbar switch that is used to fully form the network.
NoC functionality is described in the Versal Adaptive SoC Programmable Network on Chip and Integrated Memory Controller LogiCORE IP Product Guide (PG313).
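The roles of the NMU, NPS, and NSU can be pictured with a small graph model. The sketch below is illustrative only: the node names, the links, and the `route` helper are hypothetical, and the fewest-hop search is a stand-in chosen for simplicity. It does not represent how NoC paths are actually selected or configured.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical topology: traffic enters at an NMU, crosses one or more
# NPS switches, and leaves at an NSU. Names and links are invented.
LINKS: Dict[str, List[str]] = {
    "nmu_apu": ["nps_0"],
    "nmu_pl": ["nps_1"],
    "nps_0": ["nps_1", "nps_2"],
    "nps_1": ["nps_3"],
    "nps_2": ["nps_3", "nsu_pl"],
    "nps_3": ["nsu_ddrmc"],
    "nsu_ddrmc": [],
    "nsu_pl": [],
}


def route(ingress: str, egress: str) -> Optional[List[str]]:
    """Return a fewest-hop path from an NMU to an NSU, or None if unreachable."""
    if not ingress.startswith("nmu") or not egress.startswith("nsu"):
        raise ValueError("routes run from an NMU (ingress) to an NSU (egress)")
    queue = deque([[ingress]])
    seen = {ingress}
    while queue:
        path = queue.popleft()
        if path[-1] == egress:
            return path
        for nxt in LINKS[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(route("nmu_apu", "nsu_ddrmc"))
# ['nmu_apu', 'nps_0', 'nps_1', 'nps_3', 'nsu_ddrmc']
```

In this toy topology, `route("nmu_pl", "nsu_pl")` returns `None` because no switch path connects that ingress to that egress.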
- DDR Memory Controller
- The device includes one or more enhanced DDR5/LPDDR5 memory controllers
(DDRMC5E) that are accessible through the NoC interconnect.
The DDR memory controller includes the option for AES-GCM or AES-XTS encryption. A built-in hardware masking feature is available when using AES-GCM encryption to provide resistance to differential power analysis (DPA) and other forms of side-channel analysis (SCA).
For more information, refer to the DDR Memory Controller section in the Versal AI Edge Series Gen 2 and Prime Series Gen 2 Technical Reference Manual (AM026).
- Platform Management Controller
- The PMC includes a ROM code unit (RCU) processor, the platform processing unit (PPU) that runs the platform loader and manager (PLM) firmware, the boot interfaces, and the voltage/temperature system monitor (SYSMON).
- ROM Code Unit
- The RCU processor executes the BootROM code to provide hardware boot. The BootROM code is the first to run after a device-level reset; this can include a power-on reset or software reset. The BootROM code initializes the device, enables the boot interface, and processes the boot header. To finish the hardware boot process, the RCU loads the PLM firmware into the PMC microprocessor and relinquishes control of the system to the PLM. After the PLM takes system control, the RCU switches to a services mode, which includes system monitoring and service execution including in-place PLM firmware update.
- Platform Loader and Manager Firmware
- The PLM firmware performs several tasks that include device configuration and partial reconfiguration of the programmable logic. The PLM firmware also loads the application security unit (ASU) firmware to provide security functionality to the processing systems.
- Boot Interfaces
- The boot interfaces include the flash memory controllers for autonomous boot modes and the SelectMAP and JTAG boot interfaces for managed boot modes.
- Voltage and Temperature System Monitor
- The voltages and temperatures in the SoC are measured by the system monitors (SYSMON). Various voltage rails are monitored to detect out-of-range measurements for safety and anomalies for security. The SYSMON also monitors for under-temperature and over-temperature conditions.
- Security Features
- The PMC includes an AES accelerator with GCM modes, SHA2, SHA3, the public
key cryptographic algorithms RSA and elliptic curve cryptography (ECC), a true
random number generator (TRNG), a physically unclonable function (PUF) to
create a signature, and a public key infrastructure accelerator.
For more information, refer to Versal AI Edge Series Gen 2 and Prime Series Gen 2 Technical Reference Manual (AM026).
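The handoff from the RCU to the PLM described above can be summarized as a small sequence model. This is a sketch for orientation only: `BootState`, `run_bootrom`, and `run_plm` are hypothetical names, the step strings paraphrase the text, and the relative order of the two PLM tasks is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BootState:
    """Hypothetical bookkeeping for this sketch; not a real firmware structure."""
    in_control: str = "none"   # which processor currently controls the system
    rcu_mode: str = "idle"     # "idle", "boot", or "services"
    log: List[str] = field(default_factory=list)


def run_bootrom(state: BootState) -> None:
    """RCU hardware-boot steps, in the order the text lists them."""
    state.in_control = "RCU"
    state.rcu_mode = "boot"
    for step in (
        "initialize the device",
        "enable the boot interface",
        "process the boot header",
        "load PLM firmware into the PPU",
    ):
        state.log.append(f"RCU: {step}")
    # Handoff: the RCU relinquishes control and switches to services mode.
    state.in_control = "PLM"
    state.rcu_mode = "services"


def run_plm(state: BootState) -> None:
    """PLM tasks named in the text; their relative order here is an assumption."""
    if state.in_control != "PLM":
        raise RuntimeError("PLM cannot run before the RCU hands over control")
    state.log.append("PLM: load ASU firmware")
    state.log.append("PLM: configure the device")


state = BootState()
run_bootrom(state)
run_plm(state)
print("\n".join(state.log))
```

The guard in `run_plm` captures the one ordering constraint the text states explicitly: the PLM takes control only after the RCU has loaded it and relinquished the system.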
- Programmable Logic
- The PL is a scalable structure that provides the ability to create many
possible functions. The integrated hardware options have interconnect interfaces
and connections to the PL. The PL I/O includes both LVCMOS buffers and gigabit
transceivers that cover a wide range of applications and frequencies. For more
information, see the Programmable Logic
section.
The programmable logic supports AXI SmartConnect core functionality that can be instantiated using a library of AMD LogiCORE™ IPs. The AXI SmartConnect core can be independent within the PL or extended and attached to the processing system through several AXI interfaces with and without coherency with the APU system cache.
- Coherent Mesh Network
- To minimize memory latency and maximize throughput, processor systems use caches. To
keep caches coherent, a coherent interconnect is required. The CMN is based on
the Arm CMN-600 AE with its snoop filter
(SF) table feature. It provides tight memory coherency between the FPD System
Cache and a PL system cache using the CHI interface protocol to support multiple
heterogeneous processing environments. It is part of the FPD interconnect.
The APU subsystem is tightly coupled to a coherent interconnect with system cache to provide high-performance software compute power. The Cortex-A78AE cores and caches are part of Arm MPCore IP.
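The general idea behind a snoop filter can be shown with a toy directory model. The sketch below illustrates the technique in the abstract; it is not a description of the CMN implementation, and the `SnoopFilter` class and the agent names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


class SnoopFilter:
    """Toy directory: remembers which agents may hold a copy of each cache line."""

    def __init__(self) -> None:
        # line address -> names of agents recorded as holding the line
        self._sharers: Dict[int, Set[str]] = defaultdict(set)

    def read(self, agent: str, line: int) -> None:
        # A read installs a copy, so the agent becomes a sharer of the line.
        self._sharers[line].add(agent)

    def write(self, agent: str, line: int) -> Set[str]:
        # Only the other recorded sharers need an invalidating snoop;
        # agents that never touched the line are left alone.
        to_snoop = self._sharers[line] - {agent}
        self._sharers[line] = {agent}
        return to_snoop


sf = SnoopFilter()
sf.read("apu_cluster0", 0x1000)
sf.read("pl_cache", 0x1000)
print(sf.write("apu_cluster0", 0x1000))  # {'pl_cache'}
print(sf.write("apu_cluster1", 0x2000))  # set()
```

The second write generates no snoop traffic at all, which is the benefit of tracking sharers instead of broadcasting every invalidation.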
- Tightly coupled memory (TCM) in the RPU
- This memory is 512 KB and is mainly used by the RPU but can be accessed by the APU.
- On-chip memory (OCM)
- Versal Prime Series Gen 2 and Versal AI Edge Series Gen 2 devices have 2 MB of OCM. Each bank can be accessed through a dedicated 128-bit AXI interface via the LPD interconnect.
- Application security unit (ASU)
- The ASU subsystem includes a RISC-V processor with AES, ECC, SHA2, SHA3,
TRNG, and RSA crypto accelerators. It also includes support for a volatile user
key vault and an interface to the PL for extending crypto functionality. AMD provides ASU firmware to support these features. The main purpose
of the ASU is to accelerate crypto operations and to act as the key management
unit for runtime applications running in the APU, RPU, or processors
instantiated within the PL. The ASU firmware is loaded during the boot process
by the PLM firmware. This is done in a secure manner with authentication and/or
decryption. The ASU is located in the LPD.
For more information, see Versal AI Edge Series Gen 2 and Prime Series Gen 2 Technical Reference Manual (AM026).
- Multimedia interface
- Simultaneous transport of USB 3.2 Gen 2x1 (up to 10 Gb/s) and DP 1.4 with HDCP 2.3 (up to 4Kp60 or 8Kp30) traffic over a single USB Type-C connector
- USB 3.2 Gen 2x1 dual role device
- Display Port transmit controller
- DP AUX-I2C
- PHY
- 10 Gb Ethernet MAC (10 Gb) to support 1 G, 2.5 G, 5 G, and 10 G streams
- Two PCIe controllers for Gen 5x4 support as root complex (RC) or endpoint (EP)
- Four shader controllers in two slices running up to 1 GHz with support for one or two partitions
- Processing System High-speed Connectivity
- USB 3.2 Gen 2x1, PCIe®, 10GbE, Display Port, and Display Controller
- Graphics Processing Unit (GPU)
- The Arm® Mali™-G78AE GPU is part of the Valhall architecture family. The G78AE
architecture supports a modular configuration that adapts to both
safety-critical and performance-driven applications by organizing its shader
cores into core clusters, slices, and partitions.
For more information, refer to the Graphics Processing Unit section in the Versal AI Edge Series Gen 2 and Prime Series Gen 2 Technical Reference Manual (AM026).
- Integrated Hardware Options
- AI Engine-ML v2
- Video codec unit (VCU)
- H.264 and H.265 encoding and decoding
- JPEG decode
- Two units, multiple streams each
- Image signal processor (ISP)
- Contains three tiles, each containing two ISPs
- Dual output
- Single and multi-stream
- Resolutions up to 4Kp60
- Power Domains
- The device includes several power domains. The LPD is required for running the PMC to boot, manage the device, and access the PMC I/O peripherals and flash memory controllers. The LPD also provides power to the RPU. The FPD is required for the APU, the system caches, and the memory coherency subsystem. The SoC power domain (SPD) is for the NoC interconnect and DDR memory controllers. The PL power domain is for logic instantiated in the PL and also the integrated logic and peripherals.
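The block-to-domain relationships stated in this section can be collected into a lookup table. The sketch below is a reading aid, not a power-management API: the dictionary keys, the `PL` label for the PL power domain, and the `required_domains` helper are shorthand chosen here. It also assumes the LPD is always needed because, per the text, the PMC requires it to boot and manage the device.

```python
from typing import Dict, Iterable, Set

# Block-to-domain mapping as stated in the text; keys are shorthand labels.
DOMAIN_OF: Dict[str, str] = {
    "PMC": "LPD",
    "RPU": "LPD",
    "TCM": "LPD",
    "OCM": "LPD",
    "ASU": "LPD",
    "APU": "FPD",
    "system cache": "FPD",
    "CMN": "FPD",
    "NoC": "SPD",
    "DDRMC": "SPD",
    "PL logic": "PL",
}


def required_domains(blocks: Iterable[str]) -> Set[str]:
    """Return the power domains that must be on to use the given blocks."""
    # Assumption: the PMC's domain is always included, since the text says
    # the LPD is required for the PMC to boot and manage the device.
    domains = {DOMAIN_OF["PMC"]}
    for block in blocks:
        if block not in DOMAIN_OF:
            raise ValueError(f"unknown block: {block!r}")
        domains.add(DOMAIN_OF[block])
    return domains


print(sorted(required_domains(["APU", "DDRMC"])))  # ['FPD', 'LPD', 'SPD']
print(sorted(required_domains(["RPU", "OCM"])))    # ['LPD']
```

A real-time workload that stays within the RPU, TCM, and OCM resolves to the LPD alone in this model, while anything that touches DDR memory also pulls in the SPD.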