APU - 2025.2 English - UG1273

Versal Adaptive SoC Design Guide (UG1273)

Document ID: UG1273
Release Date: 2025-12-03
Version: 2025.2 English

The application processing unit (APU) is a feature-rich eight-core, two-cluster unit based on Arm Cortex®-A78AE processors. The Cortex-A78AE cores can operate in split mode or in lock-step mode. Each core has a 64 KB level 1 instruction cache, a 64 KB level 1 data cache, and a 512 KB unified level 2 cache. Each cluster of four processors has a 1 MB unified level 3 cache.
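As a rough illustration of the cache figures above, the following sketch totals the capacity at each cache level. It takes the paragraph at face value (eight cores, two clusters) and assumes the 64 KB level 1 figure applies separately to the instruction cache and the data cache, as listed in Table 1. The names `ApuCacheConfig` and `cache_totals_kb` are hypothetical and exist only for this example; they are not part of any AMD or Arm tool.

```python
from dataclasses import dataclass

KB_PER_MB = 1024


@dataclass(frozen=True)
class ApuCacheConfig:
    """Cache sizes in KB. Defaults are assumptions taken from the text above."""
    cores: int = 8                  # eight-core APU
    clusters: int = 2               # two clusters of four cores
    l1_instruction_kb: int = 64     # per core
    l1_data_kb: int = 64            # per core
    l2_kb: int = 512                # per core, unified
    l3_per_cluster_kb: int = 1 * KB_PER_MB  # per cluster, unified


def cache_totals_kb(cfg: ApuCacheConfig) -> dict:
    """Return the summed capacity of each cache level in KB."""
    l1 = cfg.cores * (cfg.l1_instruction_kb + cfg.l1_data_kb)
    l2 = cfg.cores * cfg.l2_kb
    l3 = cfg.clusters * cfg.l3_per_cluster_kb
    return {"L1": l1, "L2": l2, "L3": l3, "total": l1 + l2 + l3}


if __name__ == "__main__":
    for level, size_kb in cache_totals_kb(ApuCacheConfig()).items():
        print(f"{level}: {size_kb} KB ({size_kb / KB_PER_MB:g} MB)")
```

The configuration is parameterized so that a different core or cluster count, or a device-specific L3 size, can be passed in, for example `ApuCacheConfig(l3_per_cluster_kb=2048)`.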

The APU communicates with the rest of the processing system through a coherency interconnect (CMN-600 with AE) based on the coherent hub interface (CHI). The coherent interconnect enables the processing system to satisfy safety requirements, provides a snoopable last-level cache (LLC), enables efficient L3 stashing, and provides sufficient bandwidth and quality of service (QoS) for incoming traffic and for traffic to the network on chip (NoC).

The following table shows the differences between the Cortex-A53 processors in AMD Zynq™ UltraScale+™ MPSoCs, the Cortex-A72 processors, and the Cortex-A78AE processors in the processing system of the Versal devices.

Table 1. Cortex-A53, Cortex-A72, and Cortex-A78AE Comparison

| Feature | Cortex-A53 | Cortex-A72 | Cortex-A78AE |
|---|---|---|---|
| Architecture | Armv8-A (64-bit and 32-bit operations) | Armv8-A (64-bit and 32-bit operations) | Armv8-A (64-bit and 32-bit operations) |
| Exception levels | EL0-EL3 | EL0-EL3 | EL0-EL3 |
| SIMD and floating point | Advanced SIMD NEON floating-point unit | Advanced SIMD NEON floating-point unit | Advanced SIMD NEON floating-point unit |
| Memory management | Integrated memory manager | Integrated memory manager | Integrated memory manager |
| Power management | Power island control | Power island control | Power island control |
| Maximum frequency | Up to 1500 MHz | Up to 1700 MHz | Up to 2400 MHz |
| Performance (DMIPS per MHz per processor) | 3.13 | 5.74 | 11.38 |
| Cluster configuration | 1 quad-core cluster | Dual-core cluster | 4 dual-core clusters |
| L1 instruction cache | 32 KB | 48 KB | 64 KB |
| L1 data cache | 32 KB | 32 KB | 64 KB |
| L2 cache | 512 KB unified | 512 KB per processor | 512 KB per processor |
| L3 cache | N/A | N/A | 1 MB per cluster (see note 1) |
| Last-level cache (LLC) | N/A | N/A | 4 MB unified (see note 2) |

  1. For Versal Prime Series Gen 2 (2VM3654), it is 2 MB L3 cache per cluster.
  2. For Versal Prime Series Gen 2 (2VM3654), there is no LLC.
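
To put the frequency and DMIPS-per-MHz rows of Table 1 side by side, the following sketch multiplies the two figures to estimate a peak per-processor DMIPS value and compares each processor against the Cortex-A53. This is arithmetic on the table values only; it assumes each core runs at its maximum listed frequency, and Dhrystone figures are not a prediction of application performance. The names `TABLE_1`, `peak_dmips_per_core`, and `compare_to_baseline` are hypothetical.

```python
# (DMIPS per MHz per processor, maximum frequency in MHz), copied from Table 1.
TABLE_1 = {
    "Cortex-A53": (3.13, 1500),
    "Cortex-A72": (5.74, 1700),
    "Cortex-A78AE": (11.38, 2400),
}


def peak_dmips_per_core(dmips_per_mhz: float, max_mhz: float) -> float:
    """Estimated peak DMIPS for one processor at its maximum frequency."""
    return dmips_per_mhz * max_mhz


def compare_to_baseline(table: dict, baseline: str = "Cortex-A53") -> dict:
    """Map each processor to (peak DMIPS, ratio relative to the baseline)."""
    base = peak_dmips_per_core(*table[baseline])
    result = {}
    for name, (dmips_per_mhz, max_mhz) in table.items():
        peak = peak_dmips_per_core(dmips_per_mhz, max_mhz)
        result[name] = (peak, peak / base)
    return result


if __name__ == "__main__":
    for name, (peak, ratio) in compare_to_baseline(TABLE_1).items():
        print(f"{name}: {peak:.0f} DMIPS per core ({ratio:.2f}x Cortex-A53)")
```

With the table values, the per-processor estimates come to 4695, 9758, and 27312 DMIPS respectively. Multiplying by the core count gives an aggregate figure, although real workloads rarely scale linearly across cores.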