The application processing unit (APU) is a feature-rich, eight-core, two-cluster unit based on Arm Cortex®-A78AE processors. The A78AE cores can operate independently (split mode) or in lock-step. Each core has a 64 KB level 1 (L1) instruction cache, a 64 KB L1 data cache, and a 512 KB unified level 2 (L2) cache. Each cluster of four processors has a 1 MB unified level 3 (L3) cache.
The APU communicates with the rest of the processing system through a coherency interconnect (CMN-600AE) based on the coherent hub interface (CHI). The coherent interconnect enables the processing system to meet safety requirements, provides a snoopable last-level cache (LLC), enables efficient L3 stashing, and provides sufficient bandwidth and quality of service (QoS) for incoming traffic and for traffic to the network on chip (NoC).
The following table shows the differences between the Cortex-A53 processors in AMD Zynq™ UltraScale+™ MPSoCs, the Cortex-A72 processors, and the Cortex-A78AE processors in the processing system of the Versal devices.
| Feature | Cortex-A53 | Cortex-A72 | Cortex-A78AE |
|---|---|---|---|
| Armv8-A architecture (64-bit and 32-bit operations) | Yes | Yes | Yes |
| EL0-EL3 exception levels | Yes | Yes | Yes |
| Advanced SIMD NEON floating-point unit | Yes | Yes | Yes |
| Integrated memory manager | Yes | Yes | Yes |
| Power island control | Yes | Yes | Yes |
| Maximum clock frequency | Up to 1500 MHz | Up to 1700 MHz | Up to 2400 MHz |
| Performance (DMIPS per MHz per processor) | 3.13 | 5.74 | 11.38 |
| Cluster configuration | One quad-core cluster | Dual-core cluster | Four dual-core clusters |
| L1 instruction cache | 32 KB | 48 KB | 64 KB |
| L1 data cache | 32 KB | 32 KB | 64 KB |
| L2 cache | 512 KB unified | 512 KB per processor | 512 KB per processor |
| L3 cache | N/A | N/A | 1 MB per cluster¹ |
| Last-level cache (LLC) | N/A | N/A | 4 MB unified² |
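As a rough illustration of how the frequency and DMIPS-per-MHz rows of the table combine, the following sketch multiplies them into peak Dhrystone figures per core and per APU. The `ApuConfig` class and `CONFIGS` list are hypothetical names used only for illustration. The sketch assumes that every core runs at its maximum listed clock and that throughput scales linearly with core count, which real workloads seldom achieve. Core counts come from the cluster row: four for the Cortex-A53, two for the Cortex-A72, and eight for the Cortex-A78AE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApuConfig:
    """Hypothetical record of one column of the comparison table."""
    name: str
    dmips_per_mhz: float  # DMIPS per MHz per processor (table value)
    max_mhz: int          # "Up to" clock frequency in MHz (table value)
    cores: int            # total cores, derived from the cluster configuration

    def dmips_per_core(self) -> float:
        # Peak single-core estimate: DMIPS/MHz x MHz.
        return self.dmips_per_mhz * self.max_mhz

    def dmips_total(self) -> float:
        # Assumes ideal linear scaling across all cores (an optimistic upper bound).
        return self.dmips_per_core() * self.cores


CONFIGS = [
    ApuConfig("Cortex-A53", 3.13, 1500, 4),     # one quad-core cluster
    ApuConfig("Cortex-A72", 5.74, 1700, 2),     # dual-core cluster
    ApuConfig("Cortex-A78AE", 11.38, 2400, 8),  # four dual-core clusters
]

if __name__ == "__main__":
    for cfg in CONFIGS:
        print(f"{cfg.name:13s} {cfg.dmips_per_core():9.0f} DMIPS/core "
              f"{cfg.dmips_total():9.0f} DMIPS total")
```

Under these assumptions, one Cortex-A78AE core works out to 11.38 × 2400 = 27,312 DMIPS, compared with 4,695 DMIPS for one Cortex-A53 core. Treat these as back-of-the-envelope numbers rather than measured performance, since Dhrystone ignores memory-system effects such as the cache sizes listed in the table.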