Graphics Processing Unit

Versal AI Edge Series Gen 2 and Prime Series Gen 2 Technical Reference Manual (AM026)

Document ID: AM026
Release Date: 2025-12-23
Revision: 1.3 English

The Arm® Mali™-G78AE GPU is a versatile, high-performance graphics solution for automotive, robotics, industrial IoT, healthcare, machine vision, pro AV and broadcast, and aerospace and defense applications. Its adaptive, scalable architecture supports complex, heterogeneous workloads, from graphics rendering to lightweight compute tasks (e.g., machine learning inference). Its multi-partitioning and virtualization capabilities provide workload isolation, which suits safety-critical environments such as autonomous driving systems and industrial automation. Refer to Figure 1 for all the functional units and subsystems in the device.

The Mali-G78AE is the latest 64-bit architecture GPU from the Arm automotive enhanced line of IP. It meets ISO 26262 and IEC 61508 standards, making it suitable for applications requiring ASIL-B/SIL 2 safety levels. This GPU is engineered to handle both graphical and computational tasks with high efficiency. Its flexible partitioning capability allows the GPU to be divided into multiple, independent virtual GPUs, facilitating the concurrent execution of diverse workloads.

Partitioning is particularly advantageous in automotive scenarios where multiple displays and computational tasks, such as ADAS functionality and 3D surround view systems, must operate simultaneously without interference. The Mali-G78AE has four shader cores arranged as two slices of two cores each and supports one or two partitions. It also supports hardware virtualization with flexible core partitioning, which is crucial for functional safety.
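The two partition arrangements described above can be modeled in a few lines. This is a minimal sketch for illustration only: the `Partition` class, the partition names, and the `is_valid` helper are hypothetical and do not correspond to any driver or register interface. Only the core counts (four cores in total, two per slice, one or two partitions) come from the text.

```python
from dataclasses import dataclass

TOTAL_SHADER_CORES = 4  # from the text: four shader cores
CORES_PER_SLICE = 2     # from the text: two slices, two cores per slice


@dataclass(frozen=True)
class Partition:
    """Hypothetical model of one hardware partition (not a driver structure)."""
    name: str
    shader_cores: int


def partition_layouts():
    """Return the two layouts described in the text: 1 x 4 cores or 2 x 2 cores."""
    single = (Partition("partition0", TOTAL_SHADER_CORES),)
    dual = (
        Partition("partition0", CORES_PER_SLICE),
        Partition("partition1", CORES_PER_SLICE),
    )
    return [single, dual]


def is_valid(layout):
    """Accept only the core splits named in the text: (4,) or (2, 2)."""
    cores = tuple(p.shader_cores for p in layout)
    return cores in {(TOTAL_SHADER_CORES,), (CORES_PER_SLICE, CORES_PER_SLICE)}


if __name__ == "__main__":
    for layout in partition_layouts():
        summary = ", ".join(f"{p.name}: {p.shader_cores} cores" for p in layout)
        print(f"{len(layout)} partition(s) -> {summary}")
```

A check of this kind rejects an uneven split such as one core plus three cores, which the text does not list as a supported configuration.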

The Mali-G78AE supports industry-standard APIs, including OpenGL ES 3.2, Vulkan 1.3, and OpenCL™ 3.0, for compatibility with both legacy and next-generation applications. Features such as multisample anti-aliasing (MSAA), adaptive scalable texture compression (ASTC), and Arm frame buffer compression (AFBC) deliver high-resolution graphics with reduced power consumption. With configurable hardware partitions and support for up to eight virtual machines, the Mali-G78AE targets high-resilience, mission-critical applications across these sectors, with efficient resource utilization and high throughput under demanding conditions.

Integrating programmable logic with the GPU on the same SoC opens up new possibilities for advanced automotive use cases. Programmable logic can offload and accelerate specific tasks, such as sensor data processing and real-time decision-making algorithms, which are vital in autonomous driving.

Figure 1. Enabling a Wide Range of Markets

Table 1. Arm Mali-G78AE GPU Key Features

| Feature | Description |
| --- | --- |
| Anti-aliasing | Supports 4x, 8x, and 16x multisample anti-aliasing (MSAA), which smooths visuals by reducing aliasing artifacts in graphics rendering. |
| API support | Supports OpenGL ES 3.2, Vulkan 1.3, Vulkan SC 1.0, OpenCL 3.0, and OpenGL SC 2.0 for graphics and compute tasks. Covers next-generation and legacy applications, including safety-critical environments. |
| Adaptive scalable texture compression (ASTC) | Supports both low dynamic range (LDR) and high dynamic range (HDR) for 2D and 3D images, improving image quality and reducing memory bandwidth for energy efficiency. |
| Arm frame buffer compression (AFBC) | Implements AFBC v1.3 with a 4x4 pixel block size for lossless image compression. Reduces memory bandwidth, conserving power while maintaining image quality. |
| Hardware partitions | Configurable as a single partition with four shader cores or as two partitions with two cores each. Enables isolation of critical workloads. |
| Multiple virtual machines (VMs) | Supports up to eight VMs, allowing multiple OS instances or applications to run on the same hardware while maintaining workload isolation. |
| Clock speed (FMAX) | Up to 1050 MHz on the -2MHP speed grade, with a baseline of 1000 MHz on the -1HP speed grade. |
| Vertex rate | Up to 2100 million vertices per second at 1050 MHz, for high-detail 3D scenes. |
| Pixel fill rate | Up to 8.4 billion pixels per second at 1050 MHz, for high-resolution image generation in demanding applications. |
| Texture rate | Up to 16.8 billion texels per second at 1050 MHz, for rapid texture mapping of realistic surface details. |
| FP32 operations | Up to 268.8 GFLOPS (32-bit floating point) at 1050 MHz, supporting intensive compute tasks and ML inference. |
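
The peak rates in Table 1 are all quoted at 1050 MHz, so dividing each by the clock gives a per-clock figure that is easier to compare across speed grades. The sketch below does that arithmetic and estimates the same rates at the 1000 MHz baseline. It assumes that peak throughput scales linearly with GPU clock, which the table does not state, and the names `PEAK_AT_FMAX`, `per_clock`, and `scaled` are illustrative only.

```python
F_MAX_MHZ = 1050      # -2MHP speed grade, from Table 1
BASELINE_MHZ = 1000   # -1HP speed grade, from Table 1

# Peak rates at 1050 MHz, copied from Table 1 (units per second).
PEAK_AT_FMAX = {
    "vertices": 2100e6,
    "pixels": 8.4e9,
    "texels": 16.8e9,
    "fp32_flops": 268.8e9,
}


def per_clock(rate_per_s, clock_mhz=F_MAX_MHZ):
    """Convert a per-second peak rate into units per GPU clock cycle."""
    return rate_per_s / (clock_mhz * 1e6)


def scaled(rate_per_s, target_mhz, ref_mhz=F_MAX_MHZ):
    """Estimate a peak rate at another clock, ASSUMING linear scaling with clock."""
    return rate_per_s * target_mhz / ref_mhz


if __name__ == "__main__":
    for name, rate in PEAK_AT_FMAX.items():
        print(
            f"{name}: {per_clock(rate):.1f} per clock, "
            f"about {scaled(rate, BASELINE_MHZ):.3e} per second at {BASELINE_MHZ} MHz"
        )
```

Under this assumption, the table values correspond to 2 vertices, 8 pixels, 16 texels, and 256 FP32 operations per clock across the whole GPU. The 1000 MHz figures are estimates derived from that arithmetic, not values taken from the manual.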