The Mali-G78AE GPU includes a partition manager in hardware and extra system components required to support its operation. The partition manager adds hardware support for access windows and hardware separation.
Access Windows
An access window provides a Virtual Machine (VM) with access to a GPU. The access includes I/O registers, IRQ lines, and StreamIDs for the System Memory Management Unit (SMMU) or the Memory Protection Unit (MPU). The arbiter controls multiple access windows and chooses which virtual machine, through its access window, has access to the GPU at any given time. The access window also provides a communication channel with the arbiter, which is always connected, so that the virtual machine can request access to the GPU. The access window removes the requirement for the hypervisor to reroute GPU interrupts, remap GPU registers and reconfigure the SMMU or MPU when switching between VMs. Switching the current access window is enough. The access window also eliminates the need for the hypervisor to support communication between the VM and the arbiter. The figure shows the use of access windows in the GPU.
Partition Manager
The partition manager makes it possible to separate a GPU into independent partitions, to effectively create multiple independent GPUs. A Mali GPU with a partition manager groups shader cores into slices and groups slices into partitions. Assuming there are enough slices in the hardware, one or more slices can be assigned to a partition. Once configured, the partition can be treated like a conventional GPU. Full hardware separation exists between the partitions, which means that different Virtual Machines can use the partitions independently. You can still use multiple access windows to share individual partitions cooperatively between multiple VMs. Partitions and slices can be grouped into resource groups, which can in turn be assigned to different bus interfaces.
Reference Software Stacks for Virtualization
Four main functionalities are required to enable virtualization:
Arbiter: A software module that coordinates access to the GPU hardware. The arbiter maintains a list of all the GPU kernel driver instances, grants the GPU to each instance for a specific time interval, and handles instances that do not release the GPU within a given time. The reference implementation of an arbiter is a kernel module called mali_arbiter. This module does not require direct access to the GPU hardware. Instead, it is provided with interfaces that delegate hardware and platform interactions to other modules. These interactions include communicating with GPU kernel driver instances and switching the GPU between them.
Cooperative driver: A standard GPU kernel driver with an interface to communicate and coordinate GPU access with the arbiter. GPU driver cooperation is only required on the kernel side, so the user-side GPU driver does not need any modifications. The GPU kernel driver must request access to the GPU and wait for the arbiter to grant it. The kernel driver must also release this access when required, so that the arbiter can grant it to another GPU kernel driver instance. The reference implementation of the GPU kernel driver is a kernel module called mali_kbase.
Communication: A system that allows two-way communication between the arbiter and the GPU kernel driver instances. For cooperative virtualization, a communication channel is required between the arbiter and the GPU kernel driver instances inside the Virtual Machines (VMs). This communication goes through well-defined interfaces. For the reference implementation, the arbiter and supporting functions are implemented as kernel modules. The interface is added to the user-data slot of each driver as drvdata, the Linux kernel construct that holds the interfaces and is accessed through dev_get_drvdata(). Modules can only communicate with each other if they are in the same VM. Any module can call these interfaces by fetching the drvdata for the required device. If you require communication between modules in different VMs, then you must implement a suitable alternative to these function pointers.
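As a rough illustration of this mechanism, the following user-space C sketch models how one module can publish a function-pointer interface through a device's drvdata slot and another module can retrieve it. The struct names and operations are invented for illustration; they are not the real kbase or arbiter types.

```c
#include <stddef.h>

/* Minimal user-space model of the kernel's drvdata mechanism: a
 * device carries an opaque pointer that one module sets and another
 * fetches to reach a callback interface. */
struct device {
    void *drvdata;              /* mirrors the driver-data slot in Linux */
};

/* Hypothetical interface the arbiter publishes for driver instances. */
struct arbiter_ops {
    int (*gpu_request)(void);   /* ask for GPU ownership */
    int (*gpu_release)(void);   /* give GPU ownership back */
};

static void dev_set_drvdata(struct device *dev, void *data) { dev->drvdata = data; }
static void *dev_get_drvdata(const struct device *dev) { return dev->drvdata; }

static int stub_request(void) { return 0; }   /* 0 == success, kernel style */
static int stub_release(void) { return 0; }

static struct arbiter_ops arb_ops = { stub_request, stub_release };

/* A GPU driver instance looks up the arbiter's interface through the
 * device it was linked to and calls through the function pointers. */
static int request_gpu(struct device *arbiter_dev)
{
    struct arbiter_ops *ops = dev_get_drvdata(arbiter_dev);
    return ops->gpu_request();
}
```

In the kernel, the lookup works the same way, except that the device comes from a phandle in the device tree rather than a local variable.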
For the reference implementation, the GPU kernel driver instance uses an interface called arbiter_if_vm_arb_ops to talk to the arbiter module. The primary functions provided by this interface are:
- Register the VM with the arbiter, and provide the arbiter with an interface it can use to talk back to this GPU kernel driver instance.
- Request ownership of the GPU.
- Release ownership of the GPU.
- Grant ownership of the GPU to the GPU kernel driver instance.
- Request that the GPU kernel driver instance release ownership of the GPU.
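The functions above can be sketched as two function-pointer tables, one per direction. The names mirror the bullets, but the exact signatures are assumptions, not the real mali_arbiter API; a toy single-VM arbiter is included so the handshake can be exercised.

```c
#include <stdbool.h>
#include <stddef.h>

struct vm_callbacks {                  /* arbiter -> driver instance */
    void (*gpu_granted)(void *vm);     /* ownership has been granted */
    void (*gpu_stop)(void *vm);        /* please release ownership */
};

struct arbiter_if_vm_arb_ops {         /* driver instance -> arbiter */
    void (*vm_arb_register)(void *vm, const struct vm_callbacks *cb);
    void (*vm_arb_gpu_request)(void *vm);   /* request GPU ownership */
    void (*vm_arb_gpu_stopped)(void *vm);   /* ownership released */
};

/* Toy arbiter state: one registered VM, one ownership flag. */
static const struct vm_callbacks *g_cb;
static bool g_owned;

static void arb_register(void *vm, const struct vm_callbacks *cb)
{
    (void)vm;
    g_cb = cb;
}

static void arb_request(void *vm)      /* no contention: grant at once */
{
    g_owned = true;
    g_cb->gpu_granted(vm);
}

static void arb_stopped(void *vm)
{
    (void)vm;
    g_owned = false;
}

static const struct arbiter_if_vm_arb_ops toy_arbiter = {
    arb_register, arb_request, arb_stopped
};

/* Driver-side callbacks and a typical request/release sequence. */
static bool granted;
static void on_granted(void *vm) { (void)vm; granted = true; }
static void on_stop(void *vm)    { (void)vm; }
static const struct vm_callbacks vm_cb = { on_granted, on_stop };

static bool demo_handshake(void)
{
    int vm_token = 1;
    toy_arbiter.vm_arb_register(&vm_token, &vm_cb);
    toy_arbiter.vm_arb_gpu_request(&vm_token);  /* arbiter calls gpu_granted() */
    toy_arbiter.vm_arb_gpu_stopped(&vm_token);  /* driver yields the GPU */
    return granted && !g_owned;
}
```

A real arbiter would queue requests from many instances and grant ownership on a timer, but the call directions are the same.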
Connection of a GPU driver instance to the arbiter is done through intermediate kernel drivers. This connection is defined in the gpu node of the kernel Device tree. The phandle arbiter_if links the GPU kernel driver instance to a communication driver that implements the arbiter_if_vm_arb_ops interface. For example, the figure shows how the interfaces can be called in a simple arrangement. In this arrangement, there is only one GPU kernel driver instance, and it is in the same VM as the arbiter.
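The device-tree link could look like the following illustrative fragment. The node names, unit address, and compatible string are assumptions; only the arbiter_if phandle comes from the text above.

```dts
/* Illustrative only: a GPU node referencing the communication
   driver that implements arbiter_if_vm_arb_ops. */
gpu@2d000000 {
    compatible = "arm,mali-midgard";     /* placeholder compatible */
    reg = <0x2d000000 0x200000>;
    arbiter_if = <&mali_arbif>;          /* phandle to the arbiter_if provider */
};
```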
Power management: A system that controls GPU power management and accounts for the compute load of all the GPU kernel driver instances. In a virtualized environment, each GPU kernel driver instance only knows about its own activity, and has no knowledge of the power requirements of the other GPU kernel driver instances. For this reason, the power management features in the GPU kernel driver are disabled. Instead, this responsibility is delegated to the arbiter. The arbiter in turn delegates power requirements to another module called mali_gpu_power. This module feeds the total activity of all the GPU kernel driver instances to the power management interface of the platform.
Depending on the usage of the GPU, DVFS can change the frequency of the GPU to meet computational demands. In the virtualized system, the power module computes the time that the GPU kernel driver instances connected to each arbiter spend using the GPU. Each arbiter tracks how long all or part of the GPU has been scheduled to a GPU kernel driver instance. The power module requests this utilization data from each arbiter when the kernel requests it, and sets the GPU frequency accordingly. Specific use-case requirements can help optimize the scaling calculation.
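As an illustration of utilization-based scaling, here is a hedged sketch of how a power module could aggregate per-arbiter busy time into a frequency choice. The structures, thresholds, and frequency ladder are all invented; they are not the real mali_gpu_power interface.

```c
/* Hedged sketch: pick an operating point from aggregate GPU
 * utilization across arbiters (all values are placeholders). */

struct arbiter_stats {
    unsigned long busy_ns;    /* time the GPU was granted to any instance */
    unsigned long window_ns;  /* length of the sampling window */
};

/* Utilization in percent across all arbiters (0..100). */
static unsigned int gpu_utilization(const struct arbiter_stats *s, int n)
{
    unsigned long busy = 0, window = 0;
    for (int i = 0; i < n; i++) {
        busy += s[i].busy_ns;
        window += s[i].window_ns;
    }
    return window ? (unsigned int)(busy * 100 / window) : 0;
}

/* Map utilization onto an invented three-step frequency ladder. */
static unsigned int pick_freq_mhz(unsigned int util_pct)
{
    if (util_pct > 80) return 850;
    if (util_pct > 40) return 600;
    return 350;
}
```

A production governor would also apply hysteresis so the frequency does not oscillate between steps on a noisy load.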
Hardware Separation
The Mali™-G78AE includes partition manager hardware that permits the GPU to be divided into several independent GPUs. Each of these GPUs has its own tiler and shader cores. The number of shader cores allocated to each partition is dynamic: you can change it at runtime through arbiter reconfiguration. The advantages of this approach are:
- Safety critical VMs can have completely isolated and fixed GPU resources.
- More cost-effective and power-efficient than multiple GPUs.
- GPU cores can be dynamically allocated to VMs based on usage, by switching the access window to a larger partition, or resizing its current partition.
To implement hardware separation, a GPU with a partition manager organizes its resources in the following ways:
Slice: Shader cores are grouped into hardware units called slices. Each slice has its own tiler. Slices have a fixed number of shader cores, set by the hardware implementation. The slice is the atomic unit for allocating GPU resources.
Partition: A partition can be considered as an independent GPU. One or more slices can be dynamically grouped into partitions. When multiple slices are assigned to a single partition, only the tiler from the first slice is used. The tilers belonging to the other slices in the partition are disabled.
Resource Assignment with the Partition Manager
The GPU uses the partition manager to allocate graphics processing resources in groups for the physical partitions. In a typical system, a hardware manager, running on the safety island, specifies which resources each group can use. The steps to assign the GPU resources through the assignment registers are:
- Assign each group to a bus according to which application processor accesses it (PTM_RESOURCE_GROUP_BUS).
- Assign each partition to its group (PTM_PARTITION_RESOURCE_GROUP).
- Assign each slice to its group using PTM_SLICE_RESOURCE_GROUP. An arbiter can assign any slice within its group to any partition within the same group.
- Assign each access window to its group with PTM_AW_RESOURCE_GROUP. An arbiter can assign any access window within its group to any partition within the same group.
- Assign each access window a protected and a not-protected StreamID. The StreamID tags each DRAM transaction (PTM_AW0_STREAM_ID and PTM_AW0_PROTECTED_STREAM_ID).
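Under the assumption of a memory-mapped register file, the assignment steps above might be sketched as follows. The register offsets, field layouts, and StreamID values are invented; only the register names come from the list.

```c
#include <stdint.h>

/* Mocked PTM assignment registers: the enum stands in for real MMIO
 * offsets, which are not specified here. */
enum {
    PTM_RESOURCE_GROUP_BUS,
    PTM_PARTITION_RESOURCE_GROUP,
    PTM_SLICE_RESOURCE_GROUP,
    PTM_AW_RESOURCE_GROUP,
    PTM_AW0_STREAM_ID,
    PTM_AW0_PROTECTED_STREAM_ID,
    PTM_REG_COUNT
};

static uint32_t regs[PTM_REG_COUNT];   /* stand-in for the MMIO block */

static void ptm_write(int reg, uint32_t val) { regs[reg] = val; }

/* Assign group 0 to bus 0, put partition 0, slice 0, and access
 * window 0 in that group, then tag window 0's DRAM transactions
 * with invented StreamIDs. */
static void assign_group0(void)
{
    ptm_write(PTM_RESOURCE_GROUP_BUS,        0);   /* group 0 -> bus 0 */
    ptm_write(PTM_PARTITION_RESOURCE_GROUP,  0);   /* partition 0 -> group 0 */
    ptm_write(PTM_SLICE_RESOURCE_GROUP,      0);   /* slice 0 -> group 0 */
    ptm_write(PTM_AW_RESOURCE_GROUP,         0);   /* window 0 -> group 0 */
    ptm_write(PTM_AW0_STREAM_ID,           0x10);  /* normal StreamID */
    ptm_write(PTM_AW0_PROTECTED_STREAM_ID, 0x11);  /* protected StreamID */
}
```

In a real system these writes would come from the hardware manager on the safety island, before any arbiter or driver instance touches the partition.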
The steps to set or change the active slices in a partition use the corresponding PTM_RESOURCE_GROUP and PTM_PARTITION_CONFIG registers:
- Disable clocks to the GPU processing slices being configured. The slices are reconfigured through PTM_SLICE_CLOCK_SET and PTM_SLICE_CLOCK_STATE.
- Assert reset to the GPU processing slices. The slices are reconfigured through PTM_SLICE_RESET_SET and PTM_SLICE_RESET_STATE.
- Update PTM_SLICE_MODE_NEW.
- Poll PTM_SLICE_MODE_ACK to confirm the change has completed.
- Deassert reset to the GPU slices. The slices are reconfigured through PTM_SLICE_RESET_SET and PTM_SLICE_RESET_STATE.
- Enable clocks to the GPU processing slices being configured. The slices are reconfigured through PTM_SLICE_CLOCK_SET and PTM_SLICE_CLOCK_STATE.
Only slices that are assigned to the same group as the partition can be enabled for that partition. Read-only registers in each group region show which slices, partitions, and windows can be used. If the number of slices in a partition changes, the GPU must inform the driver, which must then adjust memory allocations that depend on the number of shader cores.
For example, the driver must resize thread-local storage. Alternatively, the driver can allocate resources assuming each partition has the maximum configuration of shader cores.
When a graphics processing slice is configured as a secondary slice, it exchanges data with the preceding slice. When configured as a primary slice, it ignores the preceding slice. If an arbiter incorrectly configures a primary slice as a secondary slice, the slice might interfere with the neighboring partition. Therefore, the PTM_ASSIGN.PTM_SLICE_ISOLATION register, controlled by the safety island, enforces isolation between partitions in different groups.
Each instance of the GPU driver can use a private access window through which it accesses a partition. Separate driver instances can share a window, though not at the same time, when system software provides support. The PTM_AW_SET register, in the partition control group, determines which window is active for a partition, and therefore to which partition a transaction received through that window is sent. It is illegal to enable the same window in more than one partition, or to enable more than one window for a given partition at the same time. If this condition is violated, the unit reports an error through PTM_PARTITION_STATE.
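The slice reconfiguration sequence (gate clocks, assert reset, program the new mode, poll for acknowledge, release reset, ungate clocks) can be mocked in C as follows. The register behavior, in particular the immediate acknowledge and bit-per-slice masks, is a simulation assumption; only the register names come from the steps above.

```c
#include <stdint.h>

/* Mocked slice-control state; in hardware these would be MMIO
 * registers with separate SET and STATE views. */
static uint32_t clock_state, reset_state, mode, mode_ack;

static void write_clock_set(uint32_t v) { clock_state = v; }        /* PTM_SLICE_CLOCK_SET */
static void write_reset_set(uint32_t v) { reset_state = v; }        /* PTM_SLICE_RESET_SET */
static void write_mode_new(uint32_t v)  { mode = v; mode_ack = v; } /* PTM_SLICE_MODE_NEW */

/* Reconfigure the slices selected by 'mask' to a new mode. */
static int reconfigure_slices(uint32_t mask, uint32_t new_mode)
{
    write_clock_set(clock_state & ~mask);  /* 1. gate clocks to the slices */
    write_reset_set(reset_state | mask);   /* 2. assert reset */
    write_mode_new(new_mode);              /* 3. program the new slice mode */
    while (mode_ack != new_mode)           /* 4. poll PTM_SLICE_MODE_ACK */
        ;
    write_reset_set(reset_state & ~mask);  /* 5. deassert reset */
    write_clock_set(clock_state | mask);   /* 6. ungate clocks */
    return 0;
}
```

Real firmware would add a timeout to the polling loop rather than spin forever on a fault.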
The steps to change which window has access to a partition use the corresponding PTM_PARTITION_CONTROL registers:
- When the partition is in use, software must request the owning driver instance to release it (yield process).
- Wait for the owning driver instance to yield, then reset the partition through PTM_RESET_SET.
- Wait for PTM_RESET_STATE to indicate that the reset has been applied.
- Enable an access window by setting PTM_AW_SET.
- Grant the requesting driver instance access to the GPU (software handshake).
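The window-switch steps above can be reduced to a short C mock. The register behavior is simulated and the yield handshake collapses to a flag; only the register names come from the list.

```c
#include <stdint.h>
#include <stdbool.h>

/* Mocked partition-control state. */
static bool partition_yielded;
static uint32_t ptm_reset_state;   /* mirrors PTM_RESET_STATE */
static uint32_t ptm_aw_set;        /* mirrors PTM_AW_SET */

static void request_yield(void) { partition_yielded = true; }     /* software handshake */
static void write_reset_set(uint32_t v) { ptm_reset_state = v; }  /* PTM_RESET_SET */

/* Move the partition from its current access window to 'new_window'. */
static int switch_access_window(uint32_t new_window)
{
    request_yield();                 /* 1. ask the owning instance to yield */
    if (!partition_yielded)
        return -1;
    write_reset_set(1);              /* 2. reset the partition */
    while (ptm_reset_state != 1)     /* 3. wait for PTM_RESET_STATE */
        ;
    write_reset_set(0);
    ptm_aw_set = new_window;         /* 4. enable the new access window */
    /* 5. grant the requesting instance access (software handshake) */
    return 0;
}
```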
GPU Data Flow
API calls issued by an application using the GPU, such as GLES, Vulkan, or OpenCL calls, are converted into one or more GPU jobs. There are different types of job, such as jobs for compute shaders, vertex shaders, fragment shaders, cache flushes, and others. A job is converted into a collection of smaller tasks that are executed on the GPU, and these tasks in turn are broken down into threads. A thread is an atomic unit of work executed by the shader core, and it corresponds to a single running instance of a shader, for example a vertex shader. The following figure shows the simplified data flow. A job is completed when all its corresponding tasks are completed. The Job Manager (JM), a hardware component on the GPU, is responsible for scheduling jobs to run on the GPU.
Mali GPUs use tile-based rendering, which is designed to minimize the power-hungry external memory accesses needed during rendering. Instead of rendering objects immediately, the frame buffer is broken down into rectangular tiles that are processed one at a time. The GPU divides the screen into small 16x16 pixel tiles and constructs a list of the rendering primitives present in each tile. When the GPU fragment shading step runs, each shader core processes one 16x16 pixel tile at a time, rendering it to completion before starting the next one. Because only the area within the current tile is rendered, this area can be processed in fast on-GPU memory. The tile is written out to main memory only when rendering is finished, so main memory is touched only once per tile.
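The per-tile primitive lists can be illustrated with a small C sketch that maps a primitive's screen-space bounding box onto 16x16 pixel tiles. The data structures are invented for illustration; real binning hardware also clips against the screen and culls empty tiles.

```c
/* Tile size used by the binning step described above. */
#define TILE 16

struct bbox { int x0, y0, x1, y1; };   /* primitive bounds, in pixels, inclusive */

/* Count how many 16x16 tiles a primitive's bounding box touches;
 * each of those tiles would get the primitive appended to its list. */
static int tiles_touched(struct bbox b)
{
    int tx0 = b.x0 / TILE, ty0 = b.y0 / TILE;   /* first tile column/row */
    int tx1 = b.x1 / TILE, ty1 = b.y1 / TILE;   /* last tile column/row */
    return (tx1 - tx0 + 1) * (ty1 - ty0 + 1);
}
```

During fragment shading, each shader core then walks one tile's list at a time, which is what keeps the working set inside on-GPU memory.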
Partitioning and Virtualization
This section outlines the virtualization and partitioning capabilities of the Arm Mali-G78AE GPU, emphasizing its ability to support complex, isolated workloads across multiple virtual machines (VMs) for advanced systems such as automotive applications. The two diagrams, shown below and in the next section, showcase how the Mali-G78AE manages workloads using virtualization and partitioning: the first illustrates a practical scenario with three VMs, each running independent rendering applications, while the second demonstrates the maximum of eight VMs supported on this device.
The Mali-G78AE GPU is designed with advanced workload management features that allow for efficient resource allocation and strong application isolation. By using partitioning and virtualization, this GPU enables multiple independent VMs to share the same hardware without interference, achieving high levels of flexibility and stability. This architecture is particularly beneficial in systems requiring both safety-critical and non-safety-critical applications, such as in automotive dashboards.
The example configuration in both diagrams involves multiple VMs, each hosting rendering applications with dedicated Vulkan/OpenGL ES graphics stacks and Mali graphics drivers:
- Three-VM Configuration: In the diagram, three VMs (VM1, VM2, and VM3) are shown. Each VM has its own rendering application and graphics drivers, managed by a hypervisor that controls execution and resource allocation.
In these configurations:
- Partition 0 is allocated to VM1 and VM3, utilizing Core 0 and Core 1.
- Partition 1 is dedicated to VM2, using Core 2 and Core 3.
Multiple Virtual Machines
The Mali-G78AE GPU’s design allows it to support up to eight VMs, making it suitable for high-complexity systems. In automotive use cases, this architecture supports mixed-criticality workloads, such as:
- Safety-critical displays: Managed in dedicated, isolated VMs, ensuring that any non-safety application or potential fault does not compromise the safety-critical functions.
- Non-safety-critical displays: Handled in separate VMs, enabling infotainment and other non-critical tasks without affecting safety-critical operations.
- Eight-VM Maximum Capacity: The diagram demonstrates the GPU's capability to support up to eight VMs. This flexibility allows for various configurations depending on system requirements, with the ability to run multiple non-safety and safety-critical workloads simultaneously.
Benefits of Partitioning and Virtualization on Mali-G78AE
Partitioning and virtualization provide several key advantages in complex embedded systems:
- Isolation and Stability: Partitioning allows applications in different VMs to operate independently, reducing the risk of interference and resource contention. This is essential for applications that require high stability, such as automotive displays.
- Flexible Resource Allocation: The hypervisor enables dynamic resource management, allowing the system to adjust GPU resources as needed across partitions.
- Scalability: With support for up to eight VMs, the Mali-G78AE GPU can adapt to diverse application requirements, from single-use cases to multiple concurrent tasks in mixed-criticality environments.