Determining Why an Onload Device Plugin is Failing

Onload Operator for Kubernetes User Guide (UG1656)

Document ID: UG1656
Release Date: 2024-04-26
Revision: 1.0 English

An Onload Device Plugin is failing when, for example, its pod has a status of CrashLoopBackOff.
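As a minimal sketch of how such a status can be read programmatically, the Python function below lists every container in a pod that is stuck in a waiting state, together with the reason reported by Kubernetes (for example CrashLoopBackOff or ImagePullBackOff). It assumes the input is the parsed JSON of a single Pod object, such as the output of `kubectl get pod <pod-name> --namespace <namespace> -o json`. The function name `waiting_reasons` is hypothetical and is not part of the Onload Operator.

```python
def waiting_reasons(pod):
    """Return (container name, reason) pairs for containers in a waiting state.

    `pod` is the parsed JSON of one Pod object (a dict), for example the
    result of json.load() on `kubectl get pod <pod-name> -o json` output.
    """
    status = pod.get("status") or {}
    # Init containers run first, so list them before the main containers.
    statuses = (status.get("initContainerStatuses") or []) + (
        status.get("containerStatuses") or []
    )
    found = []
    for container in statuses:
        waiting = (container.get("state") or {}).get("waiting")
        if waiting:
            # "reason" holds values such as CrashLoopBackOff or ImagePullBackOff.
            found.append((container.get("name", ""), waiting.get("reason", "")))
    return found
```

An empty result means that no container is currently waiting. It does not prove that the pod is healthy, because a container can also be in a terminated state.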

Use the Onload Components commands to determine whether any of the following have occurred:

  • onload-device-plugin, onload-worker, or onload-user images failed to pull from registries.
  • The allocated namespace (by default, current context):
    • Has a security policy that disallows privileged pods/roles.
    • Has insufficient resource quotas.
  • Your current user context is insufficiently privileged for creating privileged resources.
  • The init container failed to copy Onload files to a node's local storage for any of the following reasons:
    • Sufficient writable free space is not available on node filesystem mounts.
    • Kernel security (e.g., SELinux) lacks a compatible configuration for the mounts.
    • One or more of the host mount paths is incompatible with the underlying host filesystem layout.
  • Onload Worker failed to start an environment for the Onload Control Plane because:
    • /dev/onload was not found because the node host is missing:
      • Onload kernel module.
      • Out-of-tree sfc kernel module.
    • Its container ID could not be found or parsed due to an incompatible CRI.
  • Onload Device Plugin's AMD Solarflare™ hardware discovery failed due to incompatible or missing hardware, or another kernel sysfs enumeration issue.
  • Another more general issue, such as namespace security policies preventing privileged containers.
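
Some of the node-level causes above can be checked directly on the affected node. The following Python sketch reports whether /dev/onload exists and whether the expected kernel modules are loaded. It assumes that the modules are named `onload` and `sfc` and that it is run on the node host itself rather than inside an unprivileged container. The function names are hypothetical. The check cannot tell an out-of-tree sfc module from an in-tree one, so a loaded `sfc` module still needs to be verified separately.

```python
import os


def loaded_modules(proc_modules_text):
    """Parse the text of /proc/modules into a set of loaded module names."""
    # Each non-empty line starts with the module name.
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def missing_prerequisites(proc_modules_text, device_exists):
    """List node prerequisites that appear to be missing."""
    modules = loaded_modules(proc_modules_text)
    missing = []
    if not device_exists:
        missing.append("/dev/onload")
    # Assumption: the kernel modules are named "onload" and "sfc".
    for module in ("onload", "sfc"):
        if module not in modules:
            missing.append(f"{module} kernel module")
    return missing


def check_this_node():
    """Run the check against the node this script is executed on."""
    with open("/proc/modules") as f:
        text = f.read()
    return missing_prerequisites(text, os.path.exists("/dev/onload"))
```

Calling `check_this_node()` returns an empty list when the device node and both modules are present. Keeping the parsing separate from the file access allows the logic to be exercised with sample text on a machine that has no Onload installation.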