Installing K8s Device Plugin on Kubernetes - 2023.1 English

AMD-Xilinx Kubernetes Device Plugin

Release Date
2023-06-10
Version
2023.1 English

The following steps require kubectl to be connected to your cluster. After the Xilinx device plugin for Kubernetes is installed, no additional configuration is needed when adding nodes to the cluster.

  1. Remove the current device plugin daemonset:

    kubectl delete daemonset <current-device-plugin-daemonset-name> -n kube-system
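
If you do not know the name of the daemonset currently installed, listing the daemonsets in kube-system is one way to find it. The helper below is a minimal sketch; the function name is hypothetical and not part of the plugin or of kubectl.

```shell
# Hypothetical helper: print the names of all daemonsets in kube-system
# so the currently installed device plugin, if any, can be identified.
list_kube_system_daemonsets() {
  kubectl get daemonset --namespace kube-system --output name
}
```

Each output line has the form daemonset.apps/<name>; the part after the slash is the name to pass to kubectl delete daemonset.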
    
  2. Create a file called k8s-device-plugin.yml and paste the following content into it. Two environment variables in this file, U30NameConvention and U30AllocUnit, define the naming convention and the access granularity; both are described after the listing:

    apiVersion: apps/v1
    # If running on Kubernetes earlier than v1.16, replace the line above with:
    #apiVersion: extensions/v1beta1
    kind: DaemonSet
    metadata:
      name: device-plugin-daemonset
      namespace: kube-system
    spec:
    # If running on Kubernetes earlier than v1.16, the following 3 lines are not required
      selector:
        matchLabels:
          name: device-plugin
      template:
        metadata:
          labels:
            name: device-plugin
        spec:
          tolerations:
          priorityClassName: "system-node-critical"
          containers:
          - image: public.ecr.aws/xilinx_dcg/k8s-device-plugin:1.2.0
            name: device-plugin
            env:
            - name: U30NameConvention
              value: CommonName
            - name: U30AllocUnit
              value: Card
            securityContext:
              allowPrivilegeEscalation: false
              capabilities:
                drop: ["ALL"]
            volumeMounts:
              - name: device-plugin
                mountPath: /var/lib/kubelet/device-plugins
          volumes:
            - name: device-plugin
              hostPath:
                path: /var/lib/kubelet/device-plugins
    
    U30NameConvention
    • Defines how the resource name in the pod-description yaml file should be interpreted by the plugin. Allowed values are ExactName and CommonName.
    • If set to CommonName, the resource string used in the pod-description file must be set to amd.com/ama_u30. This allows setting resource limits without having to specify an exact U30 firmware version number. Using CommonName provides forward and backward compatibility of pod-description files with respect to all releases of the Xilinx Video SDK. This is the recommended setting.
    • If set to ExactName, the resource string used in the pod-description file must be set to amd.com/xilinx_u30_gen3x4_base_2-0. This allows setting resource limits for cards flashed with this specific version of the U30 firmware. When ExactName is used, there is no guarantee of forward or backward compatibility of pod-description files with respect to future releases of the Xilinx Video SDK.
    • If this variable is not specified or is set incorrectly, it will default to CommonName.
    U30AllocUnit
    • Defines the unit in which the resource quantity set in the pod-description yaml file is counted. Allowed values are Card and Device.
    • If set to Card, the resource is measured in number of cards.
    • If set to Device, the resource is measured in number of devices.
    • If this variable is not specified or is set incorrectly, it will default to Card.
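
To illustrate how the two variables affect a pod-description file, the sketch below writes a pod description that requests one U30 card. It assumes U30NameConvention is CommonName and U30AllocUnit is Card, as in the daemonset above. The file name, pod name, container name, and image are placeholders chosen for illustration, not values from this guide.

```shell
# Write a hypothetical pod description to u30-example-pod.yml.
# With U30NameConvention=CommonName the resource string is amd.com/ama_u30;
# with U30AllocUnit=Card the quantity 1 means one card.
cat > u30-example-pod.yml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: u30-example-pod              # placeholder name
spec:
  restartPolicy: Never
  containers:
  - name: u30-example                # placeholder name
    image: <your-application-image>  # placeholder: replace with your own image
    resources:
      limits:
        amd.com/ama_u30: 1
EOF
```

After replacing the image placeholder, the file can be applied with kubectl apply -f ./u30-example-pod.yml once the device plugin is running.
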
  3. Deploy the Xilinx device plugin as a daemonset:

    # Apply the Xilinx device plugin
    kubectl apply -f ./k8s-device-plugin.yml
    
    # Check the status of daemonset:
    kubectl get daemonset -n kube-system
    
    # Check the status of device-plugin pod:
    kubectl get pod -n kube-system
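
Rather than polling the status by hand, you may prefer to wait for the rollout to finish. The helpers below are a sketch: the function names are hypothetical, the daemonset name, namespace, and pod label are taken from k8s-device-plugin.yml as shown earlier, and the 120-second timeout is an arbitrary choice.

```shell
# Hypothetical helper: block until every device plugin pod of the daemonset
# is ready, or fail after the timeout.
wait_for_device_plugin() {
  kubectl rollout status daemonset/device-plugin-daemonset \
    --namespace kube-system --timeout=120s
}

# Hypothetical helper: list only the device plugin pods, selected by the
# name=device-plugin label set in the daemonset template.
list_device_plugin_pods() {
  kubectl get pod --namespace kube-system --selector name=device-plugin --output wide
}
```

Calling wait_for_device_plugin followed by list_device_plugin_pods shows which nodes are running the plugin.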
    
  4. List visible nodes and check Xilinx resources available:

    # Get node names
    kubectl get node
    
    # Check Xilinx resources available in specific worker node
    kubectl describe node <node-name>
    
    For each node, you will see a report similar to the following:
    
        Name:               ip-192-168-58-12.ec2.internal
        Roles:              <none>
        ......
        Capacity:
          amd.com/ama_u30:                             4
          attachable-volumes-aws-ebs:                  39
          cpu:                                         24
          ephemeral-storage:                           104845292Ki
          hugepages-1Gi:                               0
          hugepages-2Mi:                               0
          memory:                                      47284568Ki
          pods:                                        1
        Allocatable:
          amd.com/ama_u30:                             4
          attachable-volumes-aws-ebs:                  39
          cpu:                                         23870m
          ephemeral-storage:                           95551679124
          hugepages-1Gi:                               0
          hugepages-2Mi:                               0
          memory:                                      46752088Ki
          pods:                                        1
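
The report above is long, and only the amd.com lines matter when checking the plugin. The filter below is a minimal sketch that reads kubectl describe node output on standard input and prints only the amd.com resources listed under Capacity and Allocatable. The function name is hypothetical, and the parsing assumes the report layout shown above.

```shell
# Hypothetical helper: summarize amd.com resources from `kubectl describe node`
# output read on stdin. Prints one line per resource: section, name, quantity.
summarize_amd_resources() {
  awk '
    # Track which section of the report we are in.
    /^[ \t]*Capacity:[ \t]*$/    { section = "capacity"; next }
    /^[ \t]*Allocatable:[ \t]*$/ { section = "allocatable"; next }
    # Any other heading with no value (for example "System Info:") ends the section.
    /^[ \t]*[A-Z][A-Za-z ]*:[ \t]*$/ { section = ""; next }
    # Inside a tracked section, keep only amd.com resources.
    section != "" && $1 ~ "^amd[.]com/" {
      name = $1
      sub(/:$/, "", name)
      print section, name, $2
    }
  '
}
```

For example, kubectl describe node <node-name> | summarize_amd_resources prints one capacity line and one allocatable line for each amd.com resource on that node.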