The following steps require kubectl to be connected to your cluster. After the Xilinx device plugin for Kubernetes is installed, no additional configuration is needed when adding nodes to the cluster.
Remove the current device plugin daemonset:
    kubectl delete daemonset <current-device-plugin-daemonset-name> -n kube-system
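If the name of the currently installed daemonset is not known, it can be looked up before running the delete command. The following is a minimal sketch, assuming the existing daemonset is in the kube-system namespace and has "device-plugin" somewhere in its name. The helper name find_device_plugin_daemonsets is hypothetical, and the name filter may need adjusting for your cluster.

```shell
# Hypothetical helper: list daemonsets in kube-system whose name
# contains "device-plugin" (assumed naming pattern, adjust as needed).
find_device_plugin_daemonsets() {
  # "-o name" prints one entry per line in the form daemonset.apps/<name>
  kubectl get daemonset -n kube-system -o name | grep -i 'device-plugin' || true
}
```

Calling find_device_plugin_daemonsets prints the matching entries. The part of each entry after the slash is the name to pass to kubectl delete daemonset.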
Create a file called k8s-device-plugin.yml and paste the following content into it:

    apiVersion: apps/v1
    #if run with k8s v1.16-, replace the above line with
    #apiVersion: extensions/v1beta1
    kind: DaemonSet
    metadata:
      name: device-plugin-daemonset
      namespace: kube-system
    spec:
      #if run with k8s v1.16-, the following 3 lines are not required
      selector:
        matchLabels:
          name: device-plugin
      template:
        metadata:
          labels:
            name: device-plugin
        spec:
          tolerations:
          priorityClassName: "system-node-critical"
          containers:
          - image: public.ecr.aws/xilinx_dcg/k8s-device-plugin:1.2.0
            name: device-plugin
            env:
            - name: U30NameConvention
              value: CommonName
            - name: U30AllocUnit
              value: Card
            securityContext:
              allowPrivilegeEscalation: false
              capabilities:
                drop: ["ALL"]
            volumeMounts:
              - name: device-plugin
                mountPath: /var/lib/kubelet/device-plugins
          volumes:
            - name: device-plugin
              hostPath:
                path: /var/lib/kubelet/device-plugins

In the above YAML file, the following two environment variables define the naming convention and access granularity:
- U30NameConvention
  - Defines how the resource name in the pod-description YAML file is interpreted by the plugin. Allowed values are ExactName and CommonName.
  - If set to CommonName, the resource string used in the pod-description file must be set to amd.com/ama_u30. This allows setting resource limits without having to specify an exact U30 firmware version number. Using CommonName provides forward and backward compatibility of pod-description files with respect to all releases of the Xilinx Video SDK. This is the recommended setting.
  - If set to ExactName, the resource string used in the pod-description file must be set to amd.com/xilinx_u30_gen3x4_base_2-0. This allows setting resource limits for cards flashed with this specific version of the U30 firmware. When ExactName is used, there is no guarantee of forward or backward compatibility of pod-description files with respect to future releases of the Xilinx Video SDK.
  - If this variable is not specified or is set to an invalid value, it defaults to CommonName.
- U30AllocUnit
  - Defines the unit in which the resource count set in the pod-description YAML file is expressed. Allowed values are Card and Device.
  - If set to Card, the resource is measured in number of cards.
  - If set to Device, the resource is measured in number of devices.
  - If this variable is not specified or is set to an invalid value, it defaults to Card.
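To illustrate how these two variables affect a pod-description file, the sketch below writes a minimal pod description that requests one U30 card. It assumes U30NameConvention is CommonName (so the resource string is amd.com/ama_u30) and U30AllocUnit is Card (so the value 1 means one card). The file name u30-pod.yml, the pod and container names, and the image registry.example.com/video-sdk-app:latest are hypothetical placeholders, not part of the Xilinx Video SDK.

```shell
# Write an example pod description to u30-pod.yml (hypothetical file name).
cat > u30-pod.yml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: u30-example-pod            # hypothetical pod name
spec:
  containers:
  - name: u30-example              # hypothetical container name
    image: registry.example.com/video-sdk-app:latest   # placeholder image
    resources:
      limits:
        # CommonName resource string; with U30AllocUnit=Card this is 1 card
        amd.com/ama_u30: 1
EOF
```

With ExactName, the resource key in the limits section would instead be amd.com/xilinx_u30_gen3x4_base_2-0. With U30AllocUnit set to Device, the same value of 1 would request one device rather than one card.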
Deploy the Xilinx device plugin as a daemonset:
    # Apply the Xilinx device plugin
    kubectl apply -f ./k8s-device-plugin.yml

    # Check the status of the daemonset
    kubectl get daemonset -n kube-system

    # Check the status of the device-plugin pod
    kubectl get pod -n kube-system
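Rather than polling the status commands by hand, it may be convenient to block until the daemonset has finished rolling out. The following is a minimal sketch, assuming the daemonset name device-plugin-daemonset from k8s-device-plugin.yml. The helper name wait_for_device_plugin and the 120-second timeout are illustrative choices.

```shell
# Hypothetical helper: wait until the device plugin daemonset is rolled out.
# The optional first argument overrides the daemonset name.
wait_for_device_plugin() {
  ds_name="${1:-device-plugin-daemonset}"
  # Returns non-zero if the rollout does not complete within the timeout
  kubectl rollout status "daemonset/${ds_name}" -n kube-system --timeout=120s
}
```

Running wait_for_device_plugin after kubectl apply returns once the plugin pods are ready on all scheduled nodes, or fails when the timeout expires.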
List visible nodes and check Xilinx resources available:
    # Get node names
    kubectl get node

    # Check Xilinx resources available in a specific worker node
    kubectl describe node <node-name>

For each node, you will see a report similar to the following:

    Name:         ip-192-168-58-12.ec2.internal
    Roles:        <none>
    ......
    Capacity:
      amd.com/ama_u30:             4
      attachable-volumes-aws-ebs:  39
      cpu:                         24
      ephemeral-storage:           104845292Ki
      hugepages-1Gi:               0
      hugepages-2Mi:               0
      memory:                      47284568Ki
      pods:                        1
    Allocatable:
      amd.com/ama_u30:             4
      attachable-volumes-aws-ebs:  39
      cpu:                         23870m
      ephemeral-storage:           95551679124
      hugepages-1Gi:               0
      hugepages-2Mi:               0
      memory:                      46752088Ki
      pods:                        1
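To pick out only the U30 entries from such a report, the output of kubectl describe node can be filtered. The following is a minimal sketch, assuming the CommonName resource string amd.com/ama_u30 and the report layout shown above, with Capacity and Allocatable as unindented section headers. The helper name u30_summary is hypothetical.

```shell
# Hypothetical helper: read "kubectl describe node" output on stdin and
# print the amd.com/ama_u30 count from the Capacity and Allocatable sections.
u30_summary() {
  awk '
    /^[^[:space:]]/ { section = "" }             # any unindented line starts a new section
    /^Capacity:/    { section = "capacity" }
    /^Allocatable:/ { section = "allocatable" }
    section != "" && $1 == "amd.com/ama_u30:" { print section "=" $2 }
  '
}
```

A typical invocation is kubectl describe node <node-name> | u30_summary, which prints one capacity line and one allocatable line for the node.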