Prerequisites

  • A working Kubernetes cluster with the Cluster API installed
  • A working Cluster API-provisioned control plane (this guide adds worker nodes to it)
  • Helm installed
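
Before moving on, you can sanity-check these prerequisites from the management cluster; a quick sketch, assuming the workload cluster is named capi-example as in the rest of this guide:

clusterctl describe cluster capi-example
kubectl get pods -n capi-system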

Create a Working MachineDeployment

Create an OpenstackMachineTemplate

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OpenStackMachineTemplate
metadata:
  name: capi-worker-example
spec:
  template:
    spec:
      sshKeyName: capi # Name of an existing OpenStack keypair
      flavor: power-c8m16384-gpu-A4500-1 # Instance flavor; this one has a GPU attached
      image:
        filter:
          name: ubuntu-2204-1-29 # Image name; depends on what you built with the image-builder repository
      ports:
        - disablePortSecurity: true # Disable OpenStack port security on the instance ports
      rootVolume:
        sizeGiB: 100 # Specify the size of the root volume
        type: silver # Volume type; valid options here are wood, silver, and gold
        availabilityZone:
          name: nova
      tags:
        - capi
        - kubernetes
        - gpu

From the example above, you can see that we are creating a new OpenStackMachineTemplate called capi-worker-example. This template will be used to create the worker nodes in the cluster. Notice that the flavor is different from the control plane's: we want worker nodes with a GPU attached. 🎮
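
If you are unsure which flavors or images exist in your OpenStack project, you can list them with the OpenStack CLI; a quick sketch, assuming the client is installed and your credentials are sourced:

openstack flavor list
openstack image list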

Create a KubeadmConfigTemplate

apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: capi-worker-example
spec:
  template:
    spec:
      initConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            cloud-provider: external
            provider-id: openstack:///'{{ instance_id }}'
            # register-with-taints: node-role.kubernetes.io/worker=true:NoSchedule # Uncomment this line if you want to taint the worker nodes
            # node-labels: node-role.kubernetes.io/worker=true # Uncomment this line if you want to label the worker nodes
          name: '{{ local_hostname }}'
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            cloud-provider: external
            provider-id: openstack:///'{{ instance_id }}'
            # register-with-taints: node-role.kubernetes.io/worker=true:NoSchedule # Uncomment this line if you want to taint the worker nodes
            # node-labels: node-role.kubernetes.io/worker=true # Uncomment this line if you want to label the worker nodes
          name: '{{ local_hostname }}'

From the example above, you can see that we are creating a new KubeadmConfigTemplate called capi-worker-example. This template will be used to bootstrap the worker nodes as they join the cluster. Notice that cloud-provider is set to external and provider-id to openstack:///'{{ instance_id }}': the {{ instance_id }} placeholder is rendered by cloud-init into the OpenStack instance UUID, which lets the external OpenStack cloud provider manage the worker nodes. ☁️
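
If you uncomment the register-with-taints line above, regular pods will no longer schedule onto the workers unless they tolerate the taint. A pod would then need a toleration such as this (a hypothetical snippet matching the commented taint):

tolerations:
  - key: node-role.kubernetes.io/worker
    operator: Equal
    value: "true"
    effect: NoSchedule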

Create a MachineDeployment

apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: capi-worker-example
spec:
  clusterName: capi-example
  replicas: 3
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: capi-worker-example
      clusterName: capi-example
      failureDomain: nova
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: OpenStackMachineTemplate
        name: capi-worker-example
      version: v1.29.6

From the example above, you can see that we are creating a new MachineDeployment called capi-worker-example, which joins the worker Machines to the capi-example cluster. Notice that replicas is set to 3, so three worker nodes will be created. 🎉
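
Because MachineDeployment implements the scale subresource, you can later resize the worker pool like any other replicated resource, for example:

kubectl scale machinedeployment capi-worker-example --replicas=5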

Apply the Resources 📜

Finally, apply the resources by running the following command:

kubectl apply -f <file_name>.yaml
openstackmachinetemplate.infrastructure.cluster.x-k8s.io/capi-worker-example created
kubeadmconfigtemplate.bootstrap.cluster.x-k8s.io/capi-worker-example created
machinedeployment.cluster.x-k8s.io/capi-worker-example created

Now you can watch the worker nodes being created by running the following command:

kubectl get machine -w
NAME                              CLUSTER        NODENAME           PROVIDERID                                          PHASE          AGE    VERSION
capi-worker-example-fggqs-4dhgp   capi-example                                                                            Provisioning   47s    v1.29.6
capi-worker-example-fggqs-cgn8q   capi-example                                                                            Provisioning   47s    v1.29.6
capi-worker-example-fggqs-xvkb9   capi-example                                                                            Provisioning   47s    v1.29.6

It will take some time for the worker nodes to be created. Once they are up, you can check their status from the workload cluster itself. Don't forget to point your KUBECONFIG at the workload cluster first; one way to do that (assuming the cluster name capi-example from above):
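
clusterctl get kubeconfig capi-example > capi-example.kubeconfig
export KUBECONFIG=$PWD/capi-example.kubeconfig

Then list the nodes: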

kubectl get nodes
NAME                              STATUS   ROLES           AGE     VERSION
capi-example-fwvqv                Ready    control-plane   127m    v1.29.6
capi-example-t7vxk                Ready    control-plane   101m    v1.29.6
capi-example-wn4xg                Ready    control-plane   97m     v1.29.6
capi-worker-example-qk2kr-4bp8p   Ready    <none>          4m16s   v1.29.6
capi-worker-example-qk2kr-fp2hc   Ready    <none>          11s     v1.29.6
capi-worker-example-qk2kr-hkzft   Ready    <none>          2m16s   v1.29.6

You should see the three worker nodes join the cluster 🚀.
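
You can also confirm that each kubelet registered with the provider ID rendered from the KubeadmConfigTemplate earlier (the openstack:/// prefix followed by the instance UUID):

kubectl get nodes -o custom-columns='NAME:.metadata.name,PROVIDERID:.spec.providerID'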

Install NVIDIA GPU Operator

Add the NVIDIA GPU Operator Helm Repository

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

Install the NVIDIA GPU Operator

helm upgrade --wait --install -n gpu-operator --create-namespace gpu-operator \
    nvidia/gpu-operator
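
The operator takes a few minutes to roll out its driver and toolkit pods. Before deploying a workload, you can check that those pods are healthy and that the GPU node now advertises an allocatable nvidia.com/gpu resource:

kubectl get pods -n gpu-operator
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'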

Now we can try to deploy a GPU workload to the cluster 🎉

Deploy a GPU Workload

Create a GPU Workload

apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vectoradd
    image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
    resources:
      limits:
        nvidia.com/gpu: 1

Now you can deploy the GPU workload by running the following command:

kubectl apply -f <file_name>.yaml
pod/cuda-vectoradd created

And check the logs of the pod to ensure the driver is loaded and the GPU is available:

kubectl logs -f cuda-vectoradd

You should see the output of the GPU workload in the logs 🍾.

[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
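
Once the test passes, you can clean up the sample pod:

kubectl delete pod cuda-vectoradd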