Create a worker pool with Cluster API
Prerequisites
- A working Kubernetes cluster with the Cluster API installed
- A working Cluster API control plane
- Helm installed
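If you want to sanity-check these prerequisites before continuing, the following commands should confirm them (this assumes clusterctl and Helm are on your PATH, and that Cluster API was installed into the default capi-system namespace):

clusterctl version
helm version
kubectl get pods -n capi-system # The Cluster API controllers should be Running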
Create a Working MachineDeployment
Create an OpenStackMachineTemplate
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OpenStackMachineTemplate
metadata:
  name: capi-worker-example
spec:
  template:
    spec:
      sshKeyName: capi # Specify the keypair name
      flavor: power-c8m16384-gpu-A4500-1 # Specify the flavor of the instance; this one has a GPU attached
      image:
        filter:
          name: ubuntu-2204-1-29 # The name depends on the image you built from the image-builder repository
      ports:
        - disablePortSecurity: true
      rootVolume:
        sizeGiB: 100 # Specify the size of the root volume
        type: silver # Specify the volume type; valid options are: wood, silver, gold
        availabilityZone:
          name: nova
      tags:
        - capi
        - kubernetes
        - gpu

From the example above, you can see that we are creating a new OpenStackMachineTemplate called capi-worker-example. This template will be used to create the worker nodes in the cluster. Notice that the flavor differs from the control plane's, because we want worker nodes with a GPU attached. 🎮
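The keypair, flavor, and image names above must already exist in your OpenStack project, so it is worth double-checking them before applying the template. A quick check with the OpenStack CLI (assuming it is installed and your credentials are sourced):

openstack keypair list
openstack flavor list
openstack image list --name ubuntu-2204-1-29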
Create a KubeadmConfigTemplate
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: capi-worker-example
spec:
  template:
    spec:
      initConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            cloud-provider: external
            provider-id: openstack:///'{{ instance_id }}'
            # register-with-taints: node-role.kubernetes.io/worker=true:NoSchedule # Uncomment this line if you want to taint the worker nodes
            # node-labels: node-role.kubernetes.io/worker=true # Uncomment this line if you want to label the worker nodes
          name: '{{ local_hostname }}'
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            cloud-provider: external
            provider-id: openstack:///'{{ instance_id }}'
            # register-with-taints: node-role.kubernetes.io/worker=true:NoSchedule # Uncomment this line if you want to taint the worker nodes
            # node-labels: node-role.kubernetes.io/worker=true # Uncomment this line if you want to label the worker nodes
          name: '{{ local_hostname }}'

From the example above, you can see that we are creating a new KubeadmConfigTemplate called capi-worker-example. This template will be used to bootstrap the worker nodes in the cluster. Notice that cloud-provider is set to external and provider-id uses the openstack:/// scheme; this is because the external OpenStack cloud provider manages the worker nodes. ☁️
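Once the worker nodes have joined (after the steps below), you can confirm that each kubelet registered with an OpenStack provider ID. A quick check against the workload cluster, printing each node's name and spec.providerID:

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.providerID}{"\n"}{end}'

Every node should report a provider ID of the form openstack:///<instance-uuid>.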
Create a MachineDeployment
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: capi-worker-example
spec:
  clusterName: capi-example
  replicas: 3
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: capi-worker-example
      clusterName: capi-example
      failureDomain: nova
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: OpenStackMachineTemplate
        name: capi-worker-example
      version: v1.29.6

From the example above, you can see that we are creating a new MachineDeployment called capi-worker-example. This deployment will be used to create the worker nodes in the cluster. Notice that replicas is set to 3, because we want three worker nodes in the cluster. 🎉
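Because MachineDeployments implement the scale subresource, you can grow or shrink the worker pool later just like a regular Deployment; the replica count below is only an example:

kubectl scale machinedeployment capi-worker-example --replicas=5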
Apply the Resources 📜
Finally, apply the resources by running the following command:
kubectl apply -f <file_name>.yaml
openstackmachinetemplate.infrastructure.cluster.x-k8s.io/capi-worker-example created
kubeadmconfigtemplate.bootstrap.cluster.x-k8s.io/capi-worker-example created
machinedeployment.cluster.x-k8s.io/capi-worker-example created

Now you can watch the worker nodes being created by running the following command:
kubectl get machine -w
NAME                              CLUSTER        NODENAME   PROVIDERID   PHASE          AGE   VERSION
capi-worker-example-fggqs-4dhgp   capi-example                           Provisioning   47s   v1.29.6
capi-worker-example-fggqs-cgn8q   capi-example                           Provisioning   47s   v1.29.6
capi-worker-example-fggqs-xvkb9   capi-example                           Provisioning   47s   v1.29.6

It will take some time for the worker nodes to be created. Once they are created, you can check their status against the workload cluster (don't forget to export the KUBECONFIG variable for the workload cluster).
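If you don't have the workload cluster kubeconfig yet, clusterctl can fetch it for you (assuming the Cluster object lives in your current namespace; the file name below is arbitrary):

clusterctl get kubeconfig capi-example > capi-example.kubeconfig
export KUBECONFIG=$PWD/capi-example.kubeconfig

Then list the nodes: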
kubectl get nodes
NAME                              STATUS   ROLES           AGE     VERSION
capi-example-fwvqv                Ready    control-plane   127m    v1.29.6
capi-example-t7vxk                Ready    control-plane   101m    v1.29.6
capi-example-wn4xg                Ready    control-plane   97m     v1.29.6
capi-worker-example-qk2kr-4bp8p   Ready    <none>          4m16s   v1.29.6
capi-worker-example-qk2kr-fp2hc   Ready    <none>          11s     v1.29.6
capi-worker-example-qk2kr-hkzft   Ready    <none>          2m16s   v1.29.6

It should show the worker nodes in the cluster 🚀.
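The worker nodes show <none> under ROLES because the kubelet is not allowed to set node-role labels on itself. If you did not uncomment the node-labels option in the KubeadmConfigTemplate above, you can label them by hand; the node name below is one of the examples from the output above:

kubectl label node capi-worker-example-qk2kr-4bp8p node-role.kubernetes.io/worker=worker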
Install NVIDIA GPU Operator
Add the NVIDIA GPU Operator Helm Repository
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia

Install the NVIDIA GPU Operator
helm upgrade --wait --install -n gpu-operator --create-namespace gpu-operator \
  nvidia/gpu-operator

Now we can try to deploy a GPU workload to the cluster 🎉
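Before doing so, it is worth confirming that the operator pods are healthy and that nvidia.com/gpu shows up as a resource on the GPU workers; the node name below is one of the examples from earlier:

kubectl get pods -n gpu-operator
kubectl describe node capi-worker-example-qk2kr-4bp8p | grep nvidia.com/gpu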
Deploy a GPU Workload
Create a GPU Workload
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vectoradd
      image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
      resources:
        limits:
          nvidia.com/gpu: 1

Now you can deploy the GPU workload by running the following command:
kubectl apply -f <file_name>.yaml
pod/cuda-vectoradd created

And check the logs of the pod to ensure the driver is loaded and the GPU is available:
kubectl logs -f cuda-vectoradd

You should see the output of the GPU workload in the logs 🍾.
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
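The sample pod runs to completion and then stays around in the Completed state, so you can clean it up once you are done:

kubectl delete pod cuda-vectoradd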