No Restarts, No Disruptions: Seamless Pod Resource Updates with In-Place Resizing

Optimizing resource utilization while maintaining application performance is a never-ending challenge in Kubernetes. Figuring out how many resources your app needs at the start is complex, and the traditional approach to resizing CPU and/or memory is disruptive: pods must be recreated, which can impact running workloads. This interruption can lead to service degradation, downtime, and operational headaches.

Many users have been eagerly anticipating the ability to resize Kubernetes pods without a restart, and the feature has been available in Alpha since Kubernetes v1.27. The feature is called InPlacePodVerticalScaling, and with it the resources field in a pod's containers becomes mutable for cpu and memory. These values can be changed simply by patching the running pod spec.

Advantages of in-place pod resource resizing:

Reduced Downtime: Eliminates the downtime and potential data loss caused by pod restart, ensuring smooth operations and uninterrupted service for your users.
Enhanced Efficiency: Right-sizing your pods is crucial for optimal resource utilization. InPlacePodVerticalScaling lets you allocate resources precisely as needed, avoiding both overprovisioning (wasting money) and underprovisioning (hampering performance).
Improved Agility: Dynamic scaling allows you to respond instantly to changing demands. Whether it's a sudden surge in traffic or a scheduled batch job, your pods can adjust their resource usage seamlessly, ensuring optimal performance and responsiveness.
Cost Savings: By avoiding overprovisioning and optimizing resource usage, InPlacePodVerticalScaling translates directly to cost savings, especially in cloud environments where you pay per resource unit.
Simplified Management: Managing complex deployments is challenging, but InPlacePodVerticalScaling streamlines the process by eliminating manual restarts and offering an innovative approach to resource management.

In this blog post, I will show you how to try in-place pod resource resizing. The feature is still in Alpha as of Kubernetes v1.29 and is not recommended for production.

In-place pod resource resize in action

The InPlacePodVerticalScaling feature is still in Alpha, so it is not enabled by default, and it requires a Kubernetes cluster running v1.27 or above. To activate the feature, you can either launch a minikube cluster using the following command or use a GKE alpha cluster if you're using Google Cloud.

minikube start --feature-gates=InPlacePodVerticalScaling=true
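Since the feature requires v1.27 or above, it can help to check the version before going further. The helper below is a minimal sketch: the function name supports_in_place_resize is hypothetical, and it only compares version numbers. It does not tell you whether the feature gate is actually switched on.

```shell
# Hypothetical helper: succeeds if a version string such as "v1.27.3"
# is at least v1.27, the first release that ships the Alpha feature.
supports_in_place_resize() {
  local version="${1#v}"        # strip the leading "v"
  local major="${version%%.*}"  # text before the first dot
  local rest="${version#*.}"    # text after the first dot
  local minor="${rest%%.*}"     # text before the next dot
  minor="${minor%%[!0-9]*}"     # drop any non-numeric suffix such as "+"
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 27 ]; }
}
```

One possible way to use it, assuming the first node is representative of the cluster: supports_in_place_resize "$(kubectl get nodes -o jsonpath='{.items[0].status.nodeInfo.kubeletVersion}')" && echo "version OK"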

Let's deploy a sample pod to the cluster. The new resizePolicy field in the container spec gives users control over how their containers are handled when resources are resized.

In the sample pod configuration below, the resizePolicy indicates that changes to the memory allocation require a restart of the container, while CPU changes can be applied without a restart.

The decision to restart a container depends on whether the application can use the updated resource without requiring a restart or not. For example, if an application's memory usage is critical to its operation, restarting the container when memory changes occur ensures that the application starts with the correct amount of memory. This step helps prevent potential issues or malfunctions.

cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: nginx
  name: nginx
spec:
  containers:
  - image: nginx
    name: nginx
    resizePolicy:
    - resourceName: "memory"
      restartPolicy: "RestartContainer"
    - resourceName: "cpu"
      restartPolicy: "NotRequired"
    resources:
      limits:
        cpu: "300m"
        memory: "1Gi"
      requests:
        cpu: "100m"
        memory: "500Mi"
EOF

Wait until the pod is moved to a running state and explore the pod configuration. A new field allocatedResources has been added to containerStatuses in the pod's status, and this field reflects the current node resources allocated to the pod's containers.

In addition, a new field called resources has been added to the container's status, and this field reflects the actual resource requests and limits configured on the running containers as reported by the container runtime.
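To look at these two status fields side by side, something like the following sketch could be used. The function name show_container_resources is hypothetical, and it assumes the container of interest is the first entry in containerStatuses, as it is for the single-container nginx pod above.

```shell
# Hypothetical helper: print the allocated and the actual resources
# reported in the pod status for the first container.
show_container_resources() {
  local pod="$1"
  # Resources the node has allocated to the container.
  echo "allocated: $(kubectl get pod "$pod" -o jsonpath='{.status.containerStatuses[0].allocatedResources}')"
  # Requests and limits currently applied, as reported by the runtime.
  echo "actual:    $(kubectl get pod "$pod" -o jsonpath='{.status.containerStatuses[0].resources}')"
}
```

A typical sequence would be to wait for readiness first with kubectl wait --for=condition=Ready pod/nginx --timeout=60s and then run show_container_resources nginx.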

CPU resize

Let's adjust the CPU resources of the pod with the following patch command and observe the resize operation.

kubectl patch pod nginx --patch '{"spec": {"containers": [{"name":"nginx", "resources":{"requests": {"cpu" :"300m"},"limits": {"cpu" :"500m"}}}]}}'
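Hand-written patch JSON is easy to mistype, so a small helper that builds it may be convenient. This is only a sketch: build_resize_patch is a hypothetical name, and it does no validation or escaping of its arguments.

```shell
# Hypothetical helper: print a patch that sets the request and limit
# of one resource ("cpu" or "memory") on one named container.
build_resize_patch() {
  local container="$1" resource="$2" request="$3" limit="$4"
  printf '{"spec":{"containers":[{"name":"%s","resources":{"requests":{"%s":"%s"},"limits":{"%s":"%s"}}}]}}\n' \
    "$container" "$resource" "$request" "$resource" "$limit"
}
```

The CPU resize above could then be written as kubectl patch pod nginx --patch "$(build_resize_patch nginx cpu 300m 500m)".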

You will now notice that a field named resize has been added to the pod's status to show the state of the last requested resize. It takes one of the following values, depending on how the operation is progressing:

Proposed is an acknowledgement of the requested resize and indicates that the request was validated and recorded.
InProgress indicates that the node has accepted the resize request and is in the process of applying the resize request to the pod's containers.
Deferred means that the requested resize cannot be granted at this time, and the node will keep retrying. The resize may be granted when other pods leave and free up node resources.
Infeasible is a signal that the node cannot accommodate the requested resize. This can happen if the requested resize exceeds the maximum resources the node can ever allocate for a pod.
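As a quick reference, the four values can be mapped to short explanations in a script. The sketch below uses a hypothetical function name, describe_resize_status, and assumes that an empty value means no resize is pending.

```shell
# Hypothetical helper: translate a resize status value into a short
# explanation, following the four states described above.
describe_resize_status() {
  case "$1" in
    Proposed)   echo "request validated and recorded" ;;
    InProgress) echo "node accepted the request and is applying it" ;;
    Deferred)   echo "cannot be granted now; the node will keep retrying" ;;
    Infeasible) echo "node cannot accommodate the requested resize" ;;
    "")         echo "no resize pending" ;;  # assumption: empty means idle
    *)          echo "unknown status: $1" ;;
  esac
}
```

The current value can be read with kubectl get pod nginx -o jsonpath='{.status.resize}' and passed to the function as its first argument.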

A known issue exists in the alpha stage where resizing a pod may experience a race condition with other pod updates. This can cause a delay in the activation of the pod to resize, and the updated container resources may take some time to be reflected in the pod's status.
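Because the status can lag behind, polling may be more reliable than checking once. The loop below is a rough sketch under two assumptions: the resize field becomes empty once the resize has been applied, and the name wait_for_resize and its defaults (30 attempts, 2 seconds apart) are purely illustrative.

```shell
# Hypothetical helper: poll the pod's resize status until it clears,
# giving up early if the node reports the resize as Infeasible.
wait_for_resize() {
  local pod="$1" attempts="${2:-30}" delay="${3:-2}" status i
  for i in $(seq 1 "$attempts"); do
    status="$(kubectl get pod "$pod" -o jsonpath='{.status.resize}')"
    case "$status" in
      "")         echo "resize settled"; return 0 ;;
      Infeasible) echo "resize is infeasible on this node" >&2; return 1 ;;
      *)          echo "resize status: $status (attempt $i of $attempts)" ;;
    esac
    sleep "$delay"
  done
  echo "gave up waiting for resize" >&2
  return 1
}
```

For example, wait_for_resize nginx could be run right after a kubectl patch command.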

Memory resize

Let's continue with the memory resource adjustment. This time the container will be restarted, as specified by the RestartContainer setting in its resizePolicy.

kubectl patch pod nginx --patch '{"spec": {"containers": [{"name":"nginx", "resources":{"requests": {"memory" :"700Mi"}}}]}}'

The resize operation completes successfully and the container is restarted 🚀.
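One way to confirm the restart from the command line is to compare the container's restartCount before and after the patch. This is a sketch with hypothetical function names, and it assumes the container is the first entry in containerStatuses.

```shell
# Hypothetical helper: print the restart count of the first container.
restart_count() {
  kubectl get pod "$1" -o jsonpath='{.status.containerStatuses[0].restartCount}'
}

# Hypothetical helper: succeed if the restart count is now higher than
# the value recorded before the resize.
restarted_since() {
  local pod="$1" before="$2" now
  now="$(restart_count "$pod")"
  [ "$now" -gt "$before" ]
}
```

For example, record before="$(restart_count nginx)" ahead of the memory patch, then run restarted_since nginx "$before" && echo "container was restarted" once the resize has been applied.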

Conclusion

Although the feature is currently in Alpha, it has immense potential and is helpful for stateful applications that require vertical pod autoscaling. Follow the steps outlined in this blog post to try it out and experience its benefits firsthand.

I hope this blog post has been helpful.
