Understanding Kubernetes Autoscaling

Kubernetes provides a series of features to ensure your clusters have the right size to handle any type of load. In this blog post, we will look into the different auto-scaling tools provided by Kubernetes and learn the difference between the horizontal pod autoscaler, the vertical pod autoscaler and Kubernetes Nodes autoscaler.

Developers use Kubernetes to ship faster to their users and respond to their requests as quickly as possible. You design the capacity of your cluster on the estimated load your users will generate on it. But imagine your service went viral, and the number of requests grows faster than you ever imagined. You risk running out of compute resources, your service might slow down, and users may get frustrated.

When you allocate resources manually, your responses may not be as quick as required by your application’s changing needs. This is were Kubernetes Autoscaling comes in: Kubernetes provides multiple layers of autoscaling functionality: Pod-based scaling with the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler, as well as node-based with the Cluster Autoscaler. It automatically scales up your cluster as soon as you need it and scales it back down to its regular size when the load is lower. These layers ensure that each pod and cluster has the right performance to serve your current needs.

Kubernetes Architecture

In Kubernetes, a set of machines for running containerized applications is called Cluster. A cluster contains, at minimum, a Control Plane and one or several Nodes. The control plane maintains the clusters’ desired state, such as which applications run on them and which images they use. The nodes are either virtual or physical machines that run the applications and workloads, called Pods. Pods consist of containers that request compute resources such as CPU, Memory, or GPU.

Image for post

Horizontal vs. Vertical Scaling

Image for post

Horizontal Scaling means modifying the compute resources of an existing cluster, for example, by adding new nodes to it or by adding new pods by increasing the replica count of pods (Horizontal Pod Autoscaler).
Vertical Scaling means to modify the attributed resources (like CPU or RAM) of each node in the cluster. In most cases, this means creating an entirely new node pool using machines that have different hardware configurations. Vertical scaling on pods means dynamically adjusting the resource requests and limits based on the current application requirements (Vertical Pod Autoscaler).

Horizontal Pod Autoscaler

The Horizontal Pod Autoscaler (HPA) is able to scale the number of pods available in a cluster to handle the current computational workload requirements of an application. It determines the number of pods needed based on metrics set by you and applies the creation or deletion of pods based on threshold sets. In most cases, these metrics are CPU and RAM usage, but it is also possible to specify your custom metrics. The HPA checks continuously the CPU and memory metrics generated by the metrics-server installed in the Kubernetes cluster.

If one of the specified thresholds is met, it updates the number of pod replicas inside the deployment controller. Following the updated number of pod replicas, the deployment controller will scale up or down the number of pods until the number of replicas matches the desired number. In case you want to use custom metrics to define rules on how the HPA handles scaling your pods, your cluster needs to be linked to a time-series database holding the metrics you want to use. Please note that Horizontal Pod Autoscaling can not be applied to objects that can not be scaled like, for example, DaemonSets.

Vertical Pod Autoscaler

The Vertical Pod Autoscaler (VPA) can allocate more (or less) CPU and memory resources to existing pods to modify the available computing resources for an application. This feature can be useful to monitor and adjust the allocated resources of each pod over its lifetime. The VPA comes with a tool called VPA Recommender, which monitors the current and past resource consumption and uses this data to provide recommended CPU and memory resources to be allocated for the containers. The Vertical Pod Autoscaler does not update resource configurations for existing pods. It checks which pods have the correct resource configuration and kills the ones that are not having the recommended configuration so that their controllers can recreate them with the updated configuration.

When you want to use the HPA and VPA both at the same time to manage your container resources, you may put them in a conflict which each other when using the same metrics (CPU and memory). Both of them will try to solve the situation simultaneously, resulting in a wrong allocation of resources. However, it is possible to use them both if they rely on different metrics. The VPA uses CPU and memory consumption as unique sources to gather the perfect resource allocation, but the HPA can be used with custom metrics so both tools can be used in parallel.

#kubernetes #devops

Kubernetes Architecture

Horizontal vs. Vertical Scaling

Horizontal Pod Autoscaler

Vertical Pod Autoscaler

blog.scaleway.com

Understanding Kubernetes Autoscaling