Incorrect Size

When the compute power requirements grow, the cluster grows in size to house the new containers. Of course,  as experienced cluster operators, while adding new workers, we also increase master nodes count. Everything works well until the Kubernetes cluster size expanded slightly over 1000-1500 nodes – and now everything fails. Kubectl does not work anymore, we can’t make any new changes – what has happened?

Let’s start with what is a change for Kubernetes and what actually happens when an event occurs. Kubectl contacts the kube-apiserver through API port and requests a change. Then the change is saved in a database and used by other APIs like kube-controller-manager or kube-scheduler. This gives us two quick leads – either there is a communication problem or the database does not work.

Let’s quickly check the connection to the API with curl (curl https://[KUBERNETES_MASTE_HOST]/api/) – it works. Well, that was too easy.

#kubernetes #containers #cloud native #ndots

Common Kubernetes Failures at Scale
2.75 GEEK