In this lecture we will look at Stateful Sets in Kubernetes.
Before we talk about stateful sets, we must first understand why we need them. Why can’t we just live with Deployments?
Let’s start from the very basics. So for a minute let’s keep aside everything that we learned so far, such as Deployments, or Kubernetes, or Docker or containers or virtual machines.
Let’s just start with a simple server. Our good old physical server. And we are tasked to deploy a database server. So we install and set up MySQL on the server and create a database. Our database is now operational. Other applications can now write data to our database.
To withstand failures we are tasked to deploy a High Availability solution. So we deploy additional servers and install MySQL on those as well. We have a blank database on the new servers.
So how do we replicate the data from the original database to the new databases on the new servers?
There are different topologies available.
The most straightforward one is a single master, multi slave topology, where all writes come in to the master server and reads can be served by either the master or any of the slave servers.
So the master server should be set up first, before deploying the slaves.
Once the slaves are deployed, perform an initial clone of the database from the master server to the first slave. After the initial copy, enable continuous replication from the master to that slave so that the database on the slave node is always in sync with the database on the master.
Note that both these slaves are configured with the address of the master host. When replication is initialized you point the slave to the master with the master’s hostname or address. That way the slaves know where the master is.
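To make "pointing the slave to the master" a little more concrete, here is a minimal sketch that builds the SQL a slave might run. It assumes the classic `CHANGE MASTER TO` syntax (newer MySQL releases also offer `CHANGE REPLICATION SOURCE TO`). The host name, user and password are placeholders, not values from this lecture.

```python
def build_replication_sql(master_host: str, user: str, password: str) -> str:
    """Return the statements a slave would run to start replicating.

    All argument values are illustrative placeholders.
    """
    return (
        f"CHANGE MASTER TO MASTER_HOST='{master_host}', "
        f"MASTER_USER='{user}', MASTER_PASSWORD='{password}'; "
        "START SLAVE;"
    )


# Hypothetical master address; every slave is configured with the same one.
sql = build_replication_sql("master.example.internal", "repl", "secret")
print(sql)
```

The important part is the `MASTER_HOST` value: every slave depends on that address staying valid.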
Let us now go back to the world of Kubernetes and containers and try to deploy this setup.
In the Kubernetes world, each of these instances, the master and the slaves, is a pod that is part of a Deployment.
In step 1, we want the master to come up first and then the slaves. Among the slaves, we want slave 1 to come up first and perform an initial clone of data from the master. In step 4, we want slave 2 to come up next and clone data from slave 1.
With Deployments you can’t guarantee that order. All pods that are part of a Deployment come up at the same time.
So the first step can’t be implemented with a Deployment.
As we have seen while working with Deployments, the pods come up with random names. So that won’t help us here. Even if we decide to designate one of these pods as the master and use its name to configure the slaves, if that pod crashes and the Deployment creates a new pod in its place, it’s going to come up with a completely new pod name. And now the slaves are pointing to an address that does not exist. And because of all of this, the remaining steps can’t be executed.
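The naming problem can be illustrated with a small toy simulation. This is not the Kubernetes API. It only mimics the idea of a fixed prefix plus a random suffix, and the suffix length and the `slave_config` dictionary are assumptions made for the example.

```python
import random
import string


def deployment_pod_name(deployment: str, rng: random.Random) -> str:
    """Mimic a Deployment-style pod name: fixed prefix plus random suffix."""
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(rng.choice(alphabet) for _ in range(5))
    return f"{deployment}-{suffix}"


rng = random.Random(42)  # seeded only so the sketch is repeatable

# We pick one pod as the master and configure the slaves with its name.
master = deployment_pod_name("mysql", rng)
slave_config = {"master_host": master}

# The master pod crashes and the Deployment creates a replacement.
replacement = deployment_pod_name("mysql", rng)

# The slaves still hold the old name, which no longer matches any pod.
stale = slave_config["master_host"] != replacement
print(master, replacement, stale)
```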
And that’s where we need something new.
And that’s where stateful sets come into play. Stateful sets are similar to Deployments in that they create pods based on a template, but with some differences. With stateful sets, pods are created in sequential order. After the first pod is deployed, it must be in a running and ready state before the next pod is deployed.
So that helps us ensure master is deployed first and then slave 1 and then slave 2. And that helps us with steps 1 and 4.
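The ordering guarantee can be sketched as a simple loop. This is only a toy model of the behaviour described above. The `is_ready` callback is a stand-in for a real readiness check, and the pod labels are just the roles from our example.

```python
from typing import Callable, List


def ordered_startup(pods: List[str], is_ready: Callable[[str], bool]) -> List[str]:
    """Start pods one at a time; do not move on until the current one is ready."""
    started = []
    for pod in pods:
        started.append(pod)
        if not is_ready(pod):
            # A real stateful set would keep waiting here; the sketch just stops.
            break
    return started


pods = ["master", "slave-1", "slave-2"]

# Every pod becomes ready, so all three are started in order.
all_ready = ordered_startup(pods, lambda pod: True)

# slave-1 never becomes ready, so slave-2 is never started.
blocked = ordered_startup(pods, lambda pod: pod != "slave-1")

print(all_ready)
print(blocked)
```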
Stateful sets assign a unique ordinal index to each pod: a number starting from 0 for the first pod and incrementing by 1 for each subsequent pod.
Each pod gets a name derived from this index, combined with the stateful set name. So the first pod gets mysql-0, the second pod mysql-1 and the third mysql-2. So no more random names. You can rely on these names going forward. We can designate the pod with the name mysql-0 as the master, and any other pods as slaves. Pod mysql-2 knows that it has to perform an initial clone of data from the pod mysql-1. If you scale up by deploying another pod, mysql-3, then it would know that it can perform a clone from mysql-2.
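Because the names are predictable, a pod can work out its own role and where to clone from just by looking at its name. Here is a minimal sketch of that logic. The function names are made up for illustration, and the rule "clone from the previous ordinal" is simply the scheme described above.

```python
from typing import Optional


def parse_ordinal(pod_name: str) -> int:
    """Extract the trailing ordinal from a name such as 'mysql-2'."""
    return int(pod_name.rsplit("-", 1)[1])


def role(pod_name: str) -> str:
    """Ordinal 0 is the master; every other pod is a slave."""
    return "master" if parse_ordinal(pod_name) == 0 else "slave"


def clone_source(pod_name: str) -> Optional[str]:
    """Return the pod to clone from, or None for the master."""
    base, _ = pod_name.rsplit("-", 1)
    ordinal = parse_ordinal(pod_name)
    if ordinal == 0:
        return None
    return f"{base}-{ordinal - 1}"


for name in ["mysql-0", "mysql-1", "mysql-2", "mysql-3"]:
    print(name, role(name), clone_source(name))
```

Scaling up needs no extra configuration in this scheme: a new pod mysql-3 derives its clone source, mysql-2, from its own name.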
To enable continuous replication, you can now point the slaves to the master at mysql-0.
Even if the master fails and the pod is recreated, it still comes up with the same name. Stateful sets maintain a sticky identity for each of their pods. And this helps with the remaining steps. The master is now always the master and available at the address mysql-0.
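The sticky identity can be shown with the same kind of toy model. Again, this is not how Kubernetes is implemented. The `reconcile` helper is a hypothetical stand-in for the stateful set recreating a missing pod under its original name.

```python
from typing import Set


def reconcile(name: str, replicas: int, running: Set[str]) -> Set[str]:
    """Recreate any missing ordinal under its original name."""
    desired = {f"{name}-{i}" for i in range(replicas)}
    return running | desired


running = {"mysql-0", "mysql-1", "mysql-2"}
slave_config = {"master_host": "mysql-0"}

# The master crashes ...
running.discard("mysql-0")

# ... and the stateful set brings it back with the same name.
running = reconcile("mysql", 3, running)

# The address the slaves were configured with is still valid.
still_valid = slave_config["master_host"] in running
print(sorted(running), still_valid)
```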
And that is why you need stateful sets. In the upcoming lectures we will talk more about creating stateful sets, headless services, persistent volumes and more.