An illustrated guide to Kubernetes Networking

Everything I learned about Kubernetes networking

An illustrated guide to Kubernetes Networking [Part 1]

You’ve been running a bunch of services on a Kubernetes cluster and reaping the benefits. Or at least, you’re planning to. Even though there are a bunch of tools available to set up and manage a cluster, you’ve still wondered how it all works under the hood. And where do you look if it breaks? I know I did.

Sure, Kubernetes is simple enough to start using. But let’s face it — it’s a complex beast under the hood. There are a lot of moving parts, and knowing how they all fit and work together is a must if you want to be ready for failures. One of the most complex, and probably the most critical, parts is the networking.

So I set out to understand exactly how the Networking in Kubernetes works. I read the docs, watched some talks, even browsed the codebase. And here is what I found out.

Kubernetes Networking Model

At its core, Kubernetes Networking has one important fundamental design philosophy:

Every Pod has a unique IP.
This Pod IP is shared by all the containers in this Pod, and it’s routable from all the other Pods. Ever notice some “pause” containers running on your Kubernetes nodes? They are called “sandbox containers”, whose only job is to reserve and hold a network namespace (netns) which is shared by all the containers in a pod. This way, a pod’s IP doesn’t change even if a container dies and a new one is created in its place. A huge benefit of this IP-per-pod model is that there are no IP or port collisions with the underlying host, and we don’t have to worry about what port the applications use.
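
If your nodes use the Docker runtime, a quick way to spot these sandbox containers is to list the containers on a node and filter for "pause" (this assumes Docker is the container runtime, which may not be the case on your cluster):

docker ps | grep -i pause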

With this in place, the only requirement Kubernetes has is that these Pod IPs are routable/accessible from all the other pods, regardless of what node they’re on.

Intra-node communication

The first step is to make sure pods on the same node are able to talk to each other. The idea is then extended to communication across nodes, to the internet and so on.

On every Kubernetes node, which is a Linux machine in this case, there’s a root network namespace (root as in base, not the superuser) — root netns.

The main network interface eth0 is in this root netns.

Similarly, each pod has its own netns, with a virtual ethernet pair connecting it to the root netns. This is basically a pipe-pair with one end in the root netns, and the other in the pod netns.

We name the pod-end eth0, so the pod doesn’t know about the underlying host and thinks that it has its own root network setup. The other end is named something like vethxxx.

You may list all these interfaces on your node using the ifconfig or ip a commands.

This is done for all the pods on the node. For these pods to talk to each other, a Linux Ethernet bridge cbr0 is used. Docker uses a similar bridge named docker0.

You may list the bridges using the brctl show command.
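
For example (interface and bridge names will vary depending on your setup), on a node you could run:

# list all interfaces, including the vethxxx ends of the pod pipe-pairs
ip a

# list bridges and the interfaces attached to them (brctl is part of bridge-utils)
brctl show

# a rough equivalent using only iproute2
ip link show type bridge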

Assume a packet is going from pod1 to pod2

1. It leaves pod1’s netns at eth0 and enters the root netns at vethxxx.

2. It’s passed on to cbr0, which discovers the destination using an ARP request, saying “who has this IP?”

3. vethyyy says it has that IP, so the bridge knows where to forward the packet.

4. The packet reaches vethyyy, crosses the pipe-pair and reaches pod2’s netns.

This is how containers on a node talk to each other. Obviously there are other ways, but this is probably the easiest, and what Docker uses as well.
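
To make this concrete, here is a rough sketch of how such a namespace, veth pair and bridge could be wired up by hand with iproute2. This is not something Kubernetes itself runs; the names and the 10.244.1.0/24 range are made up for illustration:

# create a netns to stand in for a pod
sudo ip netns add pod1

# create the bridge in the root netns (our cbr0 stand-in)
sudo ip link add cbr0 type bridge
sudo ip link set cbr0 up

# create the veth pipe-pair: one end stays in the root netns,
# the other end is moved into the pod netns and renamed eth0
sudo ip link add vethxxx type veth peer name ceth0
sudo ip link set ceth0 netns pod1
sudo ip netns exec pod1 ip link set ceth0 name eth0

# attach the root-side end to the bridge and bring everything up
sudo ip link set vethxxx master cbr0
sudo ip link set vethxxx up
sudo ip netns exec pod1 ip addr add 10.244.1.2/24 dev eth0
sudo ip netns exec pod1 ip link set eth0 up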

Inter-node communication

As I mentioned earlier, pods need to be reachable across nodes as well. Kubernetes doesn’t care how it’s done. We can use L2 (ARP across nodes), L3 (IP routing across nodes — like the cloud provider route tables), overlay networks, or even carrier pigeons. It doesn’t matter as long as the traffic can reach the desired pod on another node. Every node is assigned a unique CIDR block (a range of IP addresses) for pod IPs, so each pod has a unique IP that doesn’t conflict with pods on another node.

In most of the cases, especially in cloud environments, the cloud provider route tables make sure the packets reach the correct destination. The same thing could be accomplished by setting up correct routes on every node. There are a bunch of other network plugins that do their own thing.

Here we have two nodes, similar to what we saw earlier. Each node has various network namespaces, network interfaces and a bridge.

Assume a packet is going from pod1 to pod4 (on a different node).

  1. It leaves pod1’s netns at eth0 and enters the root netns at vethxxx.
  2. It’s passed on to cbr0, which makes the ARP request to find the destination.
  3. It comes out of cbr0 to the main network interface eth0 since nobody on this node has the IP address for pod4.
  4. It leaves the machine node1 onto the wire with src=pod1 and dst=pod4.
  5. The route table has routes set up for each of the node CIDR blocks, and it routes the packet to the node whose CIDR block contains the pod4 IP.
  6. So the packet arrives at node2 at the main network interface eth0.
  7. Now, even though pod4’s IP isn’t the IP of eth0, the packet is still forwarded to cbr0 since the nodes are configured with IP forwarding enabled.
  8. The node’s routing table is looked up for any routes matching the pod4 IP. It finds cbr0 as the destination for this node’s CIDR block.
  9. You may list the node route table using the route -n command, which will show a route for cbr0 like the example after this list.

  10. The bridge takes the packet, makes an ARP request and finds out that the IP belongs to vethyyy.

  11. The packet crosses the pipe-pair and reaches pod4 🏠
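
As an illustration (the exact CIDRs, gateway and interface names depend on your cluster), the node route table from step 9 might contain entries like these, with the local pod CIDR pointing at cbr0 and the other node's pod CIDR either handled by the cloud route table or routed via the node network:

# route -n (illustrative output)
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.244.1.0      0.0.0.0         255.255.255.0   U     0      0        0 cbr0
10.244.2.0      10.0.0.12       255.255.255.0   UG    0      0        0 eth0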

An illustrated guide to Kubernetes Networking [Part 2]

We’ll expand on these ideas and see how the overlay networks work. We will also understand how the ever-changing pods are abstracted away from apps running in Kubernetes and handled behind the scenes.

Overlay networks

Overlay networks are not required by default; however, they help in specific situations, like when we don’t have enough IP space, or when the network can’t handle the extra routes. Or maybe when we want some extra management features the overlays provide. One commonly seen case is when there’s a limit on how many routes the cloud provider route tables can handle. For example, AWS route tables support up to 50 routes without impacting network performance. So if we have more than 50 Kubernetes nodes, the AWS route table won’t be enough. In such cases, using an overlay network helps.

An overlay network essentially encapsulates a packet inside another packet, which traverses the native network across nodes. You may not want to use an overlay network since it adds some latency and complexity overhead due to the encapsulation and decapsulation of all the packets. It’s often not needed, so we should use it only when we know why we need it.

To understand how traffic flows in an overlay network, let’s consider an example of flannel, which is an open-source project by CoreOS.

Here we see that it’s the same setup as before, but with a new virtual ethernet device called flannel0 added to the root netns. It’s an implementation of Virtual Extensible LAN (VXLAN), but to Linux, it’s just another network interface.

The flow for a packet going from pod1 to pod4 (on a different node) is something like this:

  1. The packet leaves pod1’s netns at eth0 and enters the root netns at vethxxx.

  2. It’s passed on to cbr0, which makes the ARP request to find the destination.

3a. Since nobody on this node has the IP address for pod4, the bridge sends it to flannel0, because the node’s route table is configured with flannel0 as the target for the pod network range.

3b. As the flanneld daemon talks to the Kubernetes apiserver or the underlying etcd, it knows about all the pod IPs, and what nodes they’re on. So flannel creates the mappings (in userspace) for pod IPs to node IPs.

flannel0 takes this packet and wraps it in a UDP packet with extra headers, changing the source and destination IPs to the respective node IPs, and sends it to the special VXLAN port (generally 8472).

Even though the mapping is in userspace, the actual encapsulation and data flow happens in kernel space. So it happens pretty fast.

3c. The encapsulated packet is sent out via eth0 since it is involved in routing the node traffic.

  4. The packet leaves the node with node IPs as source and destination.

  5. The cloud provider route table already knows how to route traffic between nodes, so it sends the packet to the destination node2.

6a. The packet arrives at eth0 of node2. Since the destination port is the special VXLAN port, the kernel sends the packet to flannel0.

6b. flannel0 de-capsulates the packet and emits it back into the root network namespace.

6c. Since IP forwarding is enabled, the kernel forwards it to cbr0 as per the route tables.

  7. The bridge takes the packet, makes an ARP request and finds out that the IP belongs to vethyyy.

  8. The packet crosses the pipe-pair and reaches pod4 🏠
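
If you want to poke at this on a flannel-based cluster, a couple of commands can confirm the picture (interface names vary between versions; newer flannel releases use a VXLAN device called flannel.1 rather than flannel0):

# show the flannel interface and its VXLAN details
ip -d link show flannel.1

# show that the pod network range is routed via the flannel device
ip route | grep flannel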

There could be slight differences among different implementations, but this is how overlay networks in Kubernetes work. There’s a common misconception that we have to use overlays when using Kubernetes. The truth is, it completely depends on the specific scenarios. So make sure you use it only when it’s absolutely needed.

An illustrated guide to Kubernetes Networking [Part 3]

Cluster dynamics

Due to the ever-changing, dynamic nature of Kubernetes, and distributed systems in general, the pods (and consequently their IPs) change all the time. Reasons could range from desired rolling updates and scaling events to unpredictable pod or node crashes. This makes the Pod IPs unreliable to use directly for communication.

Enter Kubernetes Services — a virtual IP with a group of Pod IPs as endpoints (identified via label selectors). These act as a virtual load balancer, whose IP stays the same while the backend Pod IPs may keep changing.

The whole virtual IP implementation is actually iptables rules (the recent versions have an option of using IPVS, but that’s another discussion), which are managed by the Kubernetes component — kube-proxy. This name is actually misleading now. It used to work as a proxy in the pre-v1.0 days, which turned out to be pretty resource intensive and slower due to constant copying between kernel space and user space. Now, it’s just a controller, like many other controllers in Kubernetes, that watches the API server for endpoint changes and updates the iptables rules accordingly.

Due to these iptables rules, whenever a packet is destined for a service IP, it’s DNATed (DNAT=Destination Network Address Translation), meaning the destination IP is changed from service IP to one of the endpoints — pod IP — chosen at random by iptables. This makes sure the load is evenly distributed among the backend pods.

When this DNAT happens, this info is stored in conntrack — the Linux connection tracking table (stores 5-tuple translations iptables has done: protocol, srcIP, srcPort, dstIP, dstPort). This is so that when a reply comes back, it can un-DNAT, meaning change the source IP from the Pod IP to the Service IP. This way, the client is unaware of how the packet flow is handled behind the scenes.
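
To see this in action on a node, you can dump the rules kube-proxy has written. The chain names, IPs and ports below are illustrative (the per-service chain suffixes are generated hashes), but the shape is what you'll find:

# list the nat-table chain where kube-proxy hooks in the service IPs
sudo iptables -t nat -L KUBE-SERVICES -n | head

# illustrative shape of the generated rules:
#   the service IP jumps to a per-service chain...
#     -A KUBE-SERVICES -d 10.96.0.111/32 -p tcp --dport 80 -j KUBE-SVC-XYZABC
#   ...which picks an endpoint at random...
#     -A KUBE-SVC-XYZABC -m statistic --mode random --probability 0.5 -j KUBE-SEP-AAA111
#   ...and the endpoint chain does the DNAT to a pod IP
#     -A KUBE-SEP-AAA111 -p tcp -j DNAT --to-destination 10.244.1.7:8080

# the conntrack entries that remember the translation (needs conntrack-tools installed)
sudo conntrack -L | grep 10.96.0.111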

So by using Kubernetes services, we can use the same ports without any conflicts (since we can remap ports to endpoints). This makes service discovery super easy. We can just use the internal DNS and hard-code the service hostnames. We can even use the service host and port environment variables preset by Kubernetes.

Protip: Take this second approach and save a lot of unnecessary DNS calls!

Outbound traffic

The Kubernetes services we’ve talked about so far work within a cluster. However, in most of the practical cases, applications need to access some external api/website.

Generally, nodes can have both private and public IPs. For internet access, there is some sort of 1:1 NAT of these public and private IPs, especially in cloud environments.

For normal communication from a node to some external IP, the source IP is changed from the node’s private IP to its public IP for outbound packets, and reversed for the reply inbound packets. However, when a connection to an external IP is initiated by a Pod, the source IP is the Pod IP, which the cloud provider’s NAT mechanism doesn’t know about. It will just drop packets with source IPs other than the node IPs.

So we use, you guessed it, some more iptables! These rules, also added by kube-proxy, do the SNAT (Source Network Address Translation), aka IP MASQUERADE. This tells the kernel to use the IP of the interface this packet is going out from, in place of the source Pod IP. A conntrack entry is also kept to un-SNAT the reply.
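
In its simplest form, the effect is equivalent to a masquerade rule like this (the real kube-proxy rules go through KUBE-MARK-MASQ/KUBE-POSTROUTING chains, and the 10.244.0.0/16 pod CIDR here is made up):

# SNAT traffic leaving the pod network for anywhere outside it,
# replacing the pod source IP with the IP of the outgoing interface
sudo iptables -t nat -A POSTROUTING -s 10.244.0.0/16 ! -d 10.244.0.0/16 -j MASQUERADE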

Inbound traffic

Everything’s good so far. Pods can talk to each other, and to the internet. But we’re still missing a key piece — serving the user request traffic. As of now, there are two main ways to do this:

NodePort/Cloud Loadbalancer (L4 — IP and Port)

Setting the service type to NodePort assigns the service a nodePort in the range 30000-32767. This nodePort is open on every node, even if there’s no pod running on a particular node. Inbound traffic on this NodePort would be sent to one of the pods (it may even be on some other node!) using, again, iptables.

A service type of LoadBalancer in cloud environments would create a cloud load balancer (ELB, for example) in front of all the nodes, hitting the same nodePort.
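
Assuming a deployment named my-app listening on port 8080 (the names here are hypothetical), exposing it either way is a one-liner:

# NodePort: opens the allocated nodePort on every node
kubectl expose deployment my-app --type=NodePort --port=8080

# or, in a cloud environment, LoadBalancer: also provisions a cloud load balancer in front of the nodes
kubectl expose deployment my-app --type=LoadBalancer --port=8080

# see which nodePort / external IP was assigned
kubectl get service my-app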

Ingress (L7 — HTTP/TCP)

A bunch of different implementations, like nginx, traefik, haproxy, etc., keep a mapping of HTTP hostnames/paths and the respective backends. Traffic still enters over a load balancer and nodePort as usual, but the advantage is that we can have one ingress handling inbound traffic for all the services instead of requiring multiple nodePorts and load balancers.
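
As a sketch (the hostname and service name are assumptions, this needs a reasonably recent Kubernetes version, and an ingress controller such as nginx must be installed for it to do anything), an Ingress resource maps a host/path to a backend Service:

cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app
            port:
              number: 8080
EOF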

Network Policy

Think of this like security groups/ACLs for pods. The NetworkPolicy rules allow/deny traffic across pods. The exact implementation depends on the network layer/CNI, but most of them just use iptables.
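
For example, a minimal policy (the labels and port are made up, and it only takes effect if the CNI plugin enforces NetworkPolicy) that only lets pods labeled role=frontend reach pods labeled app=backend on port 8080 could look like this:

cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  podSelector:
    matchLabels:
      app: backend
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: frontend
    ports:
    - port: 8080
EOF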

That’s all for now. In the previous parts we studied the foundations of Kubernetes networking and how overlays work. Now we know how the Service abstraction helps in a dynamic cluster and makes discovery super easy. We also covered how the outbound and inbound traffic flows work and how network policies are useful for security within a cluster.

Learn More

Learn Kubernetes from a DevOps guru (Kubernetes + Docker)

Learn DevOps: The Complete Kubernetes Course

Kubernetes for the Absolute Beginners - Hands-on

Complete DevOps Gitlab & Kubernetes: Best Practices Bootcamp

Learn DevOps: On-Prem or Cloud Agnostic Kubernetes

Master Jenkins CI For DevOps and Developers

Docker Technologies for DevOps and Developers

DevOps Toolkit: Learn Kubernetes with Practical Exercises!

What is Kubernetes | Kubernetes Tutorial For Beginners

This video on "What is Kubernetes | Kubernetes Tutorial For Beginners" will give you an introduction to one of the most popular DevOps tools in the market - Kubernetes, and its importance in today's IT processes. This tutorial is ideal for beginners who want to get started with Kubernetes & DevOps.

What Is Kubernetes | Kubernetes Introduction | Kubernetes Tutorial For Beginners

The following topics are covered in this training session:

  1. Need for Kubernetes
  2. What is Kubernetes and What it's not
  3. How does Kubernetes work?
  4. Use-Case: Kubernetes @ Pokemon Go
  5. Hands-on: Deployment with Kubernetes

Building and Managing Kubernetes with Kubernetes

Kubernetes as a declarative and portable system can be used to do many things in different ways.

At eBay we built a fleet management system based on k8s. Everything (server, subnet, OS, package and state) is declarative and can be modeled as CRDs in k8s, or referred to as a commit id in git from the objects. By running various controllers on top of these CRD objects, we use k8s to manage k8s, and the entire eBay data center.

  • Our system provisions hosts the same way k8s creates and manages pods.
  • We build k8s clusters with Salt. Each host has a set of states defined in its Salt CRD object. Controllers pull states from git based on commit ids to apply.
  • We build both schedulers and deployment transactions to manage the k8s clusters for both config deployments and upgrades.

This declarative, highly scalable, auto-healing, and cloud-native system is what we think can unify eBay's fleet.

Thanks for reading

If you liked this post, share it with all of your programming buddies!

Further reading about Kubernetes

An illustrated guide to Kubernetes Networking

Google Kubernetes Engine By Example

An Introduction to the Kubernetes DNS Service

Deploying a Laravel app in Kubernetes on Google Cloud

How to build a Microservice Architecture with Spring Boot and Kubernetes?

Kubernetes Fundamentals - Learn Kubernetes from the Beginning

Kubernetes Fundamentals - Learn Kubernetes from the Beginning

Kubernetes is about orchestrating containerized apps. Docker is great for your first few containers. As soon as you need to run on multiple machines and need to scale/up down and distribute the load and so on, you need an orchestrator - you need Kubernetes

This is the first part of a series of articles on Kubernetes, because this topic is BIG!

  • Part I - From the beginning, Part I, Basics, Deployment and Minikube: In this part, we cover why Kubernetes, some history and some basic concepts like deploying, Nodes, Pods.
  • Part II - Introducing Services and Labeling: In this part, we deepen our knowledge of Pods and Nodes. We also introduce Services and labeling using labels to query our artifacts.
  • Part III - Scaling: Here we cover how to scale our app
  • Part IV - Auto scaling: In this part we look at how to set up auto-scaling so we can handle sudden large increases of incoming requests
Resources

Kubernetes

So what do we know about Kubernetes?

It's an open-source system for automating deployment, scaling, and management of containerized applications

Let's start with the name. It's Greek for Helmsman, the person who steers the ship. Which is why the logo looks like this, a steering wheel on a boat:

It's also called K8s: K-ubernete-s, the 8 characters in the middle are replaced with an 8. Now you can impress your friends that you know why it's referred to as K8s.

Here is some more Jeopardy knowledge on its origin. Kubernetes was born out of systems called Borg and Omega. It was donated to the CNCF, the Cloud Native Computing Foundation, in 2015. It's written in Go/Golang.

If we see past all this trivia knowledge, it was built by Google as a response to their own experience handling a ton of containers. It's also Open Source and battle-tested to handle really large systems, like planet-scale large systems.

So the sales pitch is:

Run billions of containers a week, Kubernetes can scale without increasing your ops team

Sounds amazing right, billions of containers cause we are all Google size. No? :) Well even if you have something like 10-100 containers, it's for you.

In this part I hope to cover the following:

  • Why Kubernetes and Orchestration in General
  • Hello world: Minikube basics, talking through Minikube, simple deploy example
  • Cluster and basic commands, Nodes,
  • Deployments, what it is and deploying an app
  • Pods and Nodes, explain concepts and troubleshooting
Part I - From the beginning, Part I, Basics, Deployment and Minikube

Why Orchestration

Well, it all started with containers. Containers gave us the ability to create repeatable environments so dev, staging, and prod all looked and functioned the same way. We got predictability, and they were also lightweight as they drew resources from the host operating system. Such a great breakthrough for Developers and Ops, but the Container API is really only good for managing a few containers at a time. Larger systems might consist of 100s or 1000+ containers and need to be managed as well, so we can do things like scheduling, load balancing, distribution and more.

At this point, we need orchestration: the ability for a system to handle all these container instances. This is where Kubernetes comes in.

Getting started

Ok ok, let's say I buy into all of this, how do I get started?

Impatient, eh? Sure, let's start doing something practical with Minikube.

Ok, sounds good I'm a coder, I like practical stuff. What is Minikube?

Minikube is a tool that lets us run Kubernetes locally

Oh, sweet, millions of containers on my little machine?

Well, no, let's start with a few and learn Kubernetes basics while at it.

Installation

To install Minikube let's go to this installation page

It's just a few short steps, in which we install:

  • a Hypervisor
  • Kubectl (Kube control tool)
  • Minikube

Run

Get that thing up and running by typing:

minikube start

It should look something like this:

You can also ensure that kubectl has been correctly installed and is running:

kubectl version

Should give you something like this in response:

Ok, now we are ready to learn Kubernetes.

Learning kubectl and basic concepts

Let's learn Kubernetes by learning more about kubectl, a command-line program that lets us interact with our Cluster and deploy and manage applications on said Cluster.

The word Cluster just means a group of similar things, but in the context of Kubernetes, it means a Master and multiple worker machines called Nodes. Nodes were historically called Minions, but not so anymore.

The master decides what will run on the Nodes, which includes things like scheduled workloads or containerized apps. Which brings us to our next command:

kubectl get nodes

This should give us a result like this:

This tells us what Nodes we have available to do work.

Next up let's try to run our first app on Kubernetes with the run command like so:

kubectl run kubernetes-first-app --image=gcr.io/google-samples/kubernetes-bootcamp:v1 --port=8080

This should give us a response like so:

Next up let's check that everything is up and running with the command:

kubectl get deployments

This shows the following in the terminal:

In putting our app on the Cluster by invoking the run command, Kubernetes performed a few things behind the scenes, it:

  • searched for a suitable node where an instance of the application could be run, there was only one node so it got chosen
  • scheduled the application to run on that Node
  • configured the cluster to reschedule the instance on a new Node when needed

Next up we are going to introduce the concept Pod, so what is a Pod?

A Pod is the smallest deployable unit and consists of one or many containers, for example, Docker containers. That's all we are going to say about Pods at the moment but if you really really want to know more have a read here

The reason for mentioning Pods at this point is that our container and app are placed inside a Pod. Furthermore, Pods run in a private, isolated network that, although visible from other Pods and services, cannot be accessed from outside the network. Which means we can't reach our app with, say, a curl command.

We can change that though. There is more than one way to expose our application to the outside world; for now, however, we will use a proxy.

Now open up a 2nd terminal window and type:

kubectl proxy

This will expose the Kubernetes API via kubectl so that we can query it with HTTP requests. The result should look like:

Instead of typing kubectl version we can now type curl http://localhost:8001/version and get the same results:

The API Server inside of Kubernetes has created an endpoint for each pod by its pod name. So the next step is to find out the pod name:

kubectl get pods

This will list all the pods you have, it should just be one pod at this point and look something like this:

Then you can just save that down to a variable like so:
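
# one way to do it (assuming there's just the one pod in the default namespace):
export POD_NAME=$(kubectl get pods -o jsonpath='{.items[0].metadata.name}')
echo $POD_NAME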

Lastly, we can now do an HTTP call to learn more about our pod:

curl http://localhost:8001/api/v1/namespaces/default/pods/$POD_NAME

This will give us a long JSON response back (I trimmed it a bit but it goes on and on...)

Maybe that's not super interesting for us as app developers. We want to know how our app is doing. Best way to know that is looking at the logs. Let's do that with this command:

kubectl logs $POD_NAME

As you can see below, we now get logs from our app:

Now that we know the Pod's name we can do all sorts of things like checking its environment variables or even step inside the container and look at the content.

kubectl exec $POD_NAME env

This yields the following result:

Now lets step inside the container:

kubectl exec -ti $POD_NAME bash

We are inside! This means we can see what the source code looks like even:

cat server.js

Inside of our container, we can now reach the running app by typing:

curl http://localhost:8080

Summary Part I

This is where we will stop for now.
What did we actually learn?

  • Kubernetes, its origin what it is
  • Orchestration why you will soon need it
  • Concepts like Master, Nodes and Pods
  • Minikube, kubectl and how to deploy an image onto our Cluster
Part II - Introducing Services and Labeling

In this part we will cover the following:

  • Deepen our knowledge on Pods and Nodes
  • Introduce Services and labeling
  • Perform an exercise, involving setting labels on Pods and use labels to query our artifacts
Concepts revisited

When we create a Deployment on Kubernetes, that Deployment creates Pods with containers inside them. So Pods are tied to Nodes and will continue to exist until terminated or deleted. Let's try to educate ourselves a bit more on Pods, Nodes and let's also introduce a new topic Services.

Pods

Pods are the atomic unit on the Kubernetes platform, i.e smallest possible deployable unit

We've stated the above before but it's worth mentioning again.

What else is there to know?

A Pod is an abstraction that represents a group of one or more containers, for example, Docker or rkt, and some shared resources for those containers. Those resources include:

  • Shared storage, as Volumes
  • Networking, as a unique cluster IP address
  • Information about how to run each container, such as the container image version or specific ports to use

A Pod can have more than one container. If it does contain more than one container it is so the other containers can support the primary application.
Typical examples of helper applications are data pullers, data pushers, and proxies. You can read more on that use case here

The containers in a Pod share an IP address and port space and are:

  • Always co-located
  • Co-scheduled

Let me show you an image to make it easier to visualize:

As we can see above a Pod can have a lot of different artifacts in them that are able to communicate and support the app in some way.

Nodes

A Pod always runs on a Node

So a Node is the Pod's parent?

Yes.

A Node is a worker machine and may be either a virtual or a physical machine, depending on the cluster

Each Node is managed by the Master. A Node can have multiple pods.

So it's a one to many relationship

The Kubernetes master automatically handles scheduling the pods across the Nodes in the cluster

Every Kubernetes Node runs at least a:

  • Kubelet, which is responsible for the pod spec and talks to the CRI interface

  • Kube-proxy, a network proxy that maintains the Service networking rules on the node

  • A container runtime, (like Docker, rkt) responsible for pulling the container image from a registry, unpacking the container, and running the application.

Ok so a Node contains a Kubelet and container runtime and one to many Pods. I think I got it.

Let's show an image to make this info stick, cause it's quite important that we know what goes on, at least at a high level:

Services

Pods are mortal, they can die. Pods, in fact, have a lifecycle.

When a worker node dies, the Pods running on the Node are also lost.

What happens to our apps? :(

You might think they and their data are lost, but not so. The whole point with Kubernetes is to not let that happen. We normally deploy something like a ReplicaSet.

A ReplicaSet, what do you mean?

A ReplicaSet is a high-level artifact that can drive the cluster back to desired state via the creation of new Pods to keep your application running.

Ok so if a Pod goes down the ReplicaSet just creates a new Pod/s in its place?

Yes, exactly that. If you focus on defining a desired state the rest is up to Kubernetes.

Phew sounds really great then.

This concept of desired state is a very important one. You need to specify how many containers you want of each kind, at all times.

Oh so 4 database containers, 3 services etc?

Yes exactly.

So you don't have to care about the details, just tell Kubernetes what state you want and it does the rest. If something goes down, Kubernetes ensures it comes back up again to the desired state.

Each Pod in a Kubernetes cluster has a unique IP address, even Pods on the same Node, so there needs to be a way of automatically reconciling changes among Pods so that your applications continue to function.

Ok?

Yea, think like this. If a Pod containing your app goes down and another Pod is created in its place, running your app. Users should still be able to use your app after that.

Ok I got it. Makes me think...

The motivation for a Service

You should never refer to a Pod by its IP address; just think what happens when a Pod goes down and comes back up again but this time with a different IP. It is for that reason a Service exists.

A Service in Kubernetes is an abstraction which defines a logical set of Pods and a policy by which to access them.

Makes me think of routers and subnets

Yea I guess you can say there is a resemblance in there somewhere.

Services enable a loose coupling between dependent Pods and are defined using YAML or JSON file, just like all Kubernetes objects.

That's handy, just JSON and YAML :)

Services and Labels

The set of Pods targeted by a Service is usually determined by a LabelSelector.

Although each Pod has a unique IP address, those IPs are not exposed outside the cluster without a Service. We can expose them through a proxy though as we showed in part I.

Wait, go back a second here, you said LabelSelector. I wasn't quite following?

Remember how we couldn't refer to Pods by IP, cause Pods might go down and a new Pod could come back in its place?

Yes

Well, labels are the answer to how Services and Pods are able to communicate. This is what we mean by loose coupling. By applying labels like for example frontend, backend, release and so on to Pods, we are able to refer to Pods by their logical name rather than their specifics, i.e IP number.

Oh I get it, so it's a high-level domain language

Mm, kind of.

Services and Traffic

Services allow your applications to receive traffic.

Services can be exposed in different ways by specifying a type in ServiceSpec, service specification.

  • ClusterIP (default) - Exposes the Service on an internal IP in the cluster. This type makes the Service only reachable from within the cluster.
  • NodePort - Exposes the Service on the same port of each selected Node in the cluster using NAT. Makes a Service accessible from outside the cluster using <NodeIP>:<NodePort>. Superset of ClusterIP.
  • LoadBalancer - Creates an external load balancer in the current cloud (if supported) and assigns a fixed, external IP to the Service. Superset of NodePort.
  • ExternalName - Exposes the Service using an arbitrary name (specified by externalName in the spec) by returning a CNAME record with the name. No proxy is used. This type requires v1.7 or higher of kube-dns.

Ok I think I get it. Ensure I'm speaking externally to a Service instead of specific Pods. Depending on what I expose the Service as, that leads to different behavior?

Yea that's correct.

You said something about labels though, how do we create and apply them to Pods?

Yea lets talk about that next.

Labels

As we just mentioned, Services are the abstraction that allows pods to die and replicate in Kubernetes without impacting your application.

Now, Services match a set of Pods using labels and selectors, which allows us to operate on Pods as a group.

Labels are key/value pairs attached to objects and can be used in any number of ways:

  • Designate objects for development, test, and production
  • Embed version tags
  • Classify an object using tags

Labels can be attached to objects at creation time or later on. They can be modified at any time.

Lab - Fun with Labels and kubectl

It's a good idea to have read the first part of this series where we create a deployment. If you haven't you need to first create a deployment like so:

kubectl run kubernetes-first-app --image=gcr.io/google-samples/kubernetes-bootcamp:v1 --port=8080

Now we should be good to go.

Ok. I know you are probably all tired from all theory by now.

I bet you are just itching to learn more hands on Kubernetes with kubectl.

Well, the time for that has come :). We will do two things:

  1. Create a Service and learn how we can expose our app using said Service
  2. Learn about Labeling and how we can improve our querying game by having appropriate labels on our artifacts.

Let's create a new service.

We will get acquainted with the expose command.

Let's check for existing pods,

kubectl get pods

Next let's see what services we have:

kubectl get services

Next lets create a Service like so:

kubectl expose deployment/kubernetes-first-app --type="NodePort" --port 8080

As you can see above we are just targeting one of our deployments, kubernetes-first-app, and referring to it with [type]/[deployment name], the type here being deployment.

We expose it as a service of type NodePort and finally, we choose to expose it at port 8080.

Now run kubectl get services again and see the results:

As you can see we now have two services in use, our basic kubernetes service and our newly created kubernetes-first-app.

Next up we need to grab the port of our service and assign that to a variable:

export NODE_PORT=$(kubectl get services/kubernetes-first-app -o go-template='{{(index .spec.ports 0).nodePort}}')
echo NODE_PORT=$NODE_PORT

We now have our port stored in the environment variable NODE_PORT and we are ready to start communicating with our service like so:

curl $(minikube ip):$NODE_PORT

Which leads to the following output:

Creating and applying Labels

When we created our deployment and our Pod, it was automatically assigned with a label.

By typing

kubectl describe deployment

we can see the name of said label.

Next up we can query the pods by that same label

kubectl get pods -l run=kubernetes-first-app

Above we are using -l to query for a specific label, run=kubernetes-first-app. This gives us the following result:

You can do a similar query to your services:

kubectl get services -l run=kubernetes-first-app

That just shows that you can query on different levels, for specific Pods or Services that have Pods with that label.

Next up we will look at how to change the label

First let's get the name of the pod, like so:

POD_NAME=kubernetes-first-app-669789f4f8-6glpx

Above I'm just assigning what my Pod is called to a variable POD_NAME. Check with kubectl get pods what your Pod is called.

Then we can add/apply the new label like so:

kubectl label pod $POD_NAME app=v1

Verify that the new label has been set, like so:

kubectl describe pod

or

kubectl describe pods $POD_NAME

As you can see from the result our new label app=v1 has been appended to existing labels.

Now we can query like so:

kubectl get pods -l app=v1

That's pretty much how labeling works, how to get available labels, apply them and use them in a query. Ensure to give them a descriptive name like an app version, a certain environment or a name like frontend or backend, something that makes sense to your situation.

Clean up

Ok, so we created a service. We should learn how to clean up after ourselves. Run the following command to remove our service:

kubectl delete service -l run=kubernetes-first-app

Verify the service is no longer there with:

kubectl get services

also, ensure our exposed IP and port can no longer be reached:

curl $(minikube ip):$NODE_PORT

Just because the service is gone doesn't mean the app is gone. The app should still be reachable on:

kubectl exec -ti $POD_NAME curl localhost:8080

Summary Part II

So what did we learn? We learned a bit more on Pods and Nodes. Furthermore, we learned that we shouldn't speak directly to Pods but rather use a high-level abstraction such as Services. Services use labels as a way to define a domain language and apply those to different Pods.

Ok, so we understand a bit more on Kubernetes and how different concepts relate. We mentioned something called desired state a number of times but we didn't go into detail on how to set such a state. That's our next part in this series where we will cover how to set the desired state and how Kubernetes maintains it, so stay tuned.

Part III - Scaling

This third part aims to show how you scale your application. We can easily set the number of Replicas we want of a certain application and let Kubernetes figure out how to do that. This is us defining a so-called desired state.

When traffic increases, we will need to scale the application to keep up with user demand. We've talked about deployments and services, now lets talk scaling.

What does scaling mean in the context of Kubernetes?

We get more Pods. More Pods that are scheduled to nodes.

Now it's time to talk about desired state again, that we mentioned in previous parts.

This is where we relinquish control to Kubernetes. All we need to do is tell Kubernetes how many Pods we want and Kubernetes does the rest.

So we tell Kubernetes about the number of Pods we want, what does that mean? What does Kubernetes do for us?

It means we get multiple instances of our application. It also means traffic is being distributed to all of our Pods, ie. load balancing.

Furthermore, Kubernetes, or more specifically, services within Kubernetes will monitor which Pods are available and send traffic to those Pods.

Scaling demo Lab

If you haven't followed the first two parts I do recommend you go back and have a read. What you need for the following to work is at least a deployment. So if you haven't created one, here is how:

kubectl run kubernetes-first-app --image=gcr.io/google-samples/kubernetes-bootcamp:v1 --port=8080

Let's have a look at our deployments:

kubectl get deployments

Let's look closer at the response we get:

We have three pieces of information that are important to us. First, we have the READY column in which we should read the value in the following way, CURRENT STATE/DESIRED STATE. Next up is the UP-TO-DATE column which shows the number of replicas that were updated to match the desired state.
Lastly, we have the AVAILABLE column that shows how many replicas we have available to do work.

Let's scale

Now, let's do some scaling. For that we will use the scale command like so:

kubectl scale deployments/kubernetes-first-app --replicas=4 

As we can see above, the number of replicas was increased to 4 and Kubernetes is thereby ready to load balance any incoming requests.

Let's have a look at our Pods next:

When we asked for 4 replicas we got 4 Pods.

We can see that this scaling operation took place by using the describe command, like so:

kubectl describe deployments/kubernetes-first-app

In the above image, we are given quite a lot of information on our Replicas for example, but there is some other information in there that we will explain later on.

Does it load balance?

The whole point with the scaling was so that we could balance the load on incoming requests. That means that not the same Pod would handle all the requests but that different Pods would be hit.
We can easily try this out, now that we have scaled our app to contain 4 replicas of itself.

So far we have used the describe command to describe the deployment, but we can also use it to describe our service and find its IP and port. Once we have the IP and port we can then hit it with different HTTP requests.

kubectl describe services/kubernetes-first-app

Especially look at the NodePort and the Endpoints. NodePort is the port value that we want to hit with an HTTP request.

Now we will actually invoke the cURL command and ensure that it hits a different Pod each time and thereby prove our load balancing is working. Let's do the following:

NODE_PORT=30450

Next up the cURL call:

curl $(minikube ip):$NODE_PORT

As you can see above we are doing the call 4 times. Judging by the output and the name of the instance we see that we are hitting a different Pod for each request. Thereby we see that the load balancing is working.

Scaling down

So far we have scaled up. We managed to go from one Pod to 4 Pods thanks to the scale command. We can use the same command to scale down, like so:

kubectl scale deployments/kubernetes-first-app --replicas=2 

Now if we are really fast adding the next command we can see how the Pods are being removed as Kubernetes is trying to adjust to desired state.

2 out of 4 Pods are saying Terminating as only 2 Pods are needed to maintain the new desired state.

Running our command again we see that only 2 Pods remain and thereby our new desired state has been reached:

We can also look at our deployment to see that our scale instruction has been parsed correctly:

Self-healing

Self-healing is Kubernetes' way of ensuring that the desired state is maintained. Pods don't self-heal, because Pods can die. What happens is that a new Pod appears in its place, thanks to Kubernetes.

So how do we test this?

Glad you asked, we can delete a Pod and see what happens. So how do we do that? We use the delete command. We need to know the name of our Pod though so we need to call get pods for that. So let's start with that:

kubectl get pods

Then lets pick one of our two Pods kubernetes-first-app-669789f4f8-6glpx and assign it to a variable:

POD_NAME=kubernetes-first-app-669789f4f8-6glpx

Now remove it:

kubectl delete pods $POD_NAME

Let's be quick about it and check our Pod status with get pods. It should say Terminating like so:

Wait some time and then echo out our variable $POD_NAME followed by get pods. That should give you a result similar to the below.

So what does the above image tell us? It tells us that the Pod we deleted is truly deleted, but it also tells us that the desired state of two replicas has been achieved by spinning up a new Pod. What we are seeing is self-healing at work.

Different ways to scale

Ok, we looked at a way to scale by explicitly saying how many replicas we want of a certain deployment. Sometimes, however, we might want a different way to scale, namely auto-scaling. Auto-scaling is about you not having to set the exact number of replicas you want but rather rely on Kubernetes to create the number of replicas it thinks it needs. So how would Kubernetes know that? Well, it can look at more than one thing but a common metric is CPU utilization. So let's say you have a booking site and suddenly someone releases Bruce Springsteen tickets; you are likely to want to rely on auto-scaling, because the next day, when the tickets are all sold out, you want the number of Pods to go back to normal, and you wouldn't want to do this manually.

Auto-scaling is a topic I plan to cover more in detail in a future article so if you are really curious how that is done I recommend you have a look here

Summary Part III

Ok. So we did it. We managed to scale an app by creating replicas of it. It wasn't so hard to accomplish. We showed how we only needed to provide Kubernetes with a desired state and it would do its utmost to preserve said state, also called self-healing. Furthermore, we mentioned that there was another way to scale, namely auto-scaling, but decided to leave that topic for another article. Hopefully, you are now more in awe of how amazing Kubernetes is and how easy it is to scale your app.

Part IV - Auto scaling

In this article, we will cover the following:

  • Why auto scaling, we will discuss different scenarios in which it makes sense to rely on auto scaling over defining it statically like we do with desired state
  • How, let's talk about Horizontal Pod Autoscaling, the concept/feature that allows us to scale in an elastic way.
  • Lab - lets scale, we will look at how to actually set this up in kubectl and simulate a ton of incoming requests. We will then inspect the results and see that Kubernetes acts the way we think
Why

So in our last part, we talked about desired state. That's an OK strategy until something unforeseen happens and suddenly you got a great influx of traffic. This is likely to happen to businesses such as e-commerce around a big sale or a ticket vendor when you release tickets to a popular event.

Events like these are an anomaly which forces you to quickly scale up. The other side of the coin though is that at some point you need to scale down or you suddenly have overcapacity you might need to pay for. What you really want is for the scaling to act in an elastic way so it scales up when you need it to and scales down when there is less traffic.

How

Horizontal auto-scaling, what does it mean?

It's a concept in Kubernetes that can scale the number of Pods we need. It can do so on a replication controller, deployment or replica set. It usually looks at CPU utilization but can be made to look at other things by using something called custom metrics support, so it's customizable.

It consists of two parts: a resource and a controller. The controller checks utilization, or whatever metric you decided, to ensure that the number of replicas matches your specification. If need be it spins up more Pods or removes them. The default is checking every 15 seconds but you can change that by looking at a flag called --horizontal-pod-autoscaler-sync-period.

The underlying algorithm that decides the number of replicas looks like this:

desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
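
For example, if one replica is running at 339% of its CPU request against a 50% target, the controller wants ceil(1 * 339 / 50) = ceil(6.78) = 7 replicas, which is exactly the jump from 1 to 7 replicas we will see later in the lab.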

Lab - lets scale

Ok, the first thing we need to do is to scale our deployment to use something other than desired state.

We have two things we need to specify when we do autoscaling:

  • min/max, we define a minimum and maximum in terms of how many Pods we want
  • CPU, in this version we set a certain CPU utilization percentage. When it goes above that it scales out as needed. Think of this one as an IF clause, if CPU value greater than the threshold, try to scale

Set up

Before we can attempt our scaling experiment we need to make sure we have the correct add-ons enabled. You can easily see what add-ons you have enabled by typing:

minikube addons list

If it looks like the above we are all good. Why am I saying that? Well, what we need, to be able to auto-scale, is that heapster and metrics-server add ons are enabled.

Heapster enables Container Cluster Monitoring and Performance Analysis.

Metrics Server provides metrics via the Resource Metrics API. The Horizontal Pod Autoscaler uses this API to collect metrics.

We can easily enable them both with the following commands (we will need to for auto-scaling to show correct data):

minikube addons enable heapster

and

minikube addons enable metrics-server

We need to do one more thing, namely to enable Custom metrics, which we do by starting minikube with such a flag like so:

minikube start --extra-config kubelet.EnableCustomMetrics=true

Ok, now we are good to go.

Running the experiment

We need to do the following to run our experiment

  • Create a deployment
  • Apply autoscaling
  • Bombard the deployment with incoming requests
  • Watch the auto scaling how it changes

Create a deployment

kubectl run php-apache --image=k8s.gcr.io/hpa-example --requests=cpu=200m --expose --port=80

Above we are creating a deployment php-apache and exposing it as a service on port 80. We can see that we are using the image k8s.gcr.io/hpa-example

It should tell us the following:

service/php-apache created
deployment.apps/php-apache created

Autoscaling

Next up we will use the command autoscale. We will use it like so:

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10

It should say something like:

horizontalpodautoscaler.autoscaling/php-apache autoscaled

Above we are applying the auto-scaling on the deployment php-apache and as you can see we are applying both min-max and cpu based auto scaling which means we give a rule for how the auto scaling should happen:

If CPU load is >= 50% create a new Pod, but only maximum 10 Pods. If the load is low go back gradually to one Pod

Bombard with requests

Next step is to send a ton of requests against our deployment and see our auto-scaling doing its work. So how do we do that?

First off let's check the current status of our horizontal pod auto-scaler or hpa for short by typing:

kubectl get hpa

This should give us something like this:

The above shows us two pieces of information. The first is the TARGETS column which shows our CPU utilization, actual usage/trigger value. The next bit of interest is the column REPLICAS that shows us the number of copies, which is 1 at the moment.

For our next trick, open up a separate terminal tab. What we need to do is to set things up so we can send a ton of requests.

Next up we create a container using this command:

kubectl run -i --tty load-generator --image=busybox /bin/sh

This should take us to a prompt within the container. This is followed by:

while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done

The command above should result in something looking like this.

This will just go on and on until you hit CTRL+C, but leave it be for now.

This throws a ton of requests in a while true loop.

I thought while true loops were bad?

They are but we are only going to run it for a minute so that the auto scaling can happen. Yes, the CPU will sound a lot but don't worry :)

Let this go on for a minute or so, then enter the following command into the first terminal tab (not the one running the requests), like so:

kubectl get hpa

It should now show something like this:

As you can see from the above, the column TARGETS looks different and now says 339%/50%, which shows the current load on the CPU, and REPLICAS is 7, which means it has gone from 1 to 7 replicas. So as you can see we have been bombarding it pretty hard.

Now go to the second terminal and hit CTRL+C or you will have a situation like this:

It will actually take a few minutes for Kubernetes to cool off and get the values back to normal. A first look at the Pods situation shows us the following:

kubectl get pods

As you can see we have 7 Pods up and running still but let's wait a minute or two and it should look like this:

Ok, now we are back to normal.

Summary Part IV

Now we did some great stuff in this article. We managed to set up auto-scaling, bombard it with requests and without frying our CPU, hopefully ;)

We also managed to learn some new Kubernetes commands while at it and got to see auto-scaling at work giving us new Pods based on our specification.