Everything You Need to Know About Kubernetes Networking

Everything I learned about Kubernetes Networking

An illustrated guide to Kubernetes Networking [Part 1]

You’ve been running a bunch of services on a Kubernetes cluster and reaping the benefits. Or at least, you’re planning to. Even though there are a bunch of tools available to set up and manage a cluster, you’ve still wondered how it all works under the hood. And where do you look if it breaks? I know I did.

Sure, Kubernetes is simple enough to start using. But let’s face it: it’s a complex beast under the hood. There are a lot of moving parts, and knowing how they all fit and work together is a must if you want to be ready for failures. One of the most complex, and probably the most critical, parts is the networking.

So I set out to understand exactly how the Networking in Kubernetes works. I read the docs, watched some talks, even browsed the codebase. And here is what I found out.

Kubernetes Networking Model

At its core, Kubernetes Networking has one fundamental design philosophy:

Every Pod has a unique IP.
This Pod IP is shared by all the containers in the Pod, and it’s routable from all the other Pods. Ever notice some “pause” containers running on your Kubernetes nodes? They are called “sandbox containers”, and their only job is to reserve and hold a network namespace (netns) that is shared by all the containers in a pod. This way, a pod IP doesn’t change even if a container dies and a new one is created in its place. A huge benefit of this IP-per-pod model is that there are no IP or port collisions with the underlying host, so we don’t have to worry about which ports the applications use.

With this in place, the only requirement Kubernetes has is that these Pod IPs are routable/accessible from all the other pods, regardless of what node they’re on.

Intra-node communication

The first step is to make sure pods on the same node are able to talk to each other. The idea is then extended to communication across nodes, to the internet and so on.

On every Kubernetes node, which is a Linux machine in this case, there’s a root network namespace (root as in base, not the superuser): the root netns.

The main network interface eth0 is in this root netns.

Similarly, each pod has its own netns, with a virtual ethernet pair connecting it to the root netns. This is basically a pipe-pair with one end in the root netns and the other in the pod netns.

We name the pod-end eth0, so the pod doesn’t know about the underlying host and thinks that it has its own root network setup. The other end is named something like vethxxx.

You may list all these interfaces on your node using the ifconfig or ip a commands.

This is done for all the pods on the node. For these pods to talk to each other, a Linux Ethernet bridge, cbr0, is used. Docker uses a similar bridge named docker0.

You may list the bridges using the brctl show command.

Assume a packet is going from pod1 to pod2:

1. It leaves pod1’s netns at eth0 and enters the root netns at vethxxx.

2. It’s passed on to cbr0, which discovers the destination using an ARP request, saying “who has this IP?”

3. vethyyy says it has that IP, so the bridge knows where to forward the packet.

4. The packet reaches vethyyy, crosses the pipe-pair and reaches pod2’s netns.

This is how containers on a node talk to each other. Obviously there are other ways, but this is probably the easiest, and it’s what Docker uses as well.
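As a rough mental model of steps 1 to 4, the bridge can be pictured as a lookup from pod IP to the veth that answers the ARP request. The sketch below is a toy simulation, not how the kernel bridge is implemented; the ToyBridge class and the IP addresses are made up for illustration.

```python
class ToyBridge:
    """Toy stand-in for cbr0: remembers which veth answers for which pod IP."""

    def __init__(self):
        self.ports = {}  # pod IP -> name of the veth end attached to the bridge

    def attach(self, veth, pod_ip):
        self.ports[pod_ip] = veth

    def arp_who_has(self, ip):
        # Returns the veth that "says it has that IP", or None if nobody answers.
        return self.ports.get(ip)

    def forward(self, src_ip, dst_ip):
        veth = self.arp_who_has(dst_ip)
        if veth is None:
            return f"{src_ip} -> {dst_ip}: no local pod has this IP"
        return f"{src_ip} -> {dst_ip}: delivered via {veth}"


cbr0 = ToyBridge()
cbr0.attach("vethxxx", "10.0.1.2")  # pod1 (hypothetical address)
cbr0.attach("vethyyy", "10.0.1.3")  # pod2 (hypothetical address)
print(cbr0.forward("10.0.1.2", "10.0.1.3"))
```

When the lookup returns nothing, the packet has to leave the node, which is the inter-node case.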

Inter-node communication

As I mentioned earlier, pods need to be reachable across nodes as well. Kubernetes doesn’t care how it’s done. We can use L2 (ARP across nodes), L3 (IP routing across nodes — like the cloud provider route tables), overlay networks, or even carrier pigeons. It doesn’t matter as long as the traffic can reach the desired pod on another node. Every node is assigned a unique CIDR block (a range of IP addresses) for pod IPs, so each pod has a unique IP that doesn’t conflict with pods on another node.
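To make the per-node CIDR idea concrete, here is a small sketch using Python’s standard ipaddress module. The cluster range 10.244.0.0/16, the /24 block size and the node names are assumptions picked for illustration; real clusters choose their own.

```python
import ipaddress

# Hypothetical cluster-wide pod range; real clusters choose their own.
cluster_cidr = ipaddress.ip_network("10.244.0.0/16")

# Carve the range into /24 blocks and hand one to each node.
node_names = ["node1", "node2", "node3"]
node_cidrs = dict(zip(node_names, cluster_cidr.subnets(new_prefix=24)))

for name, cidr in node_cidrs.items():
    print(name, cidr, "addresses:", cidr.num_addresses)

# Blocks never overlap, so pod IPs on different nodes cannot collide.
blocks = list(node_cidrs.values())
no_overlap = not any(
    a.overlaps(b) for i, a in enumerate(blocks) for b in blocks[i + 1:]
)
print("no overlap:", no_overlap)
```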

In most cases, especially in cloud environments, the cloud provider route tables make sure the packets reach the correct destination. The same thing could be accomplished by setting up correct routes on every node. There are a bunch of other network plugins that do their own thing.

Here we have two nodes, similar to what we saw earlier. Each node has various network namespaces, network interfaces and a bridge.

Assume a packet is going from pod1 to pod4 (on a different node).

  1. It leaves pod1’s netns at eth0 and enters the root netns at vethxxx.
  2. It’s passed on to cbr0, which makes the ARP request to find the destination.
  3. It comes out of cbr0 to the main network interface eth0 since nobody on this node has the IP address for pod4.
  4. It leaves the machine node1 onto the wire with src=pod1 and dst=pod4.
  5. The route table has routes set up for each of the node CIDR blocks, and it routes the packet to the node whose CIDR block contains the pod4 IP.
  6. So the packet arrives at node2 at the main network interface eth0.
  7. Now even though pod4 isn’t the IP of eth0, the packet is still forwarded to cbr0 since the nodes are configured with IP forwarding enabled.
  8. The node’s routing table is looked up for any routes matching the pod4 IP. It finds cbr0 as the destination for this node’s CIDR block.
     You may list the node route table using the route -n command, which will show a route with cbr0 as the interface for this node’s CIDR block.
  9. The bridge takes the packet, makes an ARP request and finds out that the IP belongs to vethyyy.
  10. The packet crosses the pipe-pair and reaches pod4 🏠
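The route lookup on node2 in the steps above boils down to picking the most specific route that contains the destination IP. Here is a minimal sketch of that idea; the route entries and the pod4 address are hypothetical, and the real lookup happens in the kernel, not in Python.

```python
import ipaddress

# Hypothetical routes on node2: (destination CIDR, where matching packets go).
node2_routes = [
    ("0.0.0.0/0", "eth0"),      # default route
    ("10.244.1.0/24", "cbr0"),  # this node's own pod CIDR block
]


def lookup(routes, dst_ip):
    """Pick the matching route with the longest prefix, as IP routing does."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if dst in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(lookup(node2_routes, "10.244.1.7"))  # pod4 (hypothetical IP)
print(lookup(node2_routes, "8.8.8.8"))     # anything else leaves via eth0
```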

An illustrated guide to Kubernetes Networking [Part 2]

We’ll expand on these ideas and see how the overlay networks work. We will also understand how the ever-changing pods are abstracted away from apps running in Kubernetes and handled behind the scenes.

Overlay networks

Overlay networks are not required by default; however, they help in specific situations, like when we don’t have enough IP space or the network can’t handle the extra routes. Or maybe when we want some extra management features the overlays provide. One commonly seen case is a limit on how many routes the cloud provider route tables can handle. For example, AWS route tables support up to 50 routes without impacting network performance. So if we have more than 50 Kubernetes nodes, an AWS route table won’t be enough. In such cases, using an overlay network helps.

An overlay essentially encapsulates a packet inside another packet, which then traverses the native network across nodes. You may not want to use an overlay network since it adds some latency and complexity overhead due to the encapsulation and decapsulation of all the packets. It’s often not needed, so we should use it only when we know why we need it.

To understand how traffic flows in an overlay network, let’s consider an example of flannel, which is an open-source project by CoreOS.

Here we see that it’s the same setup as before, but with a new virtual ethernet device called flannel0 added to the root netns. It’s an implementation of Virtual Extensible LAN (VXLAN), but to Linux, it’s just another network interface.

The flow for a packet going from pod1 to pod4 (on a different node) is something like this:

  1. The packet leaves pod1’s netns at eth0 and enters the root netns at vethxxx.

  2. It’s passed on to cbr0, which makes the ARP request to find the destination.

3a. Since nobody on this node has the IP address for pod4, the bridge sends it to flannel0, because the node’s route table is configured with flannel0 as the target for the pod network range.

3b. As the flanneld daemon talks to the Kubernetes apiserver or the underlying etcd, it knows about all the pod IPs and what nodes they’re on. So flannel creates the mappings (in userspace) from pod IPs to node IPs.

flannel0 takes this packet and wraps it in a UDP packet with extra headers, changing the source and destination IPs to those of the respective nodes, and sends it to a special vxlan port (generally 8472).

Even though the mapping is in userspace, the actual encapsulation and data flow happens in kernel space. So it happens pretty fast.

3c. The encapsulated packet is sent out via eth0 since it is involved in routing the node traffic.

  4. The packet leaves the node with node IPs as source and destination.

  5. The cloud provider route table already knows how to route traffic between nodes, so it sends the packet to the destination, node2.

6a. The packet arrives at eth0 of node2. Because the destination port is the special vxlan port, the kernel sends the packet to flannel0.

6b. flannel0 decapsulates the packet and emits it back into the root network namespace.

6c. Since IP forwarding is enabled, the kernel forwards it to cbr0 as per the route tables.

  7. The bridge takes the packet, makes an ARP request and finds out that the IP belongs to vethyyy.

  8. The packet crosses the pipe-pair and reaches pod4 🏠
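The packet-in-packet idea from the flow above can be sketched in a few lines. This is only a toy model: the Packet class, the pod-to-node mapping and all addresses are assumptions for illustration, and real VXLAN framing carries more headers than shown here.

```python
from dataclasses import dataclass

VXLAN_PORT = 8472  # the port mentioned in the flow above


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    payload: object
    dst_port: int = 0


# Hypothetical pod IP -> node IP mapping, the kind of table flanneld maintains.
POD_TO_NODE = {"10.244.0.2": "192.168.0.11", "10.244.1.7": "192.168.0.12"}


def encapsulate(inner):
    """Wrap the pod-to-pod packet in an outer node-to-node packet."""
    return Packet(
        src=POD_TO_NODE[inner.src],
        dst=POD_TO_NODE[inner.dst],
        payload=inner,
        dst_port=VXLAN_PORT,
    )


def decapsulate(outer):
    """Unwrap on the destination node and recover the original packet."""
    if outer.dst_port != VXLAN_PORT:
        raise ValueError("not an overlay packet")
    return outer.payload


inner = Packet(src="10.244.0.2", dst="10.244.1.7", payload=b"hello")
outer = encapsulate(inner)
print(outer.src, "->", outer.dst)                 # node IPs on the wire
print(decapsulate(outer).src, "->", decapsulate(outer).dst)  # pod IPs restored
```

The native network only ever sees the outer node IPs, which is why it needs no routes for the pod ranges.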

There could be slight differences among different implementations, but this is how overlay networks in Kubernetes work. There’s a common misconception that we have to use overlays when using Kubernetes. The truth is, it completely depends on the specific scenarios. So make sure you use it only when it’s absolutely needed.

An illustrated guide to Kubernetes Networking [Part 3]

Cluster dynamics

Due to the ever-changing, dynamic nature of Kubernetes, and distributed systems in general, the pods (and consequently their IPs) change all the time. Reasons could range from desired rolling updates and scaling events to unpredictable pod or node crashes. This makes Pod IPs unreliable to use directly for communication.

Enter Kubernetes Services — a virtual IP with a group of Pod IPs as endpoints (identified via label selectors). These act as a virtual load balancer, whose IP stays the same while the backend Pod IPs may keep changing.
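Here is a rough sketch of how a label selector picks a Service’s endpoints. The pod list, labels and IPs are invented for the example, and the matching covers only simple key/value equality.

```python
# Hypothetical pods with labels and their current IPs.
pods = [
    {"name": "web-1", "ip": "10.244.0.5", "labels": {"app": "web", "tier": "frontend"}},
    {"name": "web-2", "ip": "10.244.1.8", "labels": {"app": "web", "tier": "frontend"}},
    {"name": "db-1", "ip": "10.244.1.9", "labels": {"app": "db"}},
]


def endpoints_for(selector, pods):
    """Return the IPs of pods whose labels contain every selector key/value."""
    return [
        pod["ip"]
        for pod in pods
        if all(pod["labels"].get(k) == v for k, v in selector.items())
    ]


web_endpoints = endpoints_for({"app": "web"}, pods)
print(web_endpoints)
```

If web-2 is replaced by a pod with a new IP, only this endpoint list changes; the virtual IP that clients talk to stays the same.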

The whole virtual IP implementation is actually a set of iptables rules (recent versions have an option of using IPVS, but that’s another discussion) that are managed by a Kubernetes component: kube-proxy. This name is actually misleading now. It used to work as a proxy in the pre-v1.0 days, which turned out to be pretty resource intensive and slow due to constant copying between kernel space and user space. Now it’s just a controller, like many other controllers in Kubernetes, that watches the API server for endpoint changes and updates the iptables rules accordingly.

Due to these iptables rules, whenever a packet is destined for a service IP, it’s DNATed (DNAT=Destination Network Address Translation), meaning the destination IP is changed from service IP to one of the endpoints — pod IP — chosen at random by iptables. This makes sure the load is evenly distributed among the backend pods.
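Why does a random pick spread the load evenly? A short calculation helps. The sketch assumes the rules are tried in order, one per endpoint, and that rule i out of n matches with probability 1/(n-i), with the last rule always matching. That is how the iptables mode is commonly described, but treat the exact rule layout as an assumption here.

```python
from fractions import Fraction


def rule_probabilities(n):
    """Per-rule match probabilities for a chain of n endpoint rules.

    Rule i is only reached if rules 0..i-1 did not match, so it uses
    probability 1/(n-i); the last rule always matches.
    """
    return [Fraction(1, n - i) for i in range(n)]


def overall_probabilities(n):
    """Chance that each endpoint ends up being picked."""
    result = []
    not_matched_yet = Fraction(1)
    for p in rule_probabilities(n):
        result.append(not_matched_yet * p)
        not_matched_yet *= 1 - p
    return result


print(rule_probabilities(3))
print(overall_probabilities(3))
```

Even though the per-rule probabilities differ, each endpoint ends up with the same overall share.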

When this DNAT happens, this info is stored in conntrack — the Linux connection tracking table (stores 5-tuple translations iptables has done: protocol, srcIP, srcPort, dstIP, dstPort). This is so that when a reply comes back, it can un-DNAT, meaning change the source IP from the Pod IP to the Service IP. This way, the client is unaware of how the packet flow is handled behind the scenes.
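The DNAT and un-DNAT round trip can be pictured with a dictionary standing in for conntrack. This is a toy model only; the Service IP, the endpoints and the function names are hypothetical.

```python
import random

SERVICE = ("10.96.0.10", 80)  # hypothetical Service IP and port
ENDPOINTS = [("10.244.0.5", 8080), ("10.244.1.8", 8080)]  # hypothetical pods

conntrack = {}  # (proto, src_ip, src_port, dst_ip, dst_port) -> chosen endpoint


def dnat(proto, src_ip, src_port, dst_ip, dst_port):
    """Rewrite a Service destination to a pod endpoint and remember the choice."""
    if (dst_ip, dst_port) != SERVICE:
        return dst_ip, dst_port
    key = (proto, src_ip, src_port, dst_ip, dst_port)
    if key not in conntrack:
        conntrack[key] = random.choice(ENDPOINTS)
    return conntrack[key]


def un_dnat(proto, client_ip, client_port, reply_src_ip, reply_src_port):
    """Rewrite the reply's source from the pod endpoint back to the Service."""
    for (p, s_ip, s_port, d_ip, d_port), endpoint in conntrack.items():
        same_client = (p, s_ip, s_port) == (proto, client_ip, client_port)
        if same_client and endpoint == (reply_src_ip, reply_src_port):
            return d_ip, d_port
    return reply_src_ip, reply_src_port


pod = dnat("tcp", "10.244.0.2", 40000, *SERVICE)
print("request goes to", pod)
print("reply appears to come from", un_dnat("tcp", "10.244.0.2", 40000, *pod))
```

Because the choice is stored per connection, later packets of the same connection keep going to the same pod.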

So by using Kubernetes services, we can use the same ports without any conflicts (since we can remap ports to endpoints). This makes service discovery super easy. We can just use the internal DNS and hard-code the service hostnames. We can even use the service host and port environment variables preset by Kubernetes.

Protip: Take this second approach and save a lot of unnecessary DNS calls!
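A minimal sketch of the environment variable approach: Kubernetes exposes a Service as NAME_SERVICE_HOST and NAME_SERVICE_PORT, with the Service name upper-cased and dashes turned into underscores. The helper name service_address and the "redis-master" Service are made up for the example. One caveat worth knowing is that these variables are only set for Services that already existed when the pod started.

```python
import os


def service_address(service_name, environ=os.environ):
    """Build host:port from the NAME_SERVICE_HOST / NAME_SERVICE_PORT variables."""
    prefix = service_name.upper().replace("-", "_")
    host = environ.get(f"{prefix}_SERVICE_HOST")
    port = environ.get(f"{prefix}_SERVICE_PORT")
    if host is None or port is None:
        return None  # not set, e.g. the Service was created after the pod
    return f"{host}:{port}"


# Simulated environment for a hypothetical Service named "redis-master".
fake_env = {
    "REDIS_MASTER_SERVICE_HOST": "10.96.0.11",
    "REDIS_MASTER_SERVICE_PORT": "6379",
}
print(service_address("redis-master", fake_env))
```

Inside a real pod you would call service_address("redis-master") and let it read os.environ.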

Outbound traffic

The Kubernetes services we’ve talked about so far work within a cluster. However, in most practical cases, applications need to access some external API or website.

Generally, nodes can have both private and public IPs. For internet access, there is some sort of 1:1 NAT of these public and private IPs, especially in cloud environments.

For normal communication from a node to some external IP, the source IP is changed from the node’s private IP to its public IP for outbound packets, and reversed for inbound reply packets. However, when a connection to an external IP is initiated by a Pod, the source IP is the Pod IP, which the cloud provider’s NAT mechanism doesn’t know about. It will just drop packets with source IPs other than the node IPs.

So we use, you guessed it, some more iptables! These rules, also added by kube-proxy, do the SNAT (Source Network Address Translation), aka IP MASQUERADE. This tells the kernel to use the IP of the interface this packet is going out from, in place of the source Pod IP. A conntrack entry is also kept to un-SNAT the reply.
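Here is a toy version of that masquerading step. It assumes only pod traffic leaving the pod range is rewritten, and it uses a made-up node IP, pod range and port counter; the real work is done by the kernel’s NAT and conntrack machinery.

```python
import ipaddress

CLUSTER_CIDR = ipaddress.ip_network("10.244.0.0/16")  # hypothetical pod range
NODE_IP = "192.168.0.11"                              # hypothetical node IP

snat_table = {}  # (node_ip, node_port) -> (pod_ip, pod_port), used to un-SNAT
next_port = 50000


def snat_outbound(src_ip, src_port, dst_ip):
    """Masquerade pod traffic leaving the cluster; leave other traffic alone."""
    global next_port
    src_is_pod = ipaddress.ip_address(src_ip) in CLUSTER_CIDR
    dst_in_cluster = ipaddress.ip_address(dst_ip) in CLUSTER_CIDR
    if not src_is_pod or dst_in_cluster:
        return src_ip, src_port
    node_port = next_port
    next_port += 1
    snat_table[(NODE_IP, node_port)] = (src_ip, src_port)
    return NODE_IP, node_port


def un_snat(dst_ip, dst_port):
    """Map a reply addressed to the node back to the original pod."""
    return snat_table.get((dst_ip, dst_port), (dst_ip, dst_port))


masq = snat_outbound("10.244.0.5", 41000, "93.184.216.34")
print(masq, "->", un_snat(*masq))
```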

Inbound traffic

Everything’s good so far. Pods can talk to each other, and to the internet. But we’re still missing a key piece — serving the user request traffic. As of now, there are two main ways to do this:

NodePort/Cloud Loadbalancer (L4: IP and Port)

Setting the service type to NodePort assigns the service a nodePort from the range 30000-32767 (the default range). This nodePort is open on every node, even if there’s no pod running on a particular node. Inbound traffic on this nodePort is sent to one of the pods (it may even be on some other node!) using, again, iptables.

A service type of LoadBalancer in cloud environments would create a cloud load balancer (ELB, for example) in front of all the nodes, hitting the same nodePort.

Ingress (L7: HTTP/TCP)

A bunch of different implementations, like nginx, traefik, haproxy, etc., keep a mapping of HTTP hostnames/paths and the respective backends. The ingress is the entry point for traffic coming in over a load balancer and nodePort as usual, but the advantage is that we can have one ingress handling inbound traffic for all the services instead of requiring multiple nodePorts and load balancers.
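The hostname/path mapping an ingress keeps can be sketched as a small routing table. The hostnames, paths and backend names are invented, and the plain string-prefix match is looser than what real ingress controllers do.

```python
# Hypothetical ingress rules: (hostname, path prefix, backend Service).
RULES = [
    ("shop.example.com", "/api", "api-service:8080"),
    ("shop.example.com", "/", "web-service:80"),
    ("blog.example.com", "/", "blog-service:80"),
]


def route(host, path):
    """Return the backend for the longest matching path prefix on this host."""
    candidates = [
        (prefix, backend)
        for rule_host, prefix, backend in RULES
        if rule_host == host and path.startswith(prefix)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: len(c[0]))[1]


print(route("shop.example.com", "/api/orders"))
print(route("shop.example.com", "/index.html"))
```

One table like this serves every hostname, which is why a single ingress can replace many separate load balancers.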

Network Policy

Think of this like security groups/ACLs for pods. The NetworkPolicy rules allow/deny traffic across pods. The exact implementation depends on the network layer/CNI, but most of them just use iptables.
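A toy check shows the allow/deny idea. The policy structure here is a simplified stand-in invented for the example, not the real NetworkPolicy schema, and it covers ingress rules only.

```python
# Hypothetical policy: pods labeled app=db accept ingress only from app=api.
POLICIES = [
    {"applies_to": {"app": "db"}, "allow_from": [{"app": "api"}]},
]


def matches(selector, labels):
    """True if the labels contain every key/value in the selector."""
    return all(labels.get(k) == v for k, v in selector.items())


def ingress_allowed(src_labels, dst_labels):
    """Allow unless a policy selects the destination and no rule admits the source."""
    selecting = [p for p in POLICIES if matches(p["applies_to"], dst_labels)]
    if not selecting:
        return True  # pods not selected by any policy accept all traffic
    return any(
        matches(rule, src_labels) for p in selecting for rule in p["allow_from"]
    )


print(ingress_allowed({"app": "api"}, {"app": "db"}))   # allowed by the rule
print(ingress_allowed({"app": "web"}, {"app": "db"}))   # blocked
print(ingress_allowed({"app": "web"}, {"app": "api"}))  # no policy selects api
```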

That’s all for now. In the previous parts we studied the foundation of Kubernetes Networking and how overlays work. Now we know how the Service abstraction helps in a dynamic cluster and makes discovery super easy. We also covered how the outbound and inbound traffic flow works and how network policy is useful for security within a cluster.

#kubernetes #devops #developer
