Stage: Alpha
Feature group: node
Ephemeral containers are a great way to debug running pods. You can’t add regular containers to a pod after creation, but you can run ephemeral containers (although you should use tools like kubectl capture or kubectl trace for that!).
Right now the steps to run an ephemeral container aren’t straightforward. Once this feature is stable you may be able to run them with just kubectl debug:
kubectl debug -c debug-shell --image=debian target-pod -- bash
These containers execute within the namespaces of an existing pod and have access to the file systems of its individual containers.
Ephemeral containers aren’t meant to be used for regular deployments, so they have some limitations. For example, they will never be automatically restarted and you can’t configure them like a regular container. In particular, fields like ports, livenessProbe, readinessProbe or lifecycle that imply a role in a pod will be disallowed.
Stage: Alpha
Feature group: network
As the use of IPv6 increases it’s getting more common to manage clusters with mixed IPv4 and IPv6 network configurations.
Up until now a Kubernetes cluster could only run in either IPv4 or IPv6 mode. You needed the assistance of plugins to assign dual-stack addresses on a pod, and it wasn’t a convenient solution, as Kubernetes would only be aware of one address per pod.
Now you can natively run your cluster in dual-stack mode. For example, you can have dual-stack pods (services still need to be either IPv4 or IPv6).
To use dual-stack you need to enable the feature gate IPv6DualStack in the relevant components of your cluster, and then set up your services. You can get the full steps here.
Stage: Alpha
Feature group: network
Until now, all endpoints for a service were stored in one single object. In large services with many pods, this Endpoints object may grow too big and become problematic: objects that are too large can’t be stored in etcd, and they also aren’t propagated to the kube-proxy instances.
In addition, every time there is a change in an endpoint the whole Endpoints object is re-computed, stored and shared with all watchers. This process doesn’t scale too well and can become a bottleneck in scenarios like rolling upgrades, where there is a burst of endpoint changes.
The new EndpointSlice API will split endpoints into several EndpointSlice resources, solving many of the current API problems. It’s also designed to support other future features, like multiple IPs per pod.
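To get a feel for why slicing helps, here is a minimal Python sketch of the idea. It is not the real controller logic: the 100-endpoint limit, the slice names and the helper functions are all assumptions made up for illustration.

```python
from typing import Dict, List

# Assumed upper bound per slice, chosen only for this illustration.
MAX_ENDPOINTS_PER_SLICE = 100


def split_into_slices(endpoints: List[str],
                      max_per_slice: int = MAX_ENDPOINTS_PER_SLICE) -> Dict[str, List[str]]:
    """Group endpoint addresses into named slices of bounded size."""
    slices: Dict[str, List[str]] = {}
    for start in range(0, len(endpoints), max_per_slice):
        name = f"slice-{start // max_per_slice}"  # hypothetical naming scheme
        slices[name] = endpoints[start:start + max_per_slice]
    return slices


def slices_touched_by(slices: Dict[str, List[str]], changed_endpoint: str) -> List[str]:
    """Return the names of the slices that contain the changed endpoint."""
    return [name for name, members in slices.items() if changed_endpoint in members]


# 250 endpoints end up in three slices instead of one big object.
endpoints = [f"10.0.0.{i}" for i in range(250)]
slices = split_into_slices(endpoints)
print({name: len(members) for name, members in slices.items()})

# A change to one endpoint only requires re-sending the slice that holds it.
print(slices_touched_by(slices, "10.0.0.42"))
```

With a single Endpoints object, that one change would mean re-sending all 250 addresses to every watcher. In this sketch only one slice needs to be updated.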
Stage: Alpha
Feature group: node
In addition to the requested resources, your pods need some extra resources just to maintain their runtime environment.
With the PodOverhead feature gate enabled, Kubernetes will take this overhead into account when scheduling a pod. The pod overhead is calculated and fixed at admission time, and it’s associated with the pod’s RuntimeClass. Get the full details here.
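As a rough sketch of the accounting, the amount the scheduler has to fit on a node becomes the sum of the container requests plus the overhead. The Resources type, the helper name and every number below are hypothetical, picked only to show the arithmetic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resources:
    cpu_millicores: int
    memory_mib: int

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(self.cpu_millicores + other.cpu_millicores,
                         self.memory_mib + other.memory_mib)


def effective_request(container_requests: List[Resources], overhead: Resources) -> Resources:
    """Sum what the containers ask for, then add the per-pod overhead."""
    total = Resources(0, 0)
    for request in container_requests:
        total = total + request
    return total + overhead


# Two containers plus a made-up overhead for a sandboxed runtime.
containers = [Resources(500, 128), Resources(250, 64)]
overhead = Resources(250, 120)
print(effective_request(containers, overhead))
```

Without the overhead, the pod in this example would be scheduled as if it needed 750 millicores, while it would really consume 1000 on the node.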
Stage: Alpha
Feature group: scheduling
One of the challenges of running a multi-zone cluster is to spread your pods evenly, so high availability will work correctly and the resource utilization will be efficient.
With topologySpreadConstraints you can distribute your pods across zones, with a maximum difference in pod count of maxSkew. Zones are created by grouping nodes with the same topologyKey label.
If we want to deploy this pod:
apiVersion: v1
kind: Pod
metadata:
  name: mypod
  …
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        foo: bar
  …
In a cluster with this topology:
Label    +---------------+---------------+
zone=    |     zoneA     |     zoneB     |
         +-------+-------+-------+-------+
node=    | node1 | node2 | node3 | node4 |
         +-------+-------+-------+-------+
foo:bar  |   P   |   P   |   P   |       |
         +-------+-------+-------+-------+
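The only way to comply with the topology constraints is for the pod to be deployed in node3 or in node4. zoneA already holds two matching pods and zoneB only one, so adding another pod to zoneA would raise the difference to 2, which is above maxSkew.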
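The reasoning behind that placement can be sketched in a few lines of Python. This is a simplified model, not the scheduler's code: it assumes the skew of a candidate zone is its matching pod count after placement minus the lowest current count across zones, and the function name is made up.

```python
from typing import Dict, List


def eligible_nodes(node_zone: Dict[str, str],
                   matching_pods: Dict[str, int],
                   max_skew: int) -> List[str]:
    """Return the nodes where one more matching pod keeps the skew within max_skew."""
    # Count the matching pods per zone.
    zone_counts = {zone: 0 for zone in node_zone.values()}
    for node, count in matching_pods.items():
        zone_counts[node_zone[node]] += count

    lowest = min(zone_counts.values())
    eligible = []
    for node, zone in node_zone.items():
        skew_after_placement = zone_counts[zone] + 1 - lowest
        if skew_after_placement <= max_skew:
            eligible.append(node)
    return eligible


# Same topology as the example: three pods labeled foo:bar on node1, node2 and node3.
node_zone = {"node1": "zoneA", "node2": "zoneA", "node3": "zoneB", "node4": "zoneB"}
matching_pods = {"node1": 1, "node2": 1, "node3": 1, "node4": 0}
print(eligible_nodes(node_zone, matching_pods, max_skew=1))
```

Raising maxSkew to 2 in this sketch makes all four nodes eligible again, which is a quick way to see how the setting trades evenness for scheduling freedom.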
Stage: Alpha
Feature group: node
Probes allow Kubernetes to monitor the status of your applications. You can use livenessProbe to periodically check if the application is still alive. One example container defines this probe:
livenessProbe:
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 3
  periodSeconds: 10
If the probe fails 3 times in a row (3 checks * 10 seconds = 30s) the container will be restarted. But if the container is slow and needs more than 30 seconds to start, the probe will fail before the application is ready and the container will be restarted again and again.
This new feature lets you define a startupProbe that will hold off all the other probes until the pod finishes its startup:
startupProbe:
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 30
  periodSeconds: 10
Now our slow container has up to 5 minutes (30 checks * 10 seconds = 300s) to finish its startup.
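The arithmetic is simple enough to check with a tiny Python sketch. It ignores details like initial delays and probe timeouts, and the helper names and the 120-second startup time are assumptions for illustration.

```python
def probe_budget_seconds(failure_threshold: int, period_seconds: int) -> int:
    """Approximate time a container has before the probe gives up on it."""
    return failure_threshold * period_seconds


def survives_startup(startup_seconds: int, failure_threshold: int, period_seconds: int) -> bool:
    """True if the container finishes starting within the probe budget."""
    return startup_seconds <= probe_budget_seconds(failure_threshold, period_seconds)


# A hypothetical container that needs two minutes to start.
startup_seconds = 120

# Liveness probe only: failureThreshold 3, periodSeconds 10.
print(probe_budget_seconds(3, 10), survives_startup(startup_seconds, 3, 10))

# With the startupProbe: failureThreshold 30, periodSeconds 10.
print(probe_budget_seconds(30, 10), survives_startup(startup_seconds, 30, 10))
```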
Stage: Alpha
Feature group: scheduling
The RequestedToCapacityRatioPriority function allows you to schedule pods depending on the relative usage of each node. That way you can choose whether to schedule pods in the less used nodes, or to fill the ones that are already in use.
The new resources property lets you further define the relative usage of a node. By assigning weights to the node resources you can define scenarios like “CPU usage is 3 times more important than used memory”, then schedule more pods in nodes with idle CPUs even if they don’t have that much free memory.
{
  "kind": "Policy",
  "apiVersion": "v1",
  …
  "priorities": [
    …
    {
      "name": "RequestedToCapacityRatioPriority",
      "weight": 2,
      "argument": {
        "requestedToCapacityRatioArguments": {
          "shape": [
            {"utilization": 0, "score": 0},
            {"utilization": 100, "score": 10}
          ],
          "resources": [
            {"name": "intel.com/foo", "weight": 5},
            {"name": "CPU", "weight": 3},
            {"name": "Memory", "weight": 1}
          ]
        }
      }
    }
  ]
}
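To see how the weights shift the result, here is a simplified Python sketch. It assumes the score of each resource is read from the shape by linear interpolation and that the node score is the weighted average of those per-resource scores. The function names and the two example nodes are made up, and the extended resource is left out to keep the numbers easy to follow.

```python
from typing import Dict, List, Tuple

Shape = List[Tuple[float, float]]  # (utilization %, score) points


def shape_score(utilization: float, shape: Shape) -> float:
    """Interpolate linearly between the points of the shape."""
    points = sorted(shape)
    if utilization <= points[0][0]:
        return points[0][1]
    for (u0, s0), (u1, s1) in zip(points, points[1:]):
        if utilization <= u1:
            return s0 + (s1 - s0) * (utilization - u0) / (u1 - u0)
    return points[-1][1]


def node_score(utilization: Dict[str, float], weights: Dict[str, int], shape: Shape) -> float:
    """Weighted average of the per-resource scores."""
    weighted = sum(shape_score(utilization[name], shape) * weight
                   for name, weight in weights.items())
    return weighted / sum(weights.values())


shape = [(0, 0), (100, 10)]          # same shape as the policy above
weights = {"CPU": 3, "Memory": 1}    # CPU counts three times as much as memory

busy_cpu = {"CPU": 80, "Memory": 20}
busy_memory = {"CPU": 20, "Memory": 80}
print(node_score(busy_cpu, weights, shape))
print(node_score(busy_memory, weights, shape))
```

Both hypothetical nodes use the same total percentage of resources, but the one with the busier CPU gets a very different score because of the weights. Whether a higher score means "prefer this node for packing" or the opposite depends on the shape you define.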
Stage: Graduating to Beta
Feature group: node
The initial RuntimeClass implementation was meant for homogeneous clusters, where every node supports every RuntimeClass.
This upgrade improves scheduling in heterogeneous clusters, with specialized nodes that only support a subset of the runtime classes.
In these clusters, pods are now automatically scheduled only to the nodes that have support for their RuntimeClass.
Stage: Alpha
Feature group: cluster-lifecycle
Support for Windows nodes was introduced in Kubernetes 1.14, but there wasn’t an easy way to join Windows nodes to a cluster.
Starting in Kubernetes 1.16, kubeadm join will be available for Windows users with partial functionality. It will lack some features like kubeadm init or kubeadm join --control-plane.
Stage: Alpha
Feature group: windows
Now that Kubernetes has support for Group Managed Service Accounts we can use the runAsUserName Windows-specific property to define which user will run a container’s entrypoint.
The property is inside the PodSecurityContext and SecurityContext structs, and it needs to follow the format DOMAIN\USER, where the domain part is optional.
apiVersion: v1
kind: Pod
…
spec:
  securityContext:
    windowsOptions:
      runAsUserName: "NT AUTHORITY\\NETWORK SERVICE"
Stage: Graduating to Beta
Feature group: windows
This will allow an operator to choose a GMSA at deployment time, and run containers using it to connect to existing applications such as a database or API server without changing how the authentication and authorization are managed inside the organization.
Stage: Graduating to Stable
Feature group: API
Until now mutating webhooks were only called once, in alphabetical order. In Kubernetes 1.15 this changed, allowing webhook re-invocation if another webhook later in the chain modifies the same object.
Stage: Graduating to Beta
Feature group: API
The “bookmark” watch event is used as a checkpoint, indicating that all objects up to a given resourceVersion requested by the client have already been sent. The API can skip sending all these events, avoiding unnecessary processing on both sides.
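From the client side, the idea can be sketched like this in Python. The event dictionaries are a simplified, hypothetical shape rather than the real API payload, and the replay function is invented for illustration: a bookmark doesn’t change the local cache, it only moves forward the point the client would resume from after a reconnect.

```python
from typing import Dict, List, Optional, Tuple


def replay(events: List[dict],
           cache: Optional[Dict[str, str]] = None,
           resume_from: Optional[str] = None) -> Tuple[Dict[str, str], Optional[str]]:
    """Apply watch events to a local cache and track the resume point."""
    cache = dict(cache or {})
    for event in events:
        # Every event, bookmark or not, advances the resume point.
        resume_from = event["resourceVersion"]
        if event["type"] == "BOOKMARK":
            continue  # checkpoint only, nothing to store
        if event["type"] == "DELETED":
            cache.pop(event["name"], None)
        else:
            cache[event["name"]] = event["resourceVersion"]
    return cache, resume_from


events = [
    {"type": "ADDED", "name": "pod-a", "resourceVersion": "101"},
    {"type": "MODIFIED", "name": "pod-a", "resourceVersion": "105"},
    {"type": "BOOKMARK", "resourceVersion": "250"},
]
cache, resume_from = replay(events)
print(cache, resume_from)
```

Without the bookmark, a client reconnecting in this example would have to resume from version 105 and receive everything that happened since then, even if none of it was relevant to its watch.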
Stage: Alpha
Feature group: node
Machine learning, scientific computing and financial services are examples of systems that are computationally intensive or require ultra-low latency. These kinds of workloads benefit from proper resource allocation.
For example, performance is improved if a process runs on one isolated CPU core rather than jumping between cores or sharing time with other processes. Parallel processes also run better on cores inside the same CPU socket (in multi socket systems).
The node topology manager is a kubelet component that centralizes the coordination of hardware resource assignments. Currently this task is done by independent components (CPU manager, device manager, CNI), which sometimes results in suboptimal allocations.
Only pods running in the Guaranteed QoS class that have an integer cpu value are considered by the Topology Manager, like the one in this example:
…
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      limits:
        memory: "200Mi"
        cpu: "2"
        example.com/device: "1"
      requests:
        memory: "200Mi"
        cpu: "2"
        example.com/device: "1"
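The two conditions can be approximated with a short Python check. This is a sketch under simplifying assumptions, not the kubelet’s logic: it treats a pod as Guaranteed when every container sets cpu and memory limits and its requests match them, it compares quantities as raw strings, and all the function names are made up.

```python
from typing import Dict, List

Container = Dict[str, Dict[str, str]]  # {"limits": {...}, "requests": {...}}


def cpu_millicores(quantity: str) -> int:
    """Convert a cpu quantity like "2", "1.5" or "500m" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def is_guaranteed(containers: List[Container]) -> bool:
    """Every container needs cpu and memory limits, with requests equal to them."""
    for container in containers:
        limits = container.get("limits", {})
        requests = container.get("requests", limits)  # requests default to the limits
        for resource in ("cpu", "memory"):
            if resource not in limits:
                return False
            if requests.get(resource, limits[resource]) != limits[resource]:
                return False
    return bool(containers)


def considered_by_topology_manager(containers: List[Container]) -> bool:
    """Guaranteed QoS plus a whole number of CPUs in every container."""
    return is_guaranteed(containers) and all(
        cpu_millicores(c["limits"]["cpu"]) % 1000 == 0 for c in containers
    )


resources = {"memory": "200Mi", "cpu": "2", "example.com/device": "1"}
nginx = {"limits": dict(resources), "requests": dict(resources)}
print(considered_by_topology_manager([nginx]))
```

Changing the cpu value to something like "500m", or setting requests lower than the limits, makes the check return False in this sketch.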
Stage: Alpha
Feature group: cluster-lifecycle
kubeadm works great to configure most Kubernetes clusters, but it has some limitations and some advanced use cases require extra tools.
With Kustomize you can patch base configurations to obtain configuration variants, which helps to manage some advanced scenarios. For example, you can have a base configuration for your service, then patch it with different limits for each of your dev, test and prod environments.
Now kubeadm integrates with Kustomize. When passing patches via the --experimental-kustomize flag, kubeadm will first apply those patches to the existing configuration, then proceed as usual with the patched config.
kubeadm init --experimental-kustomize kubeadm-patches/
The flag will be renamed to just --kustomize when this feature reaches beta. Learn more and check other examples here.
Stage: Graduating to Beta
Feature group: API machinery
This feature aims to move the logic away from kubectl apply to the apiserver, fixing most of the current workflow pitfalls and also making the operation accessible directly from the API (for example using curl), without strictly requiring kubectl or a Golang implementation.
Stage: Graduating to Beta
Feature group: network
There are various corner cases where cloud resources are orphaned after the associated Service is deleted. Finalizer Protection for Service LoadBalancers was introduced to prevent this from happening.
Stage: Graduating to Stable
Feature group: azure
Nodes in Azure will be added with the label failure-domain.beta.kubernetes.io/zone=<region>-<zone>, and topology-aware provisioning is added for the Azure managed disks storage class.
Stage: Graduating to Stable
Feature group: azure
Cross resource group (RG) nodes and unmanaged (such as on-prem) nodes in Azure cloud provider are now supported.
Stage: Alpha
Feature group: storage
Container Storage Interface plugins were created to allow the development of third party storage volume systems.
Starting with Kubernetes 1.16, Windows nodes will be able to use the existing CSI plugins.
Stage: Graduating to Beta
Feature group: storage
Using this feature, you can “clone” an existing PV. A Clone results in a new, duplicate volume being provisioned from an existing volume.
Stage: Graduating to Beta
Feature group: storage
To support resizing of CSI volumes, an external resize controller will monitor all PVCs. If a PVC meets the criteria for resizing, it will be added to the controller’s workqueue.
Stage: Graduating to Beta
Feature group: storage
CSI volumes can only be referenced via PV/PVC today. This works well for remote persistent volumes. This feature introduces the possibility to use CSI volumes as local ephemeral volumes as well.
Stage: Graduating to Stable
Feature group: API
This feature groups the many modifications and improvements that were performed to graduate CustomResourceDefinitions to Stable in the Kubernetes 1.16 release.
Stage: Graduating to Stable
Feature group: API
With this feature you can enable the Status and Scale subresources for custom resources.
By adding the comment // +kubebuilder:subresource:status in your CRD definition you will be enabling the /status subresource, which exposes the current status in the system of your custom resource.
// MySQL is the Schema for the mysqls API
// +k8s:openapi-gen=true
// +kubebuilder:subresource:status
type MySQL struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   MySQLSpec   `json:"spec,omitempty"`
	Status MySQLStatus `json:"status,omitempty"`
}
By enabling the Scale subresource, you’ll be able to check how many replicas of your custom resource are deployed vs the desired amount. You can obtain this information from the exposed /scale subresource or by executing kubectl get deployments. You can also use kubectl scale to adjust the number of replicas of your custom resource.
To enable the Scale subresource you need to define the corresponding JSONPaths in the CRD:
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
…
spec:
  subresources:
    status: {}
    scale:
      specReplicasPath: .spec.replicas
      statusReplicasPath: .status.replicas
      labelSelectorPath: .status.labelSelector
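Those three paths simply tell the API server where to look inside your custom resource. A rough Python sketch of that lookup could be the following, where the custom resource contents, the resolve_path helper and the scale_view function are all hypothetical and only simple dot-notation paths are handled.

```python
from typing import Any, Dict


def resolve_path(obj: Dict[str, Any], path: str) -> Any:
    """Follow a simple dot-notation path like ".spec.replicas" through nested dicts."""
    current: Any = obj
    for key in path.strip(".").split("."):
        current = current[key]
    return current


def scale_view(custom_resource: Dict[str, Any], scale_config: Dict[str, str]) -> Dict[str, Any]:
    """Build the desired vs current replica summary from the configured paths."""
    return {
        "desired": resolve_path(custom_resource, scale_config["specReplicasPath"]),
        "current": resolve_path(custom_resource, scale_config["statusReplicasPath"]),
        "selector": resolve_path(custom_resource, scale_config["labelSelectorPath"]),
    }


scale_config = {
    "specReplicasPath": ".spec.replicas",
    "statusReplicasPath": ".status.replicas",
    "labelSelectorPath": ".status.labelSelector",
}

# A made-up custom resource that is still scaling up.
mysql = {
    "spec": {"replicas": 3},
    "status": {"replicas": 2, "labelSelector": "app=mysql"},
}
print(scale_view(mysql, scale_config))
```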
Stage: Graduating to Stable
Feature group: API
Two features aiming to facilitate the JSON handling and processing associated with CustomResourceDefinitions.
Stage: Graduating to Stable
Feature group: API
Different CRD versions can have different schemas. You can now handle on-the-fly conversion between versions by defining and implementing a conversion webhook.
Stage: Graduating to Stable
Feature group: API
CustomResourceDefinition (CRD) allows the CRD author to define an OpenAPI v3 schema to enable server-side validation for CustomResources (CR).
Stage: Alpha
Feature group: API
The field SelfLink is present in every Kubernetes object and contains a URL representing the given object.
This field does not provide any new information and its creation and maintenance has a performance impact, so a decision has been taken to progressively deprecate SelfLink by Kubernetes 1.21.
Stage: Alpha
Feature group: cloud-provider
Specific code for cloud providers is being moved away from the core Kubernetes repository (in-tree) to their own external repositories (out-of-tree). By doing so, cloud providers will be able to develop and make releases independent from the core Kubernetes release cycle.
In this halfway moment cloud providers are being copied out-of-tree but they are still available in-tree, so developers may end up with two versions of the same cloud provider in their builds. How do you know which one of the two versions is active?
With this alpha feature you can disable in-tree cloud providers to ensure your build is only using the external version.
Stage: Alpha
Feature group: instrumentation
This feature summarizes several tasks needed to align Kubernetes metrics with their Instrumentation Guidelines. Main tasks are changing the names and units of some metrics to be in line with the rest of the Prometheus ecosystem.
Originally published at https://sysdig.com