One of the challenges with distributed systems and microservices architecture is automatically detecting unhealthy applications, rerouting requests to other available systems, and restoring the broken components. Health checks are one way to address this challenge and ensure reliability. With Kubernetes, health checks are configured via probes to determine the state of each pod.

By default, Kubernetes simply observes the pod’s lifecycle and starts to route traffic to the pod when it moves from the Pending to the Running phase. Kubelet also watches for application crashes and restarts the affected container to recover. Many developers assume that this basic setup is adequate, especially when the application inside the pod is managed by a daemon process manager (e.g. PM2 for Node.js). However, since Kubernetes deems a pod healthy and ready for requests as soon as all of its containers have started, the application may receive traffic before it is actually ready. This may happen if the application needs to initialize some state, make database connections, or load data before handling application logic. The gap between when the application is actually ready and when Kubernetes thinks it is ready becomes an issue as the deployment begins to scale: unready applications receive traffic and send back 500 errors.
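
To make that gap concrete, here is a minimal Python sketch of an application that reports "alive" and "ready" separately. The paths `/healthz` and `/ready`, the port, and the 5-second startup delay are all assumptions chosen for illustration; Kubernetes does not mandate any of them.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Set once startup work (hypothetical: database connections, data loading) is done.
ready = threading.Event()


def status_for(path: str) -> int:
    """Return the HTTP status code for a health-check path."""
    if path == "/healthz":
        # Liveness: the process is up and able to answer.
        return 200
    if path == "/ready":
        # Readiness: report success only after initialization has finished.
        return 200 if ready.is_set() else 503
    return 404


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(status_for(self.path))
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


def initialize(delay_seconds: float = 5.0) -> None:
    """Stand-in for slow startup work; the delay is an arbitrary placeholder."""
    time.sleep(delay_seconds)
    ready.set()


def serve(port: int = 8080) -> None:
    """Start initialization in the background and serve health checks."""
    threading.Thread(target=initialize, daemon=True).start()
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the server. During the first few seconds `/healthz` answers 200 while `/ready` answers 503, which is exactly the window in which a pod without health checks could already be receiving requests.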

This is where Kubernetes probes come in: they define when a container is ready to accept traffic and when a container should be restarted. As of Kubernetes 1.16, three types of probes are supported. In this post, we’ll review the different types of probes, best practices, and tools to detect deployments with potential configuration issues.

Kubernetes Probes

Kubernetes versions 1.15 and earlier support only readiness and liveness probes. Startup probes were added in 1.16 as an alpha feature and graduated to beta in 1.18 (WARNING: 1.16 deprecated several Kubernetes APIs. Use this migration guide to check for compatibility).

All probes have the following parameters:

  • initialDelaySeconds: number of seconds to wait after the container has started before initiating probes
  • periodSeconds: how often, in seconds, to run the probe
  • timeoutSeconds: number of seconds after which the probe times out (counting as a failed check)
  • successThreshold: minimum number of consecutive successful checks for the probe to be considered passing again after it has failed
  • failureThreshold: number of consecutive failed checks before the probe is marked as failed. For liveness probes, this leads to the container being restarted. For readiness probes, this marks the pod as unready.
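
The way the two thresholds interact is easier to see in a small simulation. The Python sketch below is a simplified model of the counting logic, not kubelet code; the function name `evaluate` and its arguments are hypothetical and only mirror the parameter names above.

```python
from typing import List, Sequence


def evaluate(
    results: Sequence[bool],
    success_threshold: int = 1,
    failure_threshold: int = 3,
    initial: bool = False,
) -> List[bool]:
    """Replay raw check results and return the probe state after each check.

    The state flips to failing only after `failure_threshold` consecutive
    failed checks, and back to passing only after `success_threshold`
    consecutive successful checks.
    """
    state = initial
    successes = 0
    failures = 0
    history = []
    for ok in results:
        if ok:
            successes += 1
            failures = 0  # a success breaks any run of failures
            if successes >= success_threshold:
                state = True
        else:
            failures += 1
            successes = 0  # a failure breaks any run of successes
            if failures >= failure_threshold:
                state = False
        history.append(state)
    return history


# Two isolated failures are tolerated; the third consecutive one flips the state.
checks = [True, False, False, True, False, False, False, True]
print(evaluate(checks, success_threshold=1, failure_threshold=3))
# -> [True, True, True, True, True, True, False, True]
```

In this model a single slow response does not restart anything or remove the pod from rotation; only a sustained run of failures does. With these settings, a check that keeps failing is acted on after roughly `failureThreshold` × `periodSeconds` seconds, so raising either value makes the probe more tolerant and lowering either makes it react faster.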
