Kubernetes is an open-source platform that orchestrates containerized workloads and services, managing their deployment and configuration. First announced by Google in 2014, with version 1.0 released in 2015, Kubernetes is now maintained by the Cloud Native Computing Foundation. Since its release, it has become a worldwide phenomenon. The majority of cloud-native companies use it, vendors offer commercial prebuilt versions, and there’s even an annual convention!
What has made Kubernetes such a fundamental service? A major factor is its automation capabilities. Kubernetes can automatically change the configuration of deployed containers, or even deploy new containers, based on metrics it tracks or requests made by engineers. Having Kubernetes handle these processes saves time, eliminates toil, and increases consistency.
If these benefits sound familiar, it might be because they overlap with the philosophies of SRE. But how do you incorporate the automation of Kubernetes into your SRE practices? In this blog post, we’ll explain the Kubernetes Operator, the pattern at the heart of customized automation, and discuss how it can evolve your SRE solution.
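Before turning to Operators, it may help to see what metric-driven automation looks like in miniature. The sketch below is loosely modeled on the scaling rule described in the Kubernetes documentation for the Horizontal Pod Autoscaler (desired replicas = ceil(current replicas × current metric / target metric)). The function name, the replica bounds, and the sample numbers are illustrative assumptions, not part of any Kubernetes API, and the real autoscaler handles many cases this ignores.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified, hypothetical version of a metric-driven scaling decision.

    All parameter names and defaults here are assumptions for illustration.
    """
    ratio = current_metric / target_metric

    # If the observed metric is close enough to the target, do nothing.
    # This avoids constantly resizing the workload over small fluctuations.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    # Scale proportionally to how far the metric is from the target,
    # rounding up so the workload is never under-provisioned.
    wanted = math.ceil(current_replicas * ratio)

    # Keep the result inside the bounds the engineer configured.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # Four replicas averaging 90% CPU against a 60% target: scale up.
    print(desired_replicas(4, 90, 60))
    # Four replicas averaging 30% CPU against a 60% target: scale down.
    print(desired_replicas(4, 30, 60))
```

The point of the sketch is the shape of the decision: observe a metric, compare it with a target, and compute the change needed to close the gap, with no engineer in the loop.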
In Kubernetes Operators: Automating the Container Orchestration Platform, authors Jason Dobies and Joshua Wood describe an Operator as “an automated Site Reliability Engineer for its application.” Given an SRE’s multifaceted experience and diverse workload, this is a bold statement. So what exactly can the Operator do?