This project monitors availability and tracks SLA/SLO targets through built-in dashboards and Prometheus metrics.
When applications go into production, one of our main concerns is to ensure that they are properly monitored, notably with appropriate checks and suitable metrics that report on their availability over time. This article addresses that concern. Focused on applications running on Kubernetes, it sets up a standard to monitor, measure, and observe application availability. The goal is to help organizations define Service Level Objectives (SLOs) and/or Service Level Agreements (SLAs) while being able to track them through factual KPIs.
This article is structured in two main sections. The first is conceptual, introducing our fundamentals and assumptions for defining, monitoring, and measuring application availability on Kubernetes. The second is practical, demonstrating an implementation powered by RealOpInsight, our open source implementation of the introduced concepts. In a nutshell, RealOpInsight is an application operations monitoring framework designed to work atop Kubernetes by leveraging its basic probe capabilities against pods and containers.
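To make "leveraging probe capabilities" concrete: each container's `ready` flag in a Pod object's `status.containerStatuses` reflects its readiness probe (or simply that it is running when no readiness probe is defined), and `status.phase` gives the pod's lifecycle phase. The sketch below derives an up/down signal from a Pod object as returned by `kubectl get pod NAME -o json`. The rule itself (Running plus all containers ready) and the helper name `pod_is_up` are illustrative assumptions, not RealOpInsight's actual logic.

```python
import json


def pod_is_up(pod: dict) -> bool:
    """Return True when the pod is Running and all of its containers report ready.

    `pod` is a Pod object as a dict, e.g. parsed from `kubectl get pod NAME -o json`.
    This up/down rule is an illustrative simplification, not RealOpInsight's logic.
    """
    status = pod.get("status", {})
    if status.get("phase") != "Running":
        return False
    containers = status.get("containerStatuses", [])
    # `ready` reflects each container's readiness probe (or that it is running
    # when no probe is defined); a pod with no container status yet is not up.
    return bool(containers) and all(c.get("ready", False) for c in containers)


sample = json.loads("""
{"metadata": {"name": "web-0"},
 "status": {"phase": "Running",
            "containerStatuses": [{"name": "web", "ready": true},
                                  {"name": "sidecar", "ready": false}]}}
""")
print(pod_is_up(sample))  # False: the sidecar container is not ready
```

Requiring every container to be ready is deliberately strict; a real policy might weigh sidecars differently.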
Side note on the project: this work arose from a spare-time project started a few years ago with the intent to fill gaps in operations monitoring capabilities identified in traditional open source IT monitoring tools such as Nagios and Zabbix. Since then, the paradigm of IT monitoring has evolved significantly, notably due to the emergence of microservice architectures. The project's efforts have therefore been refocused on applications running in this new ecosystem. The RealOpInsight code base has been almost completely rewritten and its deployment rethought to be fully cloud-native. That said, nothing is perfect, so any feedback for improvement is welcome.
Given one or more instances of Kubernetes, our goal is to be able to monitor, measure, and track the availability of applications as established by the following tenets:
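For example, over one day (1,440 minutes), an application that is unavailable for 2 minutes has an availability of:

$$\frac{1440 - 2}{1440} = \frac{1438}{1440} \approx 99.86\%$$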
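The availability arithmetic generalizes to any observation period: availability is uptime divided by the period, and an SLO target translates into a downtime budget. A minimal Python sketch, assuming total downtime over the period is already known as a single duration; `availability`, `meets_slo`, and `error_budget` are hypothetical helper names, not RealOpInsight APIs.

```python
from datetime import timedelta


def availability(period: timedelta, downtime: timedelta) -> float:
    """Fraction of `period` during which the application was up."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    if not timedelta(0) <= downtime <= period:
        raise ValueError("downtime must lie between 0 and period")
    return (period - downtime) / period  # timedelta / timedelta -> float


def meets_slo(period: timedelta, downtime: timedelta, target: float) -> bool:
    """True when the measured availability reaches the target (e.g. 0.999)."""
    return availability(period, downtime) >= target


def error_budget(period: timedelta, target: float) -> timedelta:
    """Maximum downtime over `period` that still satisfies `target`."""
    return period * (1 - target)


day = timedelta(days=1)  # 1440 minutes
print(f"{availability(day, timedelta(minutes=2)):.4%}")  # 99.8611%
print(meets_slo(day, timedelta(minutes=2), 0.999))       # False
print(error_budget(day, 0.999))                          # 0:01:26.400000 (1.44 minutes)
```

Framing the target as an error budget makes it easy to see how much downtime remains before an SLO is breached.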
With this foundation, it becomes easy to track the operations of Kubernetes applications over time.

Diagram 1: Conceptual view of Kubernetes application availability monitoring. (a) Sample of a namespace-scoped application dependency tree. (b) Sample of a high-level dashboard showing the current consolidated status of a set of applications.
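One way to picture the namespace-scoped dependency tree of Diagram 1(a) is a tree whose leaves carry probe results and whose inner nodes derive their status from their children. The sketch below assumes a "worst status wins" rule and a four-level severity scale; both the rule and the names (`Status`, `Node`, `effective_status`) are illustrative assumptions, not necessarily how RealOpInsight computes statuses.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Status(IntEnum):
    # Hypothetical severity scale, ordered so that max() yields the worst status.
    NORMAL = 0
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


@dataclass
class Node:
    name: str
    status: Status = Status.NORMAL  # used only by leaves (e.g. a pod's probe result)
    children: list["Node"] = field(default_factory=list)

    def effective_status(self) -> Status:
        # Leaves report their own status; inner nodes take the worst child status.
        if not self.children:
            return self.status
        return max(child.effective_status() for child in self.children)


app = Node("shop", children=[
    Node("frontend", children=[Node("frontend-pod-1"),
                               Node("frontend-pod-2", Status.MAJOR)]),
    Node("database", children=[Node("db-pod-0")]),
])
print(app.effective_status().name)  # MAJOR
```

A dashboard like Diagram 1(b) would then show one consolidated `effective_status()` per application root. Other rules (averaging, weighting) fit the same structure by changing only `effective_status`.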
Keeping the above tenets in mind, and still with the objective of defining and tracking SLA/SLO targets for Kubernetes applications, the second part of this article demonstrates how that works in practice. Using an implementation, namely RealOpInsight, we'll present the architecture building blocks and demonstrate a quick deployment on a Kubernetes cluster, followed by a short demo. We'll walk through the built-in dashboards and show how to extend them using third-party data visualization tools. This extended visualization can be done online by leveraging RealOpInsight's built-in Prometheus exporter (e.g. with Grafana), or offline by leveraging its ability to export data as CSV.
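For the offline path, an exported CSV can be post-processed with any tool. The sketch below assumes a hypothetical export layout with `timestamp`, `application`, and `status` columns sampled at a fixed interval, and computes the share of samples in which each application was in the `NORMAL` state. The actual column names and status values produced by RealOpInsight may differ, and `availability_by_app` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict


def availability_by_app(text: str, up_status: str = "NORMAL") -> dict[str, float]:
    """Share of samples in which each application was in `up_status`.

    Assumes a hypothetical CSV layout with `timestamp`, `application` and
    `status` columns, sampled at a fixed interval.
    """
    totals: dict[str, int] = defaultdict(int)
    ups: dict[str, int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(text)):
        app = row["application"]
        totals[app] += 1
        if row["status"] == up_status:
            ups[app] += 1
    return {app: ups[app] / totals[app] for app in totals}


# Hypothetical sample export, one row per application per minute.
export = """timestamp,application,status
2024-01-01T00:00:00Z,shop,NORMAL
2024-01-01T00:00:00Z,api,NORMAL
2024-01-01T00:01:00Z,shop,CRITICAL
2024-01-01T00:01:00Z,api,NORMAL
2024-01-01T00:02:00Z,shop,NORMAL
2024-01-01T00:02:00Z,api,NORMAL
2024-01-01T00:03:00Z,shop,NORMAL
2024-01-01T00:03:00Z,api,NORMAL
"""
print(availability_by_app(export))  # {'shop': 0.75, 'api': 1.0}
```

Counting samples is only valid for evenly spaced samples; irregular timestamps would call for a time-weighted sum instead. Online, if the exporter exposes a 0/1 status gauge per application (an assumption), PromQL's `avg_over_time` over that gauge yields the same ratio.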
#kubernetes #monitoring #microservice architecture #grafana #prometheus