In the last few years, Prometheus has gained huge popularity as a tool for monitoring distributed systems. It has a simple yet powerful data model and query language, but it can pose a challenge when it comes to high availability and long-term storage of historical metric data. Adding more Prometheus replicas improves availability, but Prometheus on its own does not offer continuous availability. For example, if one of the Prometheus replicas crashes, there will be a gap in the metric data for the time it takes to fail over to another Prometheus instance. Similarly, Prometheus’s local storage is limited in scalability and durability because of its single-node architecture, so you have to rely on a remote storage system for long-term data retention. This is where the CNCF sandbox project Thanos comes in handy.

Thanos is a set of components that can be composed into a highly available metrics system with unlimited storage capacity on Google Cloud Storage (GCS), S3, or other supported object stores. It runs seamlessly on top of existing Prometheus deployments. Thanos lets you query multiple Prometheus instances at once and merges data for the same metric across those instances on the fly to produce a continuous stream of metric data. Even though Thanos is an early-stage project, it is already used in production by companies like Adobe and eBay.

Because YugabyteDB is a cloud native, distributed SQL database, it can easily interoperate with Thanos and many other CNCF projects like Longhorn, OpenEBS, Rook, and Falco.

What’s YugabyteDB? It is an open source, high-performance distributed SQL database built on a scalable and fault-tolerant design inspired by Google Spanner. Yugabyte’s SQL API (YSQL) is PostgreSQL wire compatible.

In this blog post we’ll show you how to get up and running with Thanos so that it can be used to monitor a YugabyteDB cluster, all running on Google Kubernetes Engine (GKE).

Thanos Architecture

At a high level, Thanos has several key components, and it is worth understanding how they work together.

  • First, a sidecar is deployed alongside the Prometheus container and interacts with Prometheus.
  • Next, an additional service called Thanos Query is deployed. It is configured to be aware of all instances of the Thanos Sidecar. Instead of querying Prometheus directly, you query the Thanos Query component.
  • Thanos Query communicates with the Thanos Sidecar via gRPC and de-duplicates metrics across all instances of Prometheus when executing a query. Thanos Query also provides a graphical user interface for querying and administration, and exposes the Prometheus API.
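
To make the de-duplication idea concrete, here is a minimal sketch in Python of how samples for one series from two Prometheus replicas could be merged so that a gap in one replica is filled by the other. This is an illustration only and not Thanos's actual algorithm: the function name `merge_replicas` and the sample data are hypothetical, and it assumes both replicas report samples at identical timestamps, which real replicas generally do not.

```python
from typing import Dict, List, Tuple

# One sample: (unix timestamp in seconds, value). Hypothetical shape for this sketch.
Sample = Tuple[int, float]


def merge_replicas(replicas: List[List[Sample]]) -> List[Sample]:
    """Merge samples for one series scraped by several Prometheus replicas.

    For each timestamp, keep the value from the first replica that has it,
    so a gap in one replica is filled by any other replica that was up.
    """
    merged: Dict[int, float] = {}
    for samples in replicas:
        for ts, value in samples:
            # setdefault keeps the first value seen for a timestamp.
            merged.setdefault(ts, value)
    return sorted(merged.items())


# Replica A was down and missed the samples at t=30 and t=45.
replica_a = [(0, 1.0), (15, 2.0), (60, 5.0)]
replica_b = [(0, 1.0), (15, 2.0), (30, 3.0), (45, 4.0), (60, 5.0)]

series = merge_replicas([replica_a, replica_b])
print(series)
```

The merged series contains every timestamp that at least one replica recorded, which is the effect you get when you query Thanos Query instead of a single Prometheus instance.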

An illustration of the components is shown below. You can learn more about the Thanos architecture by checking out the documentation.

[Figure: Thanos components]

Highly Available Prometheus Metrics for Distributed SQL with Thanos on GKE