Using the Prometheus Operator has become a common choice when it comes to running Prometheus in a Kubernetes cluster. It can manage Prometheus and Alertmanager for us with the help of CRDs in Kubernetes. The Kube-Prometheus-stack Helm chart (formerly known as Prometheus-operator) comes with Grafana, node_exporter, and more out of the box.
In a previous blog post about Prometheus, we took a look at setting up Prometheus and Grafana using manifest files. We also explored a few of the metrics exposed by YugabyteDB. In this post, we will be setting up Prometheus and Grafana using the Kube-Prometheus-stack chart. And we will configure Prometheus to scrape YugabyteDB pods. In the end, we will take a look at the YugabyteDB Grafana dashboard that can be used to visualize all the collected metrics.
Before we get started with the setup, make sure you have a Kubernetes 1.16+ cluster with kubectl pointing to it. You can create a GKE cluster or an equivalent in other cloud providers or use Minikube to create a local cluster.
We will be using Helm 3 to install the charts, make sure you have it installed on your machine as well.
#cloud #tutorial #kubernetes #cloud native #grafana #prometheus #monitoring and alerting #yugabyte
Last year, we provided a list of Kubernetes tools that proved so popular we have decided to curate another list of some useful additions for working with the platform—among which are many tools that we personally use here at Caylent. Check out the original tools list here in case you missed it.
According to a recent survey done by Stackrox, the dominance Kubernetes enjoys in the market continues to be reinforced, with 86% of respondents using it for container orchestration.
And as you can see below, more and more companies are jumping into containerization for their apps. If you’re among them, here are some tools to aid you going forward as Kubernetes continues its rapid growth.
#blog #tools #amazon elastic kubernetes service #application security #aws kms #botkube #caylent #cli #container monitoring #container orchestration tools #container security #containers #continuous delivery #continuous deployment #continuous integration #contour #developers #development #developments #draft #eksctl #firewall #gcp #github #harbor #helm #helm charts #helm-2to3 #helm-aws-secret-plugin #helm-docs #helm-operator-get-started #helm-secrets #iam #json #k-rail #k3s #k3sup #k8s #keel.sh #keycloak #kiali #kiam #klum #knative #krew #ksniff #kube #kube-prod-runtime #kube-ps1 #kube-scan #kube-state-metrics #kube2iam #kubeapps #kubebuilder #kubeconfig #kubectl #kubectl-aws-secrets #kubefwd #kubernetes #kubernetes command line tool #kubernetes configuration #kubernetes deployment #kubernetes in development #kubernetes in production #kubernetes ingress #kubernetes interfaces #kubernetes monitoring #kubernetes networking #kubernetes observability #kubernetes plugins #kubernetes secrets #kubernetes security #kubernetes security best practices #kubernetes security vendors #kubernetes service discovery #kubernetic #kubesec #kubeterminal #kubeval #kudo #kuma #microsoft azure key vault #mozilla sops #octant #octarine #open source #palo alto kubernetes security #permission-manager #pgp #rafay #rakess #rancher #rook #secrets operations #serverless function #service mesh #shell-operator #snyk #snyk container #sonobuoy #strongdm #tcpdump #tenkai #testing #tigera #tilt #vert.x #wireshark #yaml
Kubernetes is one of the most popular choices for container management and automation today. A highly efficient Kubernetes setup generates innumerable new metrics every day, making monitoring cluster health quite challenging. You might find yourself sifting through several different metrics without being entirely sure which ones are the most insightful and warrant utmost attention.
As daunting a task as this may seem, you can hit the ground running by knowing which of these metrics provide the right kind of insights into the health of your Kubernetes clusters. Although there are observability platforms to help you monitor your Kubernetes clusters’ right metrics, knowing exactly which ones to watch will help you stay on top of your monitoring needs. In this article, we take you through a few Kubernetes health metrics that top our list.
A crash loop is the last thing you’d want to go undetected. During a crash loop, your application breaks down as a pod starts and keeps crashing and restarting in a circle. Multiple reasons can lead to a crash loop, making it tricky to identify the root cause. Being alerted when a crash loop occurs can help you quickly narrow down the list of causes and take emergency measures to keep your application active.
#devops #kubernetes #monitoring #observability #kubernetes health monitoring #monitoring for kubernetes
Over the past two years at Magalix, we have focused on building our system, introducing new features, and scaling our infrastructure and microservices. During this time, we had a look at our Kubernetes clusters utilization and found it to be very low. We were paying for resources we didn’t use, so we started a cost-saving practice to increase cluster utilization, use the resources we already had and pay less to run our cluster.
In this article, I will discuss the top five techniques we used to better utilize our Kubernetes clusters on the cloud and eliminate wasted resources, thus saving money. In the end, we were able to cut our monthly bill by more than 50%!
#cloud-native #kubernetes #optimization #kubecost #kubernetes-cost-savings #kubernetes-cost-monitoring #kubernetes-reduce-cost #kubernetes-cost-analysis
Many enterprises and SaaS companies depend on a variety of external API integrations in order to build an awesome customer experience. Some integrations may outsource certain business functionality such as handling payments or search to companies like Stripe and Algolia. You may have integrated other partners which expand the functionality of your product offering, For example, if you want to add real-time alerts to an analytics tool, you might want to integrate the PagerDuty and Slack APIs into your application.
If you’re like most companies though, you’ll soon realize you’re integrating hundreds of different vendors and partners into your app. Any one of them could have performance or functional issues impacting your customer experience. Worst yet, the reliability of an integration may be less visible than your own APIs and backend. If the login functionality is broken, you’ll have many customers complaining they cannot log into your website. However, if your Slack integration is broken, only the customers who added Slack to their account will be impacted. On top of that, since the integration is asynchronous, your customers may not realize the integration is broken until after a few days when they haven’t received any alerts for some time.
How do you ensure your API integrations are reliable and high performing? After all, if you’re selling a feature real-time alerting, you’re alerts better well be real-time and have at least once guaranteed delivery. Dropping alerts because your Slack or PagerDuty integration is unacceptable from a customer experience perspective.
Specific API integrations that have an exceedingly high latency could be a signal that your integration is about to fail. Maybe your pagination scheme is incorrect or the vendor has not indexed your data in the best way for you to efficiently query.
Average latency only tells you half the story. An API that consistently takes one second to complete is usually better than an API with high variance. For example if an API only takes 30 milliseconds on average, but 1 out of 10 API calls take up to five seconds, then you have high variance in your customer experience. This is makes it much harder to track down bugs and harder to handle in your customer experience. This is why 90th percentile and 95th percentiles are important to look at.
Reliability is a key metric to monitor especially since your integrating APIs that you don’t have control over. What percent of API calls are failing? In order to track reliability, you should have a rigid definition on what constitutes a failure.
While any API call that has a response status code in the 4xx or 5xx family may be considered an error, you might have specific business cases where the API appears to successfully complete yet the API call should still be considered a failure. For example, a data API integration that returns no matches or no content consistently could be considered failing even though the status code is always 200 OK. Another API could be returning bogus or incomplete data. Data validation is critical for measuring where the data returned is correct and up to date.
Not every API provider and integration partner follows suggested status code mapping
While reliability is specific to errors and functional correctness, availability and uptime is a pure infrastructure metric that measures how often a service has an outage, even if temporary. Availability is usually measured as a percentage of uptime per year or number of 9’s.
AVAILABILITY %DOWNTIME PER YEARDOWNTIME PER MONTHDOWNTIME PER WEEKDOWNTIME PER DAY90% (“one nine”)36.53 days73.05 hours16.80 hours2.40 hours99% (“two nines”)3.65 days7.31 hours1.68 hours14.40 minutes99.9% (“three nines”)8.77 hours43.83 minutes10.08 minutes1.44 minutes99.99% (“four nines”)52.60 minutes4.38 minutes1.01 minutes8.64 seconds99.999% (“five nines”)5.26 minutes26.30 seconds6.05 seconds864.00 milliseconds99.9999% (“six nines”)31.56 seconds2.63 seconds604.80 milliseconds86.40 milliseconds99.99999% (“seven nines”)3.16 seconds262.98 milliseconds60.48 milliseconds8.64 milliseconds99.999999% (“eight nines”)315.58 milliseconds26.30 milliseconds6.05 milliseconds864.00 microseconds99.9999999% (“nine nines”)31.56 milliseconds2.63 milliseconds604.80 microseconds86.40 microseconds
Many API providers are priced on API usage. Even if the API is free, they most likely have some sort of rate limiting implemented on the API to ensure bad actors are not starving out good clients. This means tracking your API usage with each integration partner is critical to understand when your current usage is close to the plan limits or their rate limits.
It’s recommended to tie usage back to your end-users even if the API integration is quite downstream from your customer experience. This enables measuring the direct ROI of specific integrations and finding trends. For example, let’s say your product is a CRM, and you are paying Clearbit $199 dollars a month to enrich up to 2,500 companies. That is a direct cost you have and is tied to your customer’s usage. If you have a free tier and they are using the most of your Clearbit quota, you may want to reconsider your pricing strategy. Potentially, Clearbit enrichment should be on the paid tiers only to reduce your own cost.
Monitoring API integrations seems like the correct remedy to stay on top of these issues. However, traditional Application Performance Monitoring (APM) tools like New Relic and AppDynamics focus more on monitoring the health of your own websites and infrastructure. This includes infrastructure metrics like memory usage and requests per minute along with application level health such as appdex scores and latency. Of course, if you’re consuming an API that’s running in someone else’s infrastructure, you can’t just ask your third-party providers to install an APM agent that you have access to. This means you need a way to monitor the third-party APIs indirectly or via some other instrumentation methodology.
#monitoring #api integration #api monitoring #monitoring and alerting #monitoring strategies #monitoring tools #api integrations #monitoring microservices
Kubernetes is a highly popular container orchestration platform. Multi cloud is a strategy that leverages cloud resources from multiple vendors. Multi cloud strategies have become popular because they help prevent vendor lock-in and enable you to leverage a wide variety of cloud resources. However, multi cloud ecosystems are notoriously difficult to configure and maintain.
This article explains how you can leverage Kubernetes to reduce multi cloud complexities and improve stability, scalability, and velocity.
Maintaining standardized application deployments becomes more challenging as your number of applications and the technologies they are based on increase. As environments, operating systems, and dependencies differ, management and operations require more effort and extensive documentation.
In the past, teams tried to get around these difficulties by creating isolated projects in the data center. Each project, including its configurations and requirements were managed independently. This required accurately predicting performance and the number of users before deployment and taking down applications to update operating systems or applications. There were many chances for error.
Kubernetes can provide an alternative to the old method, enabling teams to deploy applications independent of the environment in containers. This eliminates the need to create resource partitions and enables teams to operate infrastructure as a unified whole.
In particular, Kubernetes makes it easier to deploy a multi cloud strategy since it enables you to abstract away service differences. With Kubernetes deployments you can work from a consistent platform and optimize services and applications according to your business needs.
The Compelling Attributes of Multi Cloud Kubernetes
Multi cloud Kubernetes can provide multiple benefits beyond a single cloud deployment. Below are some of the most notable advantages.
In addition to the built-in scalability, fault tolerance, and auto-healing features of Kubernetes, multi cloud deployments can provide service redundancy. For example, you can mirror applications or split microservices across vendors. This reduces the risk of a vendor-related outage and enables you to create failovers.
#kubernetes #multicloud-strategy #kubernetes-cluster #kubernetes-top-story #kubernetes-cluster-install #kubernetes-explained #kubernetes-infrastructure #cloud