
Implementing SRE practices and culture can be challenging. Fortunately, there are tools for each aspect of SRE: monitoring, SLOs and error budgeting, incident management, incident retrospectives, alerting, chaos engineering, and more. In this blog, we’ll talk about what to look for in SRE tools and how they can help you on your journey to reliability excellence.

Monitoring Tools

At the heart of all SRE decision-making is data. Without tracking latency, availability, and other reliability metrics throughout your system, you’ll have no way of knowing where to invest your development efforts. Monitoring tools such as AppDynamics, Datadog, Grafana, and Prometheus can help you collect this data and present it in useful dashboards and reports.
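To make this concrete, here is a minimal Python sketch that turns raw request data into two common reliability metrics: availability and tail latency. It assumes a hypothetical log format in which each request records its latency in milliseconds and its HTTP status code, and it counts only 5xx responses as failures; your own definition of a “good” request may differ. A monitoring tool would normally compute these for you, but seeing the arithmetic helps when you start defining SLOs.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Request:
    """One logged request (hypothetical record shape)."""
    latency_ms: float
    status: int  # HTTP status code


def availability(requests):
    """Fraction of requests that did not fail with a 5xx server error."""
    if not requests:
        return 1.0  # assumption: no traffic counts as fully available
    good = sum(1 for r in requests if r.status < 500)
    return good / len(requests)


def latency_percentile(requests, pct):
    """Nearest-rank percentile of latency, e.g. pct=99 for p99."""
    if not requests:
        raise ValueError("no requests to summarize")
    ordered = sorted(r.latency_ms for r in requests)
    rank = max(1, ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


log = [Request(120, 200), Request(95, 200), Request(480, 503), Request(110, 200)]
print(f"availability={availability(log):.2%}")   # availability=75.00%
print(f"p99={latency_percentile(log, 99)}ms")    # p99=480ms
```

Nearest-rank is only one way to define a percentile; many monitoring tools interpolate or use histogram buckets instead, so their numbers can differ slightly from this sketch.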

Monitoring can be broken down into four main categories:

  • Resource monitoring: reports on server health using metrics such as RAM usage, CPU load, and remaining disk space.
  • Network monitoring: reports on incoming and outgoing traffic, which can be broken down by the frequency and size of specific requests.
  • Application performance monitoring: reports on the performance of services by sending internal requests to them and monitoring metrics such as response time, completeness of response, and data freshness.
  • Third-party component monitoring: reports on the health and availability of third-party services integrated into your system.
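As an illustration of the application performance monitoring category, the sketch below sends one internal request and checks the three metrics mentioned above: response time, completeness of the response, and data freshness. The endpoint, the required field names, and the `updated_at` timestamp field are all hypothetical; adapt them to what your service actually returns.

```python
import json
import time
import urllib.request


def evaluate_response(body, elapsed_s, required_fields, max_age_s, now=None):
    """Score one probe response on response time, completeness, and freshness.

    Assumes (hypothetically) the service returns a JSON object with an
    'updated_at' field holding the Unix timestamp of its last data refresh.
    """
    now = time.time() if now is None else now
    payload = json.loads(body)  # accepts str or bytes
    missing = [f for f in required_fields if f not in payload]
    updated_at = payload.get("updated_at")
    age_s = None if updated_at is None else now - float(updated_at)
    return {
        "response_time_s": elapsed_s,
        "complete": not missing,
        "missing_fields": missing,
        "data_age_s": age_s,
        "fresh": age_s is not None and age_s <= max_age_s,
    }


def probe(url, required_fields, max_age_s, timeout_s=5.0):
    """Send one internal request to `url`, time it, and evaluate the response."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        body = resp.read()
    elapsed = time.monotonic() - start
    return evaluate_response(body, elapsed, required_fields, max_age_s)
```

For example, `probe("http://orders.internal/health", ["orders", "updated_at"], max_age_s=60)` would check a hypothetical internal endpoint. Keeping the evaluation separate from the network call makes the scoring logic easy to test without a live service.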

To get a full picture of your service, you’ll want to incorporate elements of all four of these categories. Most monitoring tools cover several categories. Look for one that integrates well with your existing tool stack, since it will need to gather and interpret data directly from your existing sources.

Try to find tools that can generate visualizations and reports that your team will find useful. For example, if you’re trying to see which services generate the most network traffic, look for a tool that can create pie charts of overall network usage.
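Behind a pie chart like that is a simple aggregation: total bytes per service divided by total bytes overall. This sketch assumes a hypothetical list of `(service, bytes)` pairs exported from your network monitoring; each returned fraction corresponds to one slice of the chart.

```python
from collections import Counter


def traffic_share(records):
    """Return each service's share of total bytes, largest first.

    `records` is an iterable of (service_name, bytes_transferred) pairs,
    a hypothetical shape for exported flow or access-log data.
    """
    totals = Counter()
    for service, nbytes in records:
        totals[service] += nbytes
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return [(svc, n / grand_total) for svc, n in totals.most_common()]


flows = [("checkout", 300), ("search", 600), ("checkout", 200), ("auth", 100)]
for svc, share in traffic_share(flows):
    print(f"{svc}: {share:.1%}")
# search: 50.0%
# checkout: 41.7%
# auth: 8.3%
```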


Choosing the Right SRE Tools