In recent polls we’ve conducted with engineers and leaders, around 70% of participants named MTTA and MTTR among their main metrics, 20% cited looking at planned versus unplanned work, and 10% said they currently look at no metrics. While MTTA and MTTR are good starting points, they’re no longer enough on their own. As systems grow more complex, it becomes harder to gain insight into your services’ operational health.
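
For readers who don’t track these numbers yet, here is a minimal sketch of how the metrics named in the poll could be computed. Everything in it is an assumption made for illustration: the record fields (triggered, acknowledged, resolved), the sample timestamps, and the work-item hours are hypothetical, and MTTR is read here as mean time to resolve, measured from when the incident was triggered. Your own tooling may define these differently.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records; field names and values are illustrative only.
incidents = [
    {"triggered": datetime(2024, 1, 1, 10, 0),
     "acknowledged": datetime(2024, 1, 1, 10, 4),
     "resolved": datetime(2024, 1, 1, 10, 50)},
    {"triggered": datetime(2024, 1, 2, 14, 0),
     "acknowledged": datetime(2024, 1, 2, 14, 10),
     "resolved": datetime(2024, 1, 2, 16, 0)},
    {"triggered": datetime(2024, 1, 3, 9, 0),
     "acknowledged": datetime(2024, 1, 3, 9, 1),
     "resolved": datetime(2024, 1, 3, 9, 10)},
]

# Hypothetical work items for one sprint, in hours.
work_items = [
    {"hours": 30, "planned": True},
    {"hours": 6, "planned": False},
    {"hours": 4, "planned": False},
]


def mean_minutes(records, start_key, end_key):
    """Average elapsed time between two timestamps, in minutes."""
    return mean(
        (r[end_key] - r[start_key]).total_seconds() / 60 for r in records
    )


def unplanned_share(items):
    """Fraction of total hours spent on unplanned work."""
    total = sum(i["hours"] for i in items)
    unplanned = sum(i["hours"] for i in items if not i["planned"])
    return unplanned / total


# Assumption: both metrics are measured from the moment the incident triggered.
mtta = mean_minutes(incidents, "triggered", "acknowledged")
mttr = mean_minutes(incidents, "triggered", "resolved")
share = unplanned_share(work_items)

print(f"MTTA: {mtta:.1f} min, MTTR: {mttr:.1f} min, unplanned: {share:.0%}")
```

Averages like these hide a lot: one two-hour incident dominates the MTTR above, which is part of why these two metrics are a starting point rather than a full picture.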

In this blog post, we’ll walk you through holistic measures and best practices that you can employ starting today. We’ll cover the challenges and pain points in gaining insight, as well as key metrics and how they evolve as organizations mature.

Pain Points for Creating Useful Metrics

It’s easy to fall into the trap of being data rich but information poor. Building metrics and dashboards with the right context is crucial to understanding operational health, but where do you start? Begin by looking at what has blocked adoption in your organization so far. Perhaps other teams (or even your own team) have already tried to change the way you measure success. What halted their progress? If your metrics haven’t changed recently, why is that?

Below are some of the top customer pain points and challenges that we typically see software and infrastructure teams encounter.

  • Lack of data: Your data is fragmented across your APM, ticketing, chatops, and other tools. Even worse, it’s typically also siloed across teams that run at different speeds. A lot of it is tribal knowledge, or it simply doesn’t exist.
  • No feedback loop: There’s little to no integration between incidents, retrospectives, follow-up action items, planned work, and customer experience. It’s challenging to understand how it all ties together and to pinpoint how to improve customer experience. You’re constantly being redirected by unplanned work and incidents.
  • Blank slate: Traditional APM and analytics tools are great for insights, but without a baseline of metrics that are prescriptive and based on operational best practices, it’s hard to know where to start.
  • One-size-fits-all: What works for one team won’t necessarily work for another. Everything needs to be put in the right context to provide truly relevant insights.

With these pain points in mind, let’s look at some key metrics that organizations we’ve spoken to have found success with.


Here Are the Metrics you Need to Understand Operational Health