Over the years, GitHub engineers have developed many ways to observe how our systems behave. We mostly make use of statsd for metrics, the syslog format for plain text logs and OpenTracing for request traces. While we have somewhat standardized what we emit, we tend to solve the same problems over and over in each new system we develop.

And, while each component serves its individual purpose well, interoperability is a challenge. For example, several pieces of GitHub’s infrastructure use different statsd dialects, which means we have to special-case our telemetry code in different places – a non-trivial amount of work!

Different components can use different vocabularies for similar observability concepts, making investigatory work difficult. For example, there’s quite a bit of complexity around following a GitHub.com request across various telemetry signals, which can include metrics, traces, logs, errors and exceptions. While we emit a lot of telemetry data, it can still be difficult to use in practice.

We needed a solution that would allow us to standardize telemetry usage at GitHub, while making it easy for developers around the organization to instrument their code. The  OpenTelemetry project provided us with exactly that!

Introducing OpenTelemetry

OpenTelemetry introduces a common, vendor-neutral format for telemetry signals: OTLP. It also enables telemetry signals to be easily correlated with each other.

Moreover, we can reduce manual work for our engineers by adopting the OpenTelemetry SDKs – their inherent extensibility allows engineers to avoid re-instrumenting their applications if we need to change our telemetry collection backend, and with only one client SDK per-language we can easily propagate best practices among codebases.

OpenTelemetry empowers us to build integrated and opinionated solutions for our engineers. Designing for observability can be at the forefront of our application engineer’s minds, because we can make it so rewarding.

What we’re working on, and why

At the moment, we’re focusing on building out excellent support for the OpenTelemetry tracing signal. We believe that tracing your application should be the main entry point to observability, because we live in a distributed-systems world and tracing illuminates this in a way that’s almost magical.

We believe that tracing allows us to naturally and easily add/derive additional signals once it’s in place: many metrics, for example, can be calculated by backends automatically, tracing events can be converted to detailed logs automatically, and exceptions can automatically be reported to our tracking systems.

#engineering #github

Why (and How) GitHub Is Adopting OpenTelemetry
1.35 GEEK