Platform-Agnostic Tracing

One of the three fundamental laws of classical mechanics is Newton’s first law, which says that an object will remain at rest or continue to move at a constant velocity unless acted upon by a force.

We live in an imperfect world where nothing ever remains in motion, and nothing ever remains at rest. Nature is full of forces that toss things around, sometimes against our wishes. As engineers, we’ve yet to figure out the magic that makes apps run forever without issues.

Instead, we’ve learned to build apps that are observable, because we know that bad stuff will happen sooner or later. And when it does, we’ve got to know “what” and “why” in order to recover quickly. In a distributed system, sifting through logs can be daunting. Likewise, metrics have limitations. They can show that something is wrong, but good luck finding out “what” and “where.” This is where tracing comes in.

Tracing is a way of stitching requests together as they transit multiple services. It helps you observe distributed systems and pinpoint the causes of suboptimal performance and failures.

Microservices applications consist of interconnected systems, services, or functions that work together to serve requests. For example, a microservice app may include an order service, cart service, payment service, and catalog service. Each service is isolated and separated by a network boundary, and the services could be hosted on different platforms. The distributed nature of microservices makes it necessary to have a way to track how requests go through your entire ecosystem of services.

A trace consolidates data to locate failures, correlate error reports, identify how an issue in a single service affects other services, and provide insights into the services that affect your application’s overall performance.
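To make the idea of stitching requests together concrete, here is a minimal sketch in Python. It assumes a shared trace ID is passed along with every call. The header name `x-trace-id`, the `handle` function, and the in-memory `recorded` list are hypothetical stand-ins for illustration, not the API of any particular tracing library.

```python
import uuid

# Hypothetical header name; real systems agree on their own convention.
TRACE_HEADER = "x-trace-id"

# Stand-in for a tracing backend: (trace_id, service) pairs.
recorded = []


def handle(service, headers, downstream=()):
    """Simulate a service handling a request and calling other services."""
    # Reuse the incoming trace ID, or start a new trace at the entry point.
    trace_id = headers.get(TRACE_HEADER) or uuid.uuid4().hex
    recorded.append((trace_id, service))

    # Forward the same trace ID so downstream work joins the same trace.
    outgoing = {**headers, TRACE_HEADER: trace_id}
    for next_service in downstream:
        handle(next_service, outgoing)
    return trace_id


# The order service receives a request and calls three other services.
trace_id = handle("order", {}, downstream=("cart", "payment", "catalog"))
```

Because every service records the same trace ID, the four entries can later be grouped into one trace, even if each service runs on a different platform.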

The Fabric of Trace

Fundamentally, a trace begins with a single request and represents the request’s entire journey as it transitions through all the services of a distributed system. Each trace is made up of a series of tagged time intervals, or spans.

You can view a span as a fundamental element of a distributed trace, representing a unit of work done by a single service in a distributed system. A span may have a unique ID, timestamp, name, and metadata. Spans may also contain logs in the form of key-value pairs, which are useful for capturing span-specific informational or debugging output and logging messages.
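As a rough illustration of the span described above, the following sketch models one in Python. The `Span` class and its field names (`trace_id`, `parent_id`, `tags`, `logs`) are assumptions made for this example and do not mirror any specific tracing system.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One unit of work done by a single service (hypothetical model)."""
    trace_id: str                       # shared by every span in the trace
    name: str                           # what this unit of work is
    parent_id: Optional[str] = None     # None marks the root span
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    start: float = field(default_factory=time.time)
    end: Optional[float] = None
    tags: dict = field(default_factory=dict)   # metadata for the span
    logs: list = field(default_factory=list)   # key-value log entries

    def log(self, **fields):
        """Attach a timestamped key-value log entry to this span."""
        self.logs.append({"timestamp": time.time(), **fields})

    def finish(self):
        """Close the time interval that this span covers."""
        self.end = time.time()


# A trace starts with a single request and grows one span per unit of work.
trace_id = uuid.uuid4().hex
root = Span(trace_id, "checkout", tags={"service": "order"})
child = Span(trace_id, "charge-card", parent_id=root.span_id,
             tags={"service": "payment"})
child.log(event="card declined", retry=True)
child.finish()
root.finish()
```

The parent ID is what lets a viewer rebuild the request's journey: each span points to the span that caused it, and all of them share one trace ID.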

thenewstack.io
