Broadly speaking, many of the new debugging challenges you can expect to face with distributed microservices are networking problems between the different parts of the infrastructure.

Inter-service communication in distributed systems is implemented either as synchronous request/response communication (REST, gRPC, GraphQL) or as asynchronous event-driven messaging (Kafka, AMQP, and many others).
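To make the difference between the two styles concrete, here is a minimal in-process sketch. It is not tied to any real protocol or broker: `inventory_service`, `call_sync`, `publish`, and `consume_all` are hypothetical names, and a standard-library `queue.Queue` stands in for a message broker such as Kafka or an AMQP server.

```python
import queue
from typing import Dict, List


def inventory_service(request: Dict) -> Dict:
    """Hypothetical service handler that answers a stock lookup."""
    return {"sku": request["sku"], "in_stock": True}


def call_sync(request: Dict) -> Dict:
    # Request/response: the caller blocks until the callee returns a reply.
    return inventory_service(request)


def publish(broker: queue.Queue, event: Dict) -> None:
    # Event-driven: the producer enqueues the event and moves on
    # without waiting for anyone to handle it.
    broker.put(event)


def consume_all(broker: queue.Queue) -> List[Dict]:
    # The consumer drains the queue later, on its own schedule.
    handled = []
    while not broker.empty():
        handled.append(inventory_service(broker.get()))
    return handled
```

In the synchronous case the caller gets its answer (or its failure) immediately, while in the event-driven case the producer only learns that the event was accepted by the queue.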

Synchronous mechanisms are the clear winners, at least as of late 2020, because synchronous code is much easier to develop, test, and maintain. But they bring a host of problems with them. Let’s take a look at some of the possible friction points first, and then explore a few of the tools we can use to tackle them.

Inconsistent Network Layers

Your microservices might be deployed across different public clouds or on-prem, which means the networking layer each service is built on can vary drastically between services. This is often the cause of sudden, non-reproducible timeouts, bursts of increased latency, and low throughput. These are often a sad daily routine, and most of them are out of your control.
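Because such latency spikes are hard to reproduce, a common first step is to record how long each outbound call takes and flag the ones that exceed a budget. The following is a small sketch under stated assumptions: `timed_call` and `budget_s` are hypothetical names, and the `clock` parameter exists only so the helper can be exercised with a fake clock.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed_call(
    fn: Callable[[], T],
    budget_s: float,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[T, float, bool]:
    """Run fn and return (result, elapsed seconds, whether the budget was exceeded)."""
    start = clock()
    result = fn()
    elapsed = clock() - start
    return result, elapsed, elapsed > budget_s
```

A caller might wrap each network request in `timed_call` and log the ones whose third return value is `True`, so that bursts of slow calls become visible even when they cannot be reproduced on demand.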

Service Discovery

Microservices are dynamic, so the routing should be as well. A service does not know in advance where exactly in the topology its companion service is located, so specialized tooling is needed to let each service dynamically detect its peers.
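The core idea behind such tooling can be sketched as a registry in which instances announce themselves with heartbeats and callers look up the currently live addresses. This is an illustrative in-memory toy, not the API of any real discovery system: `ServiceRegistry`, `heartbeat`, `lookup`, and `ttl_s` are all hypothetical names.

```python
import time
from typing import Callable, Dict, List, Tuple


class ServiceRegistry:
    """Toy registry: an instance counts as alive while its last heartbeat is within ttl_s."""

    def __init__(self, ttl_s: float = 10.0, clock: Callable[[], float] = time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        # (service name, address) -> time of the last heartbeat
        self._last_seen: Dict[Tuple[str, str], float] = {}

    def heartbeat(self, service: str, address: str) -> None:
        # Instances call this periodically to register and stay listed.
        self._last_seen[(service, address)] = self._clock()

    def lookup(self, service: str) -> List[str]:
        # Return addresses whose heartbeat has not expired, in sorted order.
        now = self._clock()
        return sorted(
            address
            for (name, address), seen in self._last_seen.items()
            if name == service and now - seen <= self._ttl_s
        )
```

An instance that crashes or becomes unreachable simply stops sending heartbeats and drops out of `lookup` results once its time-to-live has passed, which is how callers keep up with a changing topology.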

Cascading Failures and Propagated Bottlenecks

Any microservice may start responding more slowly to network requests from other services because of high CPU usage, low memory, long-running DB queries, and other factors. This can set off a chain reaction that slows down other services, causing even more bottlenecks or making them drop connections.
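The chain reaction can be illustrated with a deliberately simplified model: a linear call chain in which each service's response time is its own processing time plus the response time of the service it calls. The function names, service names, and numbers below are made up for illustration, and real systems add queuing, retries, and parallel calls that this sketch ignores.

```python
from typing import Dict, List


def observed_latency(chain: List[str], processing_ms: Dict[str, float]) -> Dict[str, float]:
    """Response time of each service in a linear chain where chain[i] calls chain[i + 1]."""
    observed: Dict[str, float] = {}
    downstream_ms = 0.0
    # Walk from the last service back to the first, accumulating latency.
    for name in reversed(chain):
        downstream_ms += processing_ms[name]
        observed[name] = downstream_ms
    return observed


def over_timeout(observed: Dict[str, float], timeout_ms: float) -> List[str]:
    # Services whose responses are slower than what their callers will wait for.
    return [name for name, ms in observed.items() if ms > timeout_ms]


chain = ["gateway", "orders", "db"]
healthy = observed_latency(chain, {"gateway": 5, "orders": 20, "db": 30})
degraded = observed_latency(chain, {"gateway": 5, "orders": 20, "db": 500})
```

In this model, slowing only the hypothetical `db` service from 30 ms to 500 ms pushes every service above it in the chain past a 200 ms timeout, even though their own processing times have not changed.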


Debugging Microservices Networking Issues — An Introduction