
Building a Reactive System is all about balancing consistency and availability, and understanding the consequences of picking one over the other. This article focuses on consistency and availability, and on how that choice impacts the scalability of a system.

What Are Scalability, Consistency, and Availability?

A system is considered scalable if it can meet the increase in demand while remaining responsive.

A system is considered consistent if all the nodes show the same data at the same time.

A system is considered available if it remains responsive despite any failures.

How Does the Scalability of a System Differ From the Performance of the System?

Scalability and performance are related but distinct concepts, and it is important to understand the difference.

Scalability is about the number of requests a system can handle at a time, i.e. load. Improving scalability means optimizing the system's ability to handle load: how many requests it can process concurrently. Performance, on the other hand, is about the time the system takes to complete a single request, i.e. latency. Improving performance means optimizing the response time: how quickly the system can handle one request.

Performance has a hard limit: response time can only be reduced so far, and eventually we reach that floor. Scalability, in contrast, has no theoretical limit. We may be restricted by a particular implementation, but in a perfectly scalable system we could scale forever.
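To see the difference between the two levers, consider a toy throughput model. This is a minimal sketch with made-up numbers (the 50 ms latency and the worker counts are assumptions for illustration, and requests are assumed perfectly parallelizable):

```scala
object ScalabilityVsPerformance extends App {
  // Assumed baseline: 50 ms per request (illustrative only).
  val latency = 0.050 // seconds per request

  // Throughput (requests/second) for a given number of parallel
  // workers, each taking `latencySeconds` per request.
  def throughput(workers: Int, latencySeconds: Double): Double =
    workers / latencySeconds

  println(f"1 worker  @ 50 ms -> ${throughput(1, latency)}%.0f req/s")     // baseline
  println(f"1 worker  @ 25 ms -> ${throughput(1, latency / 2)}%.0f req/s") // performance: lower latency
  println(f"8 workers @ 50 ms -> ${throughput(8, latency)}%.0f req/s")     // scalability: more workers
}
```

Halving the latency doubles requests per second, but so does doubling the workers, and only the second lever keeps working once latency hits its floor.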

So when we build Reactive Microservices, we tend to focus on improving scalability rather than improving performance.

How Can We Measure Scalability and Performance of a System?

A measurement like requests-per-second actually measures both. That makes it a valuable metric, because we can use it to check whether scalability or performance has improved. But it is also somewhat restrictive: if the number improves, we can't tell which of the two changed. If we want to know where an improvement came from, we have to track scalability and performance individually.
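As a rough illustration, a small harness like the one below records per-request latency alongside overall requests per second. The handleRequest function is a hypothetical stand-in (simulated here with a sleep), not a real workload:

```scala
object ThroughputAndLatency extends App {
  // Hypothetical stand-in for a real request handler.
  def handleRequest(): Unit = Thread.sleep(2)

  val n = 100
  val latenciesMs = (1 to n).map { _ =>
    val start = System.nanoTime()
    handleRequest()
    (System.nanoTime() - start) / 1e6 // elapsed time in milliseconds
  }

  val totalSeconds = latenciesMs.sum / 1000.0
  println(f"mean latency: ${latenciesMs.sum / n}%.2f ms") // performance
  println(f"throughput:   ${n / totalSeconds}%.0f req/s") // scalability signal
}
```

On a single thread the two numbers move in lockstep, which is exactly why they need to be tracked separately once parallelism enters the picture.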

How Can We Explain Consistency in Distributed Systems?

Distributed systems are systems whose parts are separated by space. The system could be deployed across multiple data centers or within a single data center, on different hardware or even on the same hardware.

Even if it’s deployed to the same hardware, a distributed system is one where information has to be transferred between different parts of that system, and when that information is transferred it’s crossing some sort of space. It could be going over a local network, or it could be writing to a disk, or it could be writing to a database.

Information cannot be transferred instantaneously; it takes time. Granted, that time may be very small, but some amount of time always elapses while information is in transit. And during that transfer, the state of the original sender may change.

The key here is to recognize that when we are dealing with a distributed system, we are always dealing with stale data. The reality is eventually consistent.

What Is Eventual Consistency?

If a system stops receiving updates, at least for some period of time, we can guarantee that all parts of the system will eventually converge on the same state. That level of consistency is what we call eventual consistency.

Common source control tools (Git, Subversion, etc.) operate on an eventually consistent model: copies of the code diverge, and a later merge operation brings them back into alignment. Modern source control achieves consistency eventually, not at every instant.
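One common convergence strategy is last-writer-wins. The sketch below is a minimal illustration, not production code; the Replica type and its timestamps are invented for the example:

```scala
// Each replica keeps a (timestamp, value) pair. Merging keeps whichever
// write is newer, so applying merge pairwise across all replicas makes
// them converge once updates stop arriving.
case class Replica(timestamp: Long, value: String) {
  def merge(other: Replica): Replica =
    if (other.timestamp > timestamp) other else this
}

object EventualConsistencyDemo extends App {
  val a = Replica(timestamp = 1L, value = "draft")
  val b = Replica(timestamp = 2L, value = "final") // later write, seen only by b so far

  println(a.merge(b).value) // "final": a catches up after the exchange
  println(b.merge(a).value) // "final": merge order does not matter here
}
```

Because merge always keeps the newer write, the order in which replicas exchange state doesn't matter; once updates stop, every replica ends up holding "final".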

Traditional monolithic architectures, by contrast, are usually built around strong consistency: they use a strongly consistent database, such as a SQL database.

What Is Strong Consistency?

When all members of a system agree on the state before it becomes available, we have reached strong consistency.

We can achieve strong consistency by introducing mechanisms like locks. A distributed system problem arises when multiple things are responsible for the same piece of data. As long as only one thing is responsible for that data, that is, as long as there is only one instance of the lock, it is no longer a distributed system problem. In this way, we resolve a distributed system problem by routing it through a non-distributed resource: the lock.
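Here is a minimal single-process sketch of that idea: a plain ReentrantLock guarding one counter. This is an in-memory lock for illustration, not a distributed lock service:

```scala
import java.util.concurrent.locks.ReentrantLock

object StronglyConsistentCounter extends App {
  private val lock  = new ReentrantLock()
  private var count = 0

  // Every update is applied under the single lock, so no reader can
  // observe a half-applied state: strong consistency, bought with
  // contention, since threads queue up waiting for the lock.
  def increment(): Unit = {
    lock.lock()
    try count += 1
    finally lock.unlock()
  }

  val threads = (1 to 4).map(_ => new Thread(() => (1 to 1000).foreach(_ => increment())))
  threads.foreach(_.start())
  threads.foreach(_.join())
  println(count) // always 4000: all parties agree on the state
}
```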

But when we introduce a lock, it introduces overhead in the form of contention. That overhead has consequences for our ability to be elastic and to be resilient, and it has other consequences as well.

