The Case for Boring Tech: Relational Databases in the Cloud

Solutions for challenging technical problems shouldn’t result in a whole set of new ones. Sometimes, we make things harder on ourselves by choosing the new hotness to tackle technical problems (such as scaling infrastructure). We may be solving our problem in an interesting and fun way, but we bring on more complexity (and more problems) as a result of that technology choice.

I put out a few tweets a while back showing my true curmudgeon colors (AKA, a grumpy operator reminiscing about the good old days). And while initially the tweets were voiced in frustration over a technical issue I was grappling with, the sentiment remains true: we need a resurgence of boring tech (and I’m not the first to say it).

At the risk of truly proving myself to be that grumpy operator, there’s a case to be made for going back to the old — but tested, tried, and true — software of yesteryear. In this post, I make my case, drawing from one of my favorite examples: distributed versus relational databases.

Distributed Databases: the Good (and Not so Good)

Etcd 2.0 (its first stable release) came on the scene in January 2015 and grew in popularity, with Cloud Foundry and Kubernetes driving significant adoption. Distributed databases like etcd are great for high availability, which they offer through replication. With copies of the same piece of data on multiple machines, the data stays available even when one of those machines goes down.

The problem with distributed systems, though, is that they have many more moving parts (not to mention their susceptibility to network partitions and slow members). Complexity goes up, and reasoning about the system as an operator becomes that much harder. There are cases in which a distributed database won’t perform nearly as well as its predecessor (more on that later!), because many of them need some form of consensus: the nodes in your cluster, typically a majority of them, must agree on the value for a particular key. So now you have not only additional overhead but also the chance of conflict, which calls for conflict resolution.
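To put a number on that consensus overhead, here is a minimal Python sketch. It assumes a majority-quorum protocol such as Raft, which etcd uses; the function names are hypothetical and chosen only for illustration.

```python
def quorum_size(members: int) -> int:
    """Smallest majority of a cluster with `members` voting nodes."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority can still form."""
    return members - quorum_size(members)


if __name__ == "__main__":
    # Assumed example cluster sizes, not taken from any real deployment.
    for n in (1, 3, 4, 5, 7):
        print(f"{n} members: quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Under this assumption, a four-member cluster tolerates no more failures than a three-member one, yet every write has to be acknowledged by one more node. That is the kind of cost an operator has to keep in mind as a cluster grows.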

The thing about distributed databases, or really any technology, is that it has to fit your data needs. The data you put in it and the way you’re going to access it have to fit a certain mold. Forcing anything else upon the technology is foolish.

Like many technical problems, these grow quickly at scale. For instance, if you’re abusing a distributed key-value store or database for something it wasn’t designed for, it’s going to be very problematic. At scale, you simply dump more gasoline on the fire: your problems get amplified and outages increase.

At certain levels, doubling the size of your infrastructure isn’t a big deal: going from 100 to 200 servers, for example. If you’re at 10,000 servers and double, that’s another story. The stresses on the system are more significant and can hit you in an instant.

thenewstack.io
