Distributed SQL: An Evolution of the Database

As organizations transition to the cloud, they eventually find that the legacy relational databases behind some of their most critical applications are difficult to scale and do not take advantage of the promise of the cloud. The database ends up limiting the speed and effectiveness of the transition. To address this, organizations want the reliability of a tested relational data store, such as Oracle, SQL Server, Postgres, or MySQL, but with the benefits of scale and global coverage that come with the cloud.

Some have turned to NoSQL stores to try to meet these requirements. These alternatives can typically meet the scale requirements but fall short as transactional databases because they were not designed from the ground up to provide true consistency. Recently, some NoSQL solutions have offered “ACID transactions,” but these come with caveats and do not deliver the isolation levels necessary for mission-critical workloads like a financial ledger, inventory control, and identity management.

A New Breed of Database…Distributed SQL

Some of the most successful companies that operate at a global scale have solved this problem by purpose-building databases for it. The most public example is Google Cloud Spanner. In 2012, Google published a paper on Spanner that demonstrated a new way of looking at databases, one rooted in distributed systems and global scale.

“Spanner is Google’s scalable, multi-version, globally distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions.”

– Spanner: Google’s Globally-Distributed Database

There is a lot wrapped up in that description, and the 14-page paper goes into explicit detail about how Google was able to build a database that is both consistent AND scalable. The paper is pure genius and outlines the foundation of the next evolution of the database: Distributed SQL.

What Comprises a Distributed SQL Database?

Several attempts have been made to deliver truly scalable SQL in a distributed environment. Some have tried to retrofit existing databases to meet their needs, but this ultimately does not deliver on the promise of a truly Distributed SQL database. So then, what makes up a Distributed SQL database? The requirements can be summarized in five core conditions:

1. Scale

A distributed SQL database must seamlessly scale in order to mirror the capabilities of cloud environments without introducing operational complexity. Just as we can scale up compute without heavy lifting, the database should be able to scale as well. This includes an ability to evenly distribute data across multiple distributed participants in the database.
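To make "evenly distribute data" concrete, here is a minimal Python sketch of hash-based placement. It is an illustration under stated assumptions, not how any particular product places data: the node names, the `customer-N` keys, and the `node_for_key` helper are all hypothetical.

```python
import hashlib
from collections import Counter


def node_for_key(key: str, nodes: list[str]) -> str:
    """Pick a node for a row key by hashing the key (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer, then map it to a node.
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


# Assumed three-node cluster and 9,000 made-up row keys.
nodes = ["node-a", "node-b", "node-c"]
placement = Counter(node_for_key(f"customer-{i}", nodes) for i in range(9000))

for name in nodes:
    print(name, placement[name])
```

Each node ends up with roughly a third of the keys. Note that plain modulo hashing moves most keys when a node is added or removed, which is why a real system would more likely rely on range splitting or consistent hashing to rebalance with less data movement.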

2. Consistency

A distributed SQL database must deliver a high level of isolation in a distributed environment. In a cloud-based world where distributed systems and microservices are the default architectures, transactional consistency becomes difficult because multiple operators may be trying to work on the same data. The database should mediate contention and deliver the same level of transaction isolation that we expect from a single-instance database.
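The contention problem described above can be sketched in a few lines of Python. This example uses optimistic concurrency control with version checks, which is a simplified stand-in chosen for illustration and not the protocol of any specific database; `VersionedStore`, `ConflictError`, and `add_with_retry` are hypothetical names.

```python
class ConflictError(Exception):
    """Raised when a row changed between a transaction's read and its commit."""


class VersionedStore:
    """Toy in-memory store that keeps a version number per key."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def read(self, key):
        return self._data.get(key, (0, 0))

    def commit(self, key, new_value, expected_version):
        _, current_version = self._data.get(key, (0, 0))
        if current_version != expected_version:
            raise ConflictError(key)  # someone else committed first
        self._data[key] = (new_value, current_version + 1)


def add_with_retry(store, key, amount, max_attempts=5):
    """Re-read and retry until the update commits against a fresh version."""
    for attempt in range(1, max_attempts + 1):
        value, version = store.read(key)
        try:
            store.commit(key, value + amount, version)
            return attempt
        except ConflictError:
            continue
    raise ConflictError(key)


store = VersionedStore()
store.commit("stock", 10, 0)

# Two operators read the same row before either one writes.
value_a, version_a = store.read("stock")
value_b, version_b = store.read("stock")

store.commit("stock", value_a - 3, version_a)  # operator A wins
try:
    store.commit("stock", value_b - 4, version_b)  # operator B holds a stale version
    lost_update = True
except ConflictError:
    lost_update = False
    add_with_retry(store, "stock", -4)  # B retries against the current value

print(store.read("stock"))  # (3, 3)
```

Without the version check, operator B would have written 6 and silently discarded A's update. The point of the sketch is only that the store, not the application, detects the conflict; the application sees a clean retry.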

3. Resiliency

A distributed SQL database must naturally deliver the highest level of resiliency without relying on external tooling. The cloud presents an always-on environment for our workloads, and the database should have the same properties. With a distributed database we can reduce the time it takes to recover from a failure to near zero and replicate data naturally without any external configuration.
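One common way to get this kind of built-in resiliency is majority (quorum) replication, as used by consensus protocols such as Paxos and Raft. The Python sketch below only shows the arithmetic behind it and assumes a simple majority rule; the function names are hypothetical and it does not implement a consensus protocol.

```python
def quorum_size(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be down while a majority is still reachable."""
    return replicas - quorum_size(replicas)


def write_succeeds(replica_up: list[bool]) -> bool:
    """A write commits only if a majority of replicas acknowledge it."""
    return sum(replica_up) >= quorum_size(len(replica_up))


for n in (3, 5, 7):
    print(n, "replicas: quorum", quorum_size(n), "tolerates", tolerated_failures(n))

print(write_succeeds([True, False, True]))   # one of three replicas is down
print(write_succeeds([True, False, False]))  # two of three replicas are down
```

With three replicas, one can fail and writes continue without an operator stepping in, which is the "near zero" recovery behavior described above. Losing two of three leaves no majority, so writes stop rather than risk inconsistent data.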

4. Geo-replication

A distributed SQL database should allow for the distribution of data throughout a complex, widely dispersed geographic environment. The cloud makes it possible to reach every corner of the globe with an acceptable quality of service, and the database should not restrict your applications from doing so. It should perform to meet your expectations.

5. SQL

And while these four technical requirements are paramount, there is one key prerequisite above all. The database must speak SQL. It is the language of data and the default for all application logic. We should not have to retrain developers to use the database. They should be able to use the SQL dialect they are already familiar with.

There are a few databases that meet these requirements. The list includes Spanner, of course, but you could also consider Amazon Aurora, Yugabyte, FaunaDB and CockroachDB as members of this new category. All of these members meet the requirements in some form, some better than others. Notably missing from this list are Oracle, Postgres, MySQL and all of the NoSQL options. While each may meet some of the requirements, none of them meets all of them, so they cannot be considered alternatives.
