Presentation Recap: The Distributed Database Behind Twitter

As Twitter's popularity skyrocketed, the data layer needed to scale. See the journey from MySQL to Cassandra to Manhattan, and get a glimpse into the future. Manhattan is our distributed database used to serve live real-time traffic.

Twitter is one of the world’s favorite places for people and brands to connect online. Powering a global service that helps people share everything from breaking news and entertainment to sports, politics, and everyday interests takes infrastructure that can adapt and evolve over time.

In his talk at the 2020 Distributed SQL Summit, Mehrdad Nurolahzade, engineer on the real-time infrastructure team at Twitter, walks us through the distributed database behind Twitter, including their journey from MySQL to Cassandra, and then from Cassandra to building their own distributed database called Manhattan, ending with a glimpse into the future. Here’s the playback and tl;dr of the presentation.

The Distributed Database Behind Twitter

Mehrdad’s talk opens with the early days of Twitter’s architecture in the mid-2000s: “The architecture at this point is quite simple; it’s basically a monolith, which is internally referred to as the Monorail. It’s Ruby on Rails. … And this is backed by a single MySQL server with a single leader and a single follower.”

Over the years, the rapid rise in popularity of the Twitter website pushes engineers to seek greater scalability. They quickly identify the monolith as a scaling pain point, and at the data layer they attempt to improve the performance of MySQL.

But as the popularity continues to skyrocket, Twitter’s engineering team embarks on an ongoing IT evolution, evaluating and executing enhancements of many kinds to provide greater reliability and speed, including:

Design principles of Manhattan distributed database at Twitter

Here’s a glimpse into Manhattan’s scale today:

  • 20 production clusters
  • more than 1000 databases
  • powered by tens of thousands of nodes
  • serving many petabytes of data, at a rate of tens of millions of requests per second

However, operating a system at Manhattan’s scale is difficult. To help streamline operations, Twitter built Genie to convert operational knowledge into automation.

Evolution continues today, including a continued focus on compliance, finalizing the migration to RocksDB, moving to Kubernetes, adopting Kafka, and integrating public cloud storage into the import and export pipelines.

Although Manhattan was custom-built to meet the needs of Twitter’s use cases, and Twitter’s engineering team will continue to develop it, the diversity of use cases in a company of Twitter’s size can’t be served by a single database.

“Looking at the distributed database landscape, we see that it has significantly evolved since 2012, which [is when] we decided to build Manhattan,” said Mehrdad. He added, “at Twitter, we hypothesize that some of these solutions can potentially complement or maybe even replace our internal offerings in the future.”