Imagine you’re in a system design interview and need to pick a database to store, let’s say, order-related data in an e-commerce system. Your data is structured and needs to be consistent, but your query pattern doesn’t match that of a standard relational DB. You need your transactions to be isolated and atomic and all things ACID... But OMG, it needs to scale infinitely like Cassandra!! So how would you decide which storage solution to choose? Well, let’s see!

First of all, what kind of data are we working with? Is it records or file systems or audio/video content? And what kind of processing do we intend to do on that data? Do we need to search for something or run sophisticated analytics algorithms?

Based on our requirements and how we want to use or access our data, we could be looking for the following storage solutions:

**Caching solutions:** If we are designing a read-heavy system like Twitter or Facebook, we might end up caching a lot of data, even complete timelines, to meet the low-latency requirement. Some options here would be Redis or Memcached.
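To make the caching idea concrete, here is a minimal sketch of the cache-aside pattern: check the cache first, fall back to the database on a miss, then store the result for next time. Everything in it is an assumption for illustration: `TTLCache` is a plain in-memory stand-in for Redis or Memcached, and `load_timeline_from_db` is a hypothetical slow database call.

```python
import time


class TTLCache:
    """In-memory stand-in for a cache server such as Redis or Memcached."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)


db_calls = {"count": 0}  # counts how often we hit the "database"


def load_timeline_from_db(user_id):
    """Hypothetical expensive query that builds a user's timeline."""
    db_calls["count"] += 1
    return [f"post-{user_id}-{i}" for i in range(3)]


cache = TTLCache(ttl_seconds=60)


def get_timeline(user_id):
    """Cache-aside read: try the cache, fall back to the DB on a miss."""
    key = f"timeline:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    timeline = load_timeline_from_db(user_id)
    cache.set(key, timeline)
    return timeline


first = get_timeline(42)   # miss: goes to the database
second = get_timeline(42)  # hit: served from the cache
```

After the two calls, the database has been queried only once. The time-to-live is the knob that trades freshness for fewer database reads.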

**File storage:** If we are designing some sort of asset delivery service, where we might need to store images or audio/video files, we might need to use something called blob storage (also known as object storage). An extremely popular example is Amazon S3.
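The mental model for blob storage is simple: you put raw bytes under a key and get the same bytes back later. Here is a rough sketch of that model, assuming a local directory as the backing store. `LocalBlobStore` and its `put`/`get` methods are made up for this example and are not the S3 API.

```python
import hashlib
import tempfile
from pathlib import Path


class LocalBlobStore:
    """Toy key -> bytes store backed by a local directory (not the S3 API)."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        # Hash the key so any string, slashes included, maps to a safe file name.
        return self.root / hashlib.sha256(key.encode("utf-8")).hexdigest()

    def put(self, key, data):
        self._path(key).write_bytes(data)

    def get(self, key):
        return self._path(key).read_bytes()


store = LocalBlobStore(tempfile.mkdtemp())
store.put("products/42/thumbnail.png", b"fake image bytes")
thumbnail = store.get("products/42/thumbnail.png")
```

Notice that the store never looks inside the bytes. That is the main difference from a database: no queries on the content, just store and fetch by key.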

**Text search engine:** What if we are designing a system like Amazon and need to implement a search feature? The thing about search features is that we need to consider typos as well. Suppose a user wants to search for “shirt” but types “shrt” instead. If we don’t show any results, it would be a very poor user experience. Our system needs to be smart enough to show results for “shirt” or “shorts”. This is known as fuzzy search, and this is where we use text search engines like Elasticsearch.
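To get a feel for fuzzy matching, here is a tiny sketch using Python's standard `difflib` module. This is a stand-in and not how Elasticsearch works internally: Elasticsearch bases fuzziness on edit distance over an inverted index, while `difflib` scores overall string similarity. The catalog, the `fuzzy_search` helper, and the `0.75` cutoff are assumptions picked for this example.

```python
import difflib

# Hypothetical product catalog terms.
CATALOG = ["shirt", "shorts", "skirt", "shoes", "trousers"]


def fuzzy_search(query, terms, limit=3, cutoff=0.75):
    """Return up to `limit` terms similar to `query`, best match first."""
    return difflib.get_close_matches(query, terms, n=limit, cutoff=cutoff)


results = fuzzy_search("shrt", CATALOG)
print(results)
```

With this catalog, the typo “shrt” still finds “shirt” as the best match, with “shorts” close behind. A lower cutoff returns more, looser matches, and a higher one returns fewer, stricter ones.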

**Data warehouses:** I know! We have been discussing data and storage all this time, so how can we not consider Big Data! Sometimes we just need to dump all the data in a single store where we can later perform all sorts of analytics. These systems are used more for offline reporting than for regular transactions. This is where we end up using data warehouse solutions like Hadoop.

Now, you might have noticed that we have been talking about “storage solutions”, not “databases”. So let’s have a look at **databases** now!

