Why you are throwing money away

What you should consider before migrating to the cloud to future-proof your data warehouse and data lake, and how Snowflake approached the separation of storage and compute.

Not so long ago, establishing an enterprise data warehouse was a project that took months or even years. These days, with cloud computing, you can register for a SaaS or PaaS offering from one of the cloud vendors and start building your schemas and tables shortly afterwards. In this article, I will discuss the key features to consider when migrating a data warehouse to the cloud and why it is a smart choice to pick one that separates storage from compute.

What does it mean to separate storage and compute?

From a single server to a data warehouse cluster

It all boils down to the difference between scale-out & scale-in vs. scale-up & scale-down. In older database and data warehouse solutions, storage and compute reside within a single (often large & powerful) server instance. This may work well until that single server instance reaches its maximum compute or storage capacity. To accommodate the increased workload, you could then scale-up, i.e. exchange the CPU, RAM, or storage disks for ones with a larger capacity; with cloud services, this means switching to a larger instance. Similarly, if your single instance is larger than you need, you could save money by exchanging it for a smaller one, i.e. scale-down. This process has two main disadvantages:

  • the scale-up & scale-down process is time-consuming and often means that your data warehouse becomes unavailable for some time
  • there is a limit to how much you can scale-up due to the natural limitations of a single server instance.
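
To make the second limitation concrete, here is a minimal sketch of scale-up & scale-down as "pick the smallest single instance that fits the workload". The instance names and vCPU counts are made up for illustration and do not correspond to any real vendor's catalog.

```python
# Hypothetical instance catalog, ordered from smallest to largest: (name, vCPUs).
# These sizes are assumptions for illustration, not real vendor offerings.
INSTANCE_SIZES = [("small", 4), ("medium", 16), ("large", 64)]


def pick_instance(required_vcpus: int) -> str:
    """Return the smallest single instance that covers the workload."""
    for name, vcpus in INSTANCE_SIZES:
        if vcpus >= required_vcpus:
            return name
    # Scale-up has a hard ceiling: nothing is bigger than the largest instance.
    raise ValueError("workload exceeds the largest single instance")


print(pick_instance(3))    # workload shrank: scale-down to "small"
print(pick_instance(10))   # workload grew: scale-up to "medium"
```

Once the workload outgrows the largest entry in the catalog, `pick_instance` has nothing left to offer, which is exactly the ceiling described in the second bullet.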

MPP: Massively Parallel Processing

To mitigate this problem, data warehouse vendors started using the MPP (Massively Parallel Processing) paradigm, which allows your data warehouse to use an entire cluster of instances at once. This way, if you start reaching the maximum capacity limits, you can simply add another server instance to the cluster, bringing more storage and compute capacity with it (i.e. scale-out).
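
A rough sketch of scale-out, assuming every node in the cluster is identical and brings a fixed bundle of compute and storage. The `Node` type and its sizes are hypothetical and only serve to illustrate the idea.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    vcpus: int
    storage_tb: int


# Hypothetical node size, chosen only for illustration.
NODE = Node(vcpus=16, storage_tb=2)


def nodes_needed(required_vcpus: int, required_storage_tb: int, node: Node = NODE) -> int:
    """Number of identical nodes needed to cover both compute and storage."""
    by_compute = math.ceil(required_vcpus / node.vcpus)
    by_storage = math.ceil(required_storage_tb / node.storage_tb)
    return max(1, by_compute, by_storage)


def cluster_capacity(node_count: int, node: Node = NODE) -> tuple[int, int]:
    """Total (vCPUs, storage in TB) of a cluster; both grow together."""
    return node_count * node.vcpus, node_count * node.storage_tb


n = nodes_needed(required_vcpus=40, required_storage_tb=3)
print(n, cluster_capacity(n))
```

Unlike the single-server case, there is no fixed ceiling here: a bigger workload simply yields a larger node count. Note that in this model each added node increases compute and storage at the same time.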

