TL;DR
Since my last post (SQL On-Demand: An easier way to Query Data | by Dayo Bamikole | Towards Data Science), Azure Synapse Analytics has become GA (Generally Available). In that post I mentioned some of the top features in Synapse, one of them being SQL Pool. There are a few ways to query data in Azure Synapse: SQL Pools and Apache Spark Pools. SQL Pools come in two types, Dedicated and Serverless. In a subsequent post I focused on Serverless SQL Pool, formerly known as SQL On-Demand. In this post I will focus on Dedicated SQL Pools.
Dedicated SQL Pool is the traditional data warehouse in Synapse. It was formerly known as Azure SQL Data Warehouse before it joined the Synapse family. It is a big data solution that stores data in relational tables using columnar storage. It uses a Massively Parallel Processing (MPP) architecture: every table is spread across 60 distributions, which are processed in parallel by up to 60 compute nodes. Once your data is in a Dedicated SQL Pool, you can use it for historical analysis in a dashboard, as a dataset for machine learning, or for any other goals you have for a massive dataset.
For more on the Synapse SQL architecture, check out Microsoft Docs:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/overview-architecture
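As a rough illustration of how MPP splits the work, the Python sketch below models a table that is hash-distributed on one column. Each row is assigned to one of 60 distributions (the fixed number a dedicated SQL pool uses), every distribution aggregates only its own rows, and the partial results are merged. It is a toy, not how Synapse is implemented: zlib.crc32 stands in for Synapse's internal hash function, a thread pool stands in for compute nodes, and the sales rows are made-up data.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool always splits a table into 60 distributions


def distribution_for(key: str) -> int:
    """Pick a distribution from the distribution-column value.

    Assumption: zlib.crc32 is only a stand-in for Synapse's own hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % NUM_DISTRIBUTIONS


def distribute(rows, key):
    """Hash-distribute rows, roughly like a table created WITH (DISTRIBUTION = HASH(key))."""
    buckets = [[] for _ in range(NUM_DISTRIBUTIONS)]
    for row in rows:
        buckets[distribution_for(row[key])].append(row)
    return buckets


def local_sum(bucket, key, value):
    """Aggregate one distribution independently of all the others."""
    totals = defaultdict(float)
    for row in bucket:
        totals[row[key]] += row[value]
    return totals


def mpp_sum(rows, key, value):
    """SUM(value) GROUP BY key, computed per distribution and then combined.

    Rows are hashed on the grouping key, so each key lives in exactly one
    distribution and combining the partial results is a plain merge.
    Threads here only mimic parallel compute nodes; they are not about speed.
    """
    buckets = distribute(rows, key)
    result = {}
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(lambda b: local_sum(b, key, value), buckets):
            result.update(partial)
    return result


# Hypothetical sample data.
sales = [
    {"customer": "contoso", "amount": 120.0},
    {"customer": "fabrikam", "amount": 80.0},
    {"customer": "contoso", "amount": 30.0},
]
print(sorted(mpp_sum(sales, "customer", "amount").items()))
# prints [('contoso', 150.0), ('fabrikam', 80.0)]
```

Grouping on the same column the table is distributed on means no rows have to move between distributions before the merge; grouping on a different column would first need a data-movement step to bring matching keys together.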
The goal of Dedicated SQL Pools is to store data at a massive scale while still querying it efficiently. Columnar storage helps here: a query reads only the columns it needs, and clustered columnstore indexes (the default for new tables) compress the data and speed up retrieval.
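The Python sketch below is a toy model of why columnar storage helps. Rows are pivoted into per-column segments grouped into rowgroups, each segment keeps its min and max value, and a filtered SUM reads only the two columns it needs and skips rowgroups whose metadata rules them out. The rowgroup size of 4 and the sample rows are assumptions chosen to keep the example small; real columnstore rowgroups hold up to 1,048,576 rows and are also compressed, which this sketch leaves out.

```python
from dataclasses import dataclass

ROWGROUP_SIZE = 4  # toy value; real columnstore rowgroups hold up to 1,048,576 rows


@dataclass
class Segment:
    """One column's values for one rowgroup, plus the min/max metadata kept for it."""
    values: list
    min_value: object
    max_value: object


def to_columnstore(rows, columns):
    """Pivot row-oriented records into per-column segments, one per rowgroup."""
    store = {col: [] for col in columns}
    for start in range(0, len(rows), ROWGROUP_SIZE):
        group = rows[start:start + ROWGROUP_SIZE]
        for col in columns:
            vals = [r[col] for r in group]
            store[col].append(Segment(vals, min(vals), max(vals)))
    return store


def sum_where_at_least(store, filter_col, threshold, sum_col):
    """SUM(sum_col) WHERE filter_col >= threshold, reading only two columns.

    Returns (total, number of rowgroups actually scanned). Rowgroups whose
    max(filter_col) is below the threshold are skipped using metadata alone.
    """
    total, scanned = 0, 0
    for f_seg, s_seg in zip(store[filter_col], store[sum_col]):
        if f_seg.max_value < threshold:
            continue  # whole rowgroup eliminated without reading its values
        scanned += 1
        total += sum(v for f, v in zip(f_seg.values, s_seg.values) if f >= threshold)
    return total, scanned


# Hypothetical sample rows: the "region" column is stored but never read below.
rows = [{"day": d, "amount": d * 10, "region": "west"} for d in range(1, 9)]
store = to_columnstore(rows, ["day", "amount", "region"])
print(sum_where_at_least(store, "day", 6, "amount"))  # prints (210, 1)
```

The first rowgroup (days 1 to 4) is skipped from its max value alone, and the region column is never touched, which is the kind of I/O a row-oriented layout could not avoid.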
Well, you might want to know how to get data into this platform, right?
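A term you will hear a lot around Dedicated SQL Pool is PolyBase. So what is PolyBase? It is a data virtualization technology that lets you query data stored outside the database using T-SQL. In SQL Server 2019 it can connect to external sources such as SQL Server, Oracle, Teradata and MongoDB; in a Dedicated SQL Pool it is used to read files in Azure Blob Storage and Azure Data Lake Storage through external tables. You can see how beneficial this is for Dedicated SQL Pool: PolyBase lets you query big data where it lives, and it is also a common way to load that data into the pool.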
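To show what querying external files with PolyBase looks like in a dedicated SQL pool, the Python helper below only builds a T-SQL script (it does not connect to anything). The script creates an external data source, an external file format for CSV files, an external table you can query in place, and a CREATE TABLE AS SELECT (CTAS) that loads the rows into a hash-distributed table. Every object name, the storage URL, the folder and the column list are hypothetical placeholders, and the script assumes the ext schema, a database master key and the database-scoped credential LakeCredential already exist.

```python
from typing import Dict


def polybase_load_script(table: str, columns: Dict[str, str], folder: str, dist_column: str) -> str:
    """Return T-SQL that exposes CSV files as an external table and loads them.

    Hypothetical placeholders: LakeSource, LakeCredential, CsvFormat, the ext
    schema and the abfss:// storage URL.
    """
    col_defs = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    statements = [
        # 1. Where the files live (PolyBase in a dedicated SQL pool uses TYPE = HADOOP).
        "CREATE EXTERNAL DATA SOURCE LakeSource\n"
        "WITH (\n"
        "    TYPE = HADOOP,\n"
        "    LOCATION = 'abfss://data@mystorageaccount.dfs.core.windows.net',\n"
        "    CREDENTIAL = LakeCredential\n"
        ");",
        # 2. How to parse the files: comma-separated, skipping a header row.
        "CREATE EXTERNAL FILE FORMAT CsvFormat\n"
        "WITH (\n"
        "    FORMAT_TYPE = DELIMITEDTEXT,\n"
        "    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2)\n"
        ");",
        # 3. External table: queryable with T-SQL while the data stays in storage.
        f"CREATE EXTERNAL TABLE ext.{table} (\n    {col_defs}\n)\n"
        f"WITH (\n    LOCATION = '{folder}',\n    DATA_SOURCE = LakeSource,\n"
        f"    FILE_FORMAT = CsvFormat\n);",
        # 4. CTAS: load into a hash-distributed table with a clustered columnstore index.
        f"CREATE TABLE dbo.{table}\n"
        f"WITH (DISTRIBUTION = HASH({dist_column}), CLUSTERED COLUMNSTORE INDEX)\n"
        f"AS SELECT * FROM ext.{table};",
    ]
    return "\n\n".join(statements)


script = polybase_load_script(
    "Sales", {"SaleId": "INT", "Amount": "DECIMAL(10, 2)"}, "/sales/", "SaleId"
)
print(script)
```

You would run the printed statements in the dedicated SQL pool, for example from Synapse Studio. After step 3 you can already SELECT from ext.Sales without loading anything; step 4 is only needed when you want the data stored inside the pool.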