So, you are a data scientist. You work with data, and you need to explore it and run some analytics before jumping into extensive machine learning algorithms.

Let’s start by examining what Serverless is.


According to Wikipedia, serverless computing is a cloud computing execution model in which the cloud provider manages the servers and dynamically allocates the resources needed to finish the task.

That means that, as users, we are in charge of the logic only. We don’t need to take care of the servers, capacity planning, scaling, or maintenance operations. It doesn’t mean these things are not happening; they are, just not by us.

What is SQL?

According to Wikipedia, SQL is a domain-specific programming language used for managing data held in relational databases.
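
To make that definition concrete, here is a minimal sketch of SQL against a relational database, using Python's built-in `sqlite3` module with an in-memory database. The `trips` table and its rows are hypothetical and exist only for illustration.

```python
import sqlite3

# In-memory relational database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, used only for illustration.
conn.execute("CREATE TABLE trips (city TEXT, distance_km REAL)")
conn.executemany(
    "INSERT INTO trips (city, distance_km) VALUES (?, ?)",
    [("Berlin", 12.5), ("Berlin", 7.5), ("Paris", 3.0)],
)

# Declarative query: we state what we want, the engine decides how to compute it.
rows = conn.execute(
    "SELECT city, COUNT(*) AS n_trips, SUM(distance_km) AS total_km "
    "FROM trips GROUP BY city ORDER BY city"
).fetchall()

print(rows)
conn.close()
```

Note that in this classic setup the data has to be loaded into the database before it can be queried.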

So what is Serverless SQL?

Serverless SQL is a distributed computing tool that lets us process distributed data with SQL without managing the database servers ourselves. If we have data, or “big data”, in one or more data lakes or storage services, such as AWS S3 or Azure Blob Storage, we can run SQL queries on that data without building a pipeline or ingesting the data into a distributed database such as Cassandra or MongoDB.

This is a huge advantage, especially when we want to interact with offline data without creating pipelines, or to peek at fresh data that has just landed in storage, before it is replicated, transformed, and saved into a dedicated database.
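
As a rough sketch of what querying data in place can look like, the snippet below prepares a query for Amazon Athena, one serverless SQL service that reads files stored in S3. Athena is used here only as an illustration, not as the tool this post is about. The database `my_data_lake`, the table `raw_events`, and the bucket `my-bucket` are hypothetical placeholders, and the sketch assumes that a table definition over the S3 files already exists in the service's catalog. Actually submitting the query would also require the `boto3` package and AWS credentials.

```python
def build_athena_request(sql, database, output_location):
    """Build keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request):
    """Submit the query. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without it

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical names: the database, table, and bucket below are placeholders.
request = build_athena_request(
    sql="SELECT event_type, COUNT(*) AS n FROM raw_events GROUP BY event_type",
    database="my_data_lake",
    output_location="s3://my-bucket/athena-results/",
)
print(request)
```

There is no cluster to provision and no load step here: the files stay where they are in storage, and the service allocates compute only for the duration of the query.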

