Fast loading data into Azure SQL: a lesson learned

I’m preparing a series of posts and samples on how to properly load data into Azure SQL using Azure Databricks / Apache Spark, which I will start publishing very soon. But I realized today that there is a prerequisite that is often overlooked, especially by developers new to the data space: good table design.
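
Just to set the stage, here is a minimal sketch of the kind of load the series will cover: writing a Spark DataFrame into Azure SQL through Spark's generic JDBC writer. The server, database, table, and credentials below are placeholders, not values from the actual samples:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("azure-sql-load").getOrCreate()

# Hypothetical source data staged on the data lake.
df = spark.read.parquet("/mnt/staging/events")

(df.write
    .format("jdbc")
    .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb")
    .option("dbtable", "dbo.Events")  # how this target table is designed is the subject of this post
    .option("user", "loader")
    .option("password", "<secret>")
    .option("batchsize", 10000)       # rows per round trip, a common tuning knob for bulk loads
    .mode("append")
    .save())
```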

Wait! If you’re not an Apache Spark user, you might think this post is not for you. Please read on: it will take just a couple of minutes, and you will find something helpful for you too, I promise.

By good table design I don’t mean, in this case, normalization, choosing the best data types, or any other well-known technique. No, nothing like that. Those are all still absolutely useful and encouraged, but let’s leave them aside for now and focus on something much simpler.

Simpler, but something that, in the case I used to build the aforementioned samples, had a 300% impact. Right, 300%. By changing one very simple thing, I could improve (or worsen, depending on where you are starting from) performance by a factor of three.
