SQL is a programming language used to manage data in relational databases, which store data in tabular form with labelled rows and columns. It is a fundamental tool to have in your skill set if you plan to work in data science.
We can query a database to retrieve data without any modification or filtering. However, this is rarely a practical or efficient approach. We will most likely need to manipulate or filter the raw data to make it more useful.
SQL is highly capable of data manipulation and filtering. The advantage of using SQL for such tasks is that we retrieve only the data we need instead of fetching everything at once. Furthermore, it allows us to analyze the data while retrieving it.
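To make the idea of filtering and analyzing during retrieval concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for a MySQL server, so that it runs on its own. The table name `sales_demo`, its columns `branch` and `total`, and the four rows are all made up for illustration and are not taken from the actual dataset.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption for a standalone demo).
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented purely for illustration.
conn.execute("CREATE TABLE sales_demo (branch TEXT, total REAL)")
conn.executemany(
    "INSERT INTO sales_demo (branch, total) VALUES (?, ?)",
    [("A", 100.0), ("A", 50.0), ("B", 200.0), ("C", 30.0)],
)

# Filtering: the WHERE clause means only matching rows leave the database.
large_sales = conn.execute(
    "SELECT branch, total FROM sales_demo WHERE total > 75 ORDER BY total"
).fetchall()

# Analysis while retrieving: the database computes the average per branch.
avg_by_branch = conn.execute(
    "SELECT branch, AVG(total) FROM sales_demo GROUP BY branch ORDER BY branch"
).fetchall()

conn.close()

print(large_sales)
print(avg_by_branch)
```

The SELECT statements themselves use only standard SQL (WHERE, GROUP BY, AVG, ORDER BY), so they should carry over to MySQL largely unchanged. Only the connection code would differ.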
In this article, we will analyze a supermarket sales dataset available on Kaggle. I have made some modifications to the dataset and simplified it by excluding some of the columns. I have created a table in a MySQL database with the modified version of the dataset.
The first 5 rows of the supermarket table (image by author)
The dataset contains the details of sales at different branches of a supermarket chain over a period of 3 months.
Note: I wrote another practical article on this dataset using Pandas, a highly popular data analysis and manipulation library for Python.