Relational Database Management System (RDMS) is a program that allows us to create, update, and manage a relational database. Structured Query Language (SQL) is a programming language used to communicate with data stored in the RDMS. The SQL skill for using a RDMA is required for many data-related positions these days. In the social media forums like _Quora _or Reddit, there are many people who search for a public database for practicing their SQL querying skills. However, although there are many public data sets in a single spreadsheet, there are not many public databases online. Even if you found a data set for the topic you have interest in, the format of the data is usually just one spreadsheet, not a database for most cases. Therefore, it will be very useful to know how to convert a data set in one spreadsheet to a database with multiple tables fitting a relational database format. Knowing the process of this conversion can give us many chances to practice SQL querying skills with a variety of databases.

This is the first article in a three part series. The goal of this series is to show how to create a relational database for SQL. The whole process is to convert a data in one spreadsheet in Excel to a relational database for SQL. In this first article I create an Entity Relational Diagram (ERD) which is a graphical representation showing the relationships between entities.

The Data Set

Let’s find a data set for this practice. There are many public data on the Kaggle dataset. Among a bunch of data sets, I selected a data set named Sample Sales Data_. _The following pictures show the original format of the data which is contained in one spreadsheet.

Image for post

Image for post

Sample Sales Data from the Kaggle Dataset

The data has 25 columns and 2824 rows including headers. The list of the headers is as follows:

  • ORDERNUMBER: the identification number for each order
  • QUANTITYORDERED: the quantity ordered
  • PRICEEACH: the actual price paid for the transaction (variable across transactions)
  • ORDERLINENUMBER: the number of the order line
  • SALES: the amount of sales
  • ORDERDATE: the order date
  • STATUS: the shipping status (Shipped, Resolved, Cancelled, On Hold, Disputed, and In Progress)
  • QTR_ID: the quarter of the order date
  • MONTH_ID: the month of the order date
  • YEAR_ID: the year of the order date
  • PRODUCTLINE: the category of products
  • MSRP: the manufacture’s suggested retail price (constant across transactions)
  • PRODUCTCODE: the identification code for each product
  • CUSTOMERNAME: the customer names
  • PHONE: the phone numbers of customers
  • ADDRESSLINE1: addressline 1 for customers
  • ADDRESSLINE2: address line 2 for customers
  • CITY: city names for customers
  • STATE: state names for customers (only for customers located in the US)
  • POSTALCODE: postal codes for customers
  • COUNTRY: countries for customers
  • TERRITORY: the regional names of each country (NA, EMEA, Japan, and APAC)
  • _CONTACTLASTNAME _and CONTACTFIRSTNAME: the last and first names of employees who are contacted for the transaction
  • DEALSIZE: the deal sizes of orders

#entity-relationship-model #sales-data #sql #database-design #relational-databases

Creating a Database: Converting a Spreadsheet to a Relational Database (Part 1)
1.45 GEEK