Introduction

Data has become central to almost every field in the 21st century, and one of the key skills for any aspiring data scientist is mastering SQL functions for effective and efficient data retrieval. SQL is widely used for querying databases directly and is therefore one of the most commonly used languages for data analysis tasks. But it comes with its own intricacies and nuances.

SQL offers a plethora of functions, and you need to know the right function at the right time to achieve what you are looking for. But most of us, me included, tend to skip this topic or put it off until some distant future. Trust me, it is a mistake to leave these topics untouched in your learning journey.

Therefore, in this article, I will take you through some of the most common SQL functions that you are bound to use regularly for your data analysis tasks.

Table of Contents

i) Introducing the Dataset

ii) Aggregate functions in SQL

  • Count
  • Sum
  • Average
  • Min and Max

iii) Mathematical functions in SQL

  • Absolute
  • Ceil and Floor
  • Truncate
  • Modulo

iv) String functions in SQL

  • Lower and Upper
  • Concat
  • Trim

v) Date and Time functions in SQL

  • Date and Time
  • Extract
  • Date format

vi) Window functions in SQL

  • Rank
  • Percent value
  • Nth value

vii) Miscellaneous functions

  • Convert
  • Isnull
  • If

Introducing the Dataset

I will show you the practical application of all the functions covered in this article by working with a dummy dataset. Let’s assume there is a retail chain with stores all over the country. Our SQL table records who bought items from the retail shop, the date they bought them, the city they are from, and the purchase amount.

We are going to use this example as we learn the different functions in this article.
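
If you want to follow along, here is a minimal sketch of a stand-in table. It assumes SQLite through Python's built-in sqlite3 module, and the table name (orders), the column names other than Amount, and every row are made up for illustration. They are not the original data.

```python
import sqlite3

# Hypothetical stand-in for the article's table; names and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        Customer  TEXT,
        OrderDate TEXT,   -- ISO date string, e.g. '2020-01-05'
        City      TEXT,   -- NULL when the city is unknown
        Amount    REAL
    )
""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    ("Asha",  "2020-01-05", "Indore", 750.50),
    ("Ravi",  "2020-01-07", "Delhi",  420.25),
    ("Meera", "2020-02-11", "Indore", 980.75),
    ("Kabir", "2020-02-15", None,     310.00),
    ("Neha",  "2020-03-02", "Mumbai", 640.50),
    ("Arjun", "2020-03-09", "Delhi",  515.25),
])

# Print every record to check what the table looks like.
for row in conn.execute("SELECT * FROM orders"):
    print(row)
```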

Aggregate functions

  • Count

One of the most important aggregate functions is the count() function. Applied to a column, it returns the number of non-null values in that column. In our table, we can use count() on the city column as a first attempt at counting the cities the orders came from.

Two things are worth noticing here. Firstly, count() on a column skips null values, so records with a missing city are not counted. Secondly, duplicate values are counted multiple times. To deal with the duplicates, we can pair count() with the DISTINCT keyword, which counts only the distinct values in the column.
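
A minimal sketch of the three counting variants side by side, assuming SQLite through Python's sqlite3 module. The orders table, its City column, and the rows are hypothetical sample data.

```python
import sqlite3

# Hypothetical sample data: six orders, one of them with an unknown city.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (City TEXT, Amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    ("Indore", 750.50), ("Delhi", 420.25), ("Indore", 980.75),
    (None, 310.00), ("Mumbai", 640.50), ("Delhi", 515.25),
])

total_rows, city_values, distinct_cities = conn.execute("""
    SELECT COUNT(*),              -- every row, NULLs included
           COUNT(City),           -- rows where City is not NULL
           COUNT(DISTINCT City)   -- unique non-NULL cities
    FROM orders
""").fetchone()
print(total_rows, city_values, distinct_cities)  # 6 5 3
```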

  • Sum

Whenever we are dealing with numeric columns, we are bound to check their totals. For example, in our table, the total of Amount is important for analyzing the sales that occurred.

The sum can be calculated using the sum() function, which takes the column name as its argument.
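
As a rough sketch, again assuming SQLite via Python's sqlite3 module and a made-up orders table, the total looks like this:

```python
import sqlite3

# Hypothetical sample data; only the Amount column matters here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (City TEXT, Amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    ("Indore", 750.50), ("Delhi", 420.25), ("Indore", 980.75),
    (None, 310.00), ("Mumbai", 640.50), ("Delhi", 515.25),
])

# SUM() adds up every non-NULL value in the column.
total = conn.execute("SELECT SUM(Amount) FROM orders").fetchone()[0]
print(total)  # 3617.25
```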

But what if we want to calculate the total amount for every city?

For that, we can combine sum() with the GROUP BY clause to group the output by city.
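
Here is a minimal sketch of the grouped version, assuming SQLite via Python's sqlite3 module. The orders table and its rows are hypothetical, chosen so that Indore comes out on top as in the article.

```python
import sqlite3

# Hypothetical sample data with repeated cities and one unknown city.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (City TEXT, Amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    ("Indore", 750.50), ("Delhi", 420.25), ("Indore", 980.75),
    (None, 310.00), ("Mumbai", 640.50), ("Delhi", 515.25),
])

# One output row per city; rows with a NULL city form their own group.
per_city = conn.execute("""
    SELECT City, SUM(Amount) AS Total
    FROM orders
    GROUP BY City
    ORDER BY Total DESC
""").fetchall()
for city, total in per_city:
    print(city, total)
```

Sorting by the total in descending order puts the highest-earning city in the first row.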

In our data, this shows that Indore was the highest income-generating city for the company.

  • Average

Anyone who has done some data analysis knows that the average is often a more informative metric than the sum of the numerical values.

In our example, we have multiple orders from the same city. Therefore, it would be more prudent to calculate the average amount rather than the total sum.
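
One way to sketch this is a per-city average with avg() and GROUP BY. This assumes SQLite via Python's sqlite3 module, the orders table and rows are hypothetical, and leaving out orders with an unknown city is my own choice for the example.

```python
import sqlite3

# Hypothetical sample data with several orders per city.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (City TEXT, Amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    ("Indore", 750.50), ("Delhi", 420.25), ("Indore", 980.75),
    (None, 310.00), ("Mumbai", 640.50), ("Delhi", 515.25),
])

# Average order amount per city, skipping orders with an unknown city.
avg_per_city = conn.execute("""
    SELECT City, AVG(Amount) AS AvgAmount
    FROM orders
    WHERE City IS NOT NULL
    GROUP BY City
    ORDER BY City
""").fetchall()
for city, avg_amount in avg_per_city:
    print(city, avg_amount)
```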

  • Min and Max

Finally, aggregate value analysis isn’t complete without computing the min and max values. These can be simply computed using the min() and max() functions.
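
A short sketch of both in a single query, assuming SQLite via Python's sqlite3 module and a hypothetical orders table:

```python
import sqlite3

# Hypothetical sample data; only the Amount column matters here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (City TEXT, Amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    ("Indore", 750.50), ("Delhi", 420.25), ("Indore", 980.75),
    (None, 310.00), ("Mumbai", 640.50), ("Delhi", 515.25),
])

# Smallest and largest purchase amount in the table.
smallest, largest = conn.execute(
    "SELECT MIN(Amount), MAX(Amount) FROM orders"
).fetchone()
print(smallest, largest)  # 310.0 980.75
```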

Mathematical functions

Most of the time, you will have to deal with numbers in a SQL table for data analysis, and for that you need mathematical functions. Their definitions may seem trivial, but they are among the most frequently used functions in analysis.

  • Absolute

**abs()** is one of the most common mathematical functions. It calculates the absolute value of the numeric value that you pass as an argument.

To understand where it is helpful, let’s first find the deviation of the amount in every record from the average amount in our table.

Records with an amount below the average produce negative deviations. These can be easily converted to positive values using the abs() function.
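
The two steps can be sketched together as follows, assuming SQLite via Python's sqlite3 module. The orders table and rows are hypothetical, and computing the average with a scalar subquery is one possible approach, not necessarily the one used in the original screenshots.

```python
import sqlite3

# Hypothetical sample data; the average Amount works out to 602.875.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (City TEXT, Amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    ("Indore", 750.50), ("Delhi", 420.25), ("Indore", 980.75),
    (None, 310.00), ("Mumbai", 640.50), ("Delhi", 515.25),
])

# Signed deviation from the overall average, and its absolute value.
rows = conn.execute("""
    SELECT Amount,
           Amount - (SELECT AVG(Amount) FROM orders)      AS Deviation,
           ABS(Amount - (SELECT AVG(Amount) FROM orders)) AS AbsDeviation
    FROM orders
    ORDER BY Amount
""").fetchall()
for amount, deviation, abs_deviation in rows:
    print(amount, deviation, abs_deviation)
```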

  • Ceil and Floor

When dealing with numeric values, some of them might have a fractional part. How do you deal with those? You can convert them either to the next higher integer using ceil() or to the previous lower integer using floor().

In our table, the Amount column has lots of decimal values. We can convert them to integers using ceil() or the **floor()** function.
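
Here is a hedged sketch of both functions on a hypothetical orders table, using SQLite via Python's sqlite3 module. Not every SQLite build ships ceil() and floor(), so the sketch registers Python's math.ceil and math.floor under those names. That registration step is my workaround for this example; databases such as MySQL and PostgreSQL provide both functions out of the box.

```python
import math
import sqlite3

# Hypothetical sample data; only the Amount column matters here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (City TEXT, Amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    ("Indore", 750.50), ("Delhi", 420.25), ("Indore", 980.75),
    (None, 310.00), ("Mumbai", 640.50), ("Delhi", 515.25),
])

# Workaround for SQLite builds without math functions: expose Python's
# ceil/floor to SQL under the usual names.
conn.create_function("ceil", 1, math.ceil)
conn.create_function("floor", 1, math.floor)

# ceil() rounds up to the next integer, floor() rounds down.
rows = conn.execute("""
    SELECT Amount, ceil(Amount), floor(Amount)
    FROM orders
    ORDER BY Amount
""").fetchall()
for amount, rounded_up, rounded_down in rows:
    print(amount, rounded_up, rounded_down)
```

Note that a value with no fractional part, such as 310.00, is left unchanged by both functions.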

24 Commonly used SQL Functions for Data Analysis tasks