Efficient Data Summarizing and Analysis Using Pandas’ Groupby Function

Groupby is one of the most widely used functions in Pandas. It is well suited to summarizing, transforming, and filtering data, along with several other essential data analysis tasks. In this article, I will explain how to apply the groupby function in detail, with examples.
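As a quick preview of those three tasks, here is a minimal sketch on a tiny hypothetical DataFrame. The column names mirror the article's dataset, but the values are made up for illustration. `mean()` summarizes each group into one value, `transform("mean")` returns a result aligned to the original rows, and `filter` keeps or drops whole groups based on a condition.

```python
import pandas as pd

# Hypothetical toy data: column names mirror the article's dataset,
# but the values are invented for illustration.
df = pd.DataFrame({
    "gender": ["female", "male", "female", "male", "female"],
    "reading score": [72, 60, 90, 75, 81],
})

grouped = df.groupby("gender")["reading score"]

# Summarizing: one aggregated value per group.
summary = grouped.mean()

# Transforming: same length as the original rows; each row gets its group's mean.
group_means = grouped.transform("mean")

# Filtering: keep only the rows of groups whose mean reading score exceeds 75.
high_groups = df.groupby("gender").filter(lambda g: g["reading score"].mean() > 75)

print(summary)
print(group_means.tolist())
print(high_groups)
```

With this toy data, only the female rows survive the filter, because that group's mean (81.0) is above the threshold while the male group's (67.5) is not.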

Dataset

Here I am importing the necessary packages and the dataset:

import pandas as pd
import numpy as np
df = pd.read_csv('StudentsPerformance.csv')
df.head()

How Does Groupby Work?

The groupby function splits the dataset into groups based on criteria that you define. In this section, I will walk through the process that groupby performs behind the scenes, which gives an idea of how much work we would have to do without it. To keep the demonstration simple, I will make a smaller dataset with only two columns: ‘gender’ and ‘reading score’.
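To see the "split" step on its own, here is a minimal sketch that iterates over a groupby object. It assumes a small hypothetical two-column frame shaped like `test` below (values invented). Each iteration yields a group key and the sub-DataFrame of rows that share that key, and `get_group` pulls out a single piece.

```python
import pandas as pd

# Hypothetical two-column frame shaped like `test` in the article (values invented).
test = pd.DataFrame({
    "gender": ["female", "male", "female", "male", "female"],
    "reading score": [72, 60, 90, 75, 81],
})

grouped = test.groupby("gender")

# Split: each group key maps to the rows (by index label) that belong to it.
pieces = {name: group.index.tolist() for name, group in grouped}
print(pieces)

# A single group can be extracted directly by its key.
female_rows = grouped.get_group("female")
print(female_rows)
```

No computation has happened yet at this point: groupby has only partitioned the rows, and an aggregation such as `mean()` is what applies a function to each piece and combines the results.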

test = df[['gender', 'reading score']]
test.head()

Let’s find the average reading score for each gender.

First, we need to split the dataset by gender. Start by selecting the rows for females only:

female = test['gender'] == 'female'
test[female].head()

In the same way, select the rows for males:

male = test['gender'] == 'male'
test[male].head()
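Writing one mask per category quickly becomes tedious when a column has many distinct values. As a minimal sketch of how the manual approach generalizes, the loop below builds one boolean mask per unique value and stores each filtered subset, assuming the same kind of hypothetical toy data (values invented). This bookkeeping is essentially what groupby automates.

```python
import pandas as pd

# Hypothetical toy data with the article's column names (values invented).
test = pd.DataFrame({
    "gender": ["female", "male", "female", "male", "female"],
    "reading score": [72, 60, 90, 75, 81],
})

# Manual split: one boolean mask and one filtered subset per distinct value.
subsets = {}
for value in test["gender"].unique():
    mask = test["gender"] == value   # boolean Series, True where the row matches
    subsets[value] = test[mask]

for value, subset in subsets.items():
    print(value, subset["reading score"].tolist())
```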

Now use the female and male subsets above to calculate the mean reading score for each group:

fe_avg = test[female]['reading score'].mean()
male_avg = test[male]['reading score'].mean()
print(fe_avg, male_avg)
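For comparison, the same result can be obtained with a single groupby expression, which performs the split, the mean, and the combine in one step. This minimal sketch runs both approaches on a small hypothetical frame (values invented, column names from the article) and checks that they agree.

```python
import pandas as pd

# Hypothetical two-column frame shaped like `test` (values invented).
test = pd.DataFrame({
    "gender": ["female", "male", "female", "male", "female"],
    "reading score": [72, 60, 90, 75, 81],
})

# Manual approach from the article: one mask and one mean per group.
female = test["gender"] == "female"
male = test["gender"] == "male"
fe_avg = test[female]["reading score"].mean()
male_avg = test[male]["reading score"].mean()

# groupby: split by gender, apply mean to each group, combine into one Series.
avg_by_gender = test.groupby("gender")["reading score"].mean()
print(avg_by_gender)

# Both approaches should give the same numbers.
assert avg_by_gender["female"] == fe_avg
assert avg_by_gender["male"] == male_avg
```

The groupby version also scales without changes: if the column had more categories, the same line would return one mean per category, whereas the manual approach needs a new mask for each.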

towardsdatascience.com