Groupby is one of the most widely used functions in pandas. It is well suited to summarising, transforming, and filtering data, along with several other essential data analysis tasks. In this article, I explain how to apply the groupby function in detail, with examples.
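To make those three jobs concrete, here is a minimal sketch of summarising, transforming, and filtering with groupby. It assumes a small made-up DataFrame that reuses the column names of the dataset loaded below ('gender' and 'reading score'); the values are hypothetical and are not taken from StudentsPerformance.csv.

```python
import pandas as pd

# Hypothetical toy data: same column names as the article's dataset,
# but the values are made up for illustration.
df = pd.DataFrame({
    "gender": ["female", "male", "female", "male", "female"],
    "reading score": [72, 64, 90, 58, 81],
})

grouped = df.groupby("gender")["reading score"]

# Summarising: one value per group (a Series indexed by gender).
summary = grouped.mean()

# Transforming: one value per original row, aligned to df's index.
# Here every row receives its own group's mean reading score.
group_mean = grouped.transform("mean")

# Filtering: keep only the rows of groups that satisfy a condition,
# here groups whose mean reading score is above 70.
high_readers = df.groupby("gender").filter(
    lambda g: g["reading score"].mean() > 70
)

print(summary)
print(group_mean)
print(high_readers)
```

The key difference is the shape of the result: a summary returns one row per group, a transform returns one row per input row, and a filter returns a subset of the original rows.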
First, I import the necessary packages and load the dataset:
import pandas as pd
import numpy as np
df = pd.read_csv('StudentsPerformance.csv')
df.head()
The groupby function splits the dataset into groups based on criteria that you define. In this section, I show the process that groupby carries out behind the scenes, which gives an idea of how much work we would have to do without it. To demonstrate, I create a smaller dataset with only two columns: ‘gender’ and ‘reading score’.
test = df[['gender', 'reading score']]
test.head()
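Before doing the split by hand, it can help to see the split that groupby itself produces. Iterating over a GroupBy object yields (group key, sub-DataFrame) pairs. The sketch below assumes a small hypothetical frame shaped like test; the scores are invented.

```python
import pandas as pd

# Hypothetical two-column frame shaped like the article's `test`.
test = pd.DataFrame({
    "gender": ["female", "male", "female", "male"],
    "reading score": [72, 64, 90, 58],
})

# Each iteration gives one group key and the rows belonging to it.
# Groups come out sorted by key by default ("female" before "male").
pieces = {}
for gender, group in test.groupby("gender"):
    pieces[gender] = group
    print(gender, len(group))
```

Each sub-DataFrame keeps the original row index, so you can always trace a row back to its position in the full dataset.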
Let’s find the average reading score for each gender.
First, we need to split the dataset by gender. Start with the data for females only:
female = test['gender'] == 'female'
test[female].head()
In the same way, generate the data for males:
male = test['gender'] == 'male'
test[male].head()
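Writing one mask per category quickly becomes tedious when a column has many distinct values. A minimal sketch of the same manual split done for every category at once, assuming the same hypothetical toy frame (invented values), is:

```python
import pandas as pd

# Hypothetical toy data with the article's column names.
test = pd.DataFrame({
    "gender": ["female", "male", "female", "male"],
    "reading score": [72, 64, 90, 58],
})

# One boolean mask per unique value, keeping the matching rows.
# This reproduces the manual split above without hard-coding each category.
subsets = {
    value: test[test["gender"] == value]
    for value in test["gender"].unique()
}

for value, subset in subsets.items():
    print(value, len(subset))
```

This is still doing by hand what groupby does internally, but it shows why a dedicated function is convenient.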
Using the female and male subsets above, calculate the mean reading score for each group:
fe_avg = test[female]['reading score'].mean()
male_avg = test[male]['reading score'].mean()
print(fe_avg, male_avg)
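For comparison, the split, the mean, and the collection of results can all be expressed in a single groupby call. The sketch below runs both approaches side by side on a small hypothetical frame (invented values, not the real dataset) so the results can be checked against each other.

```python
import pandas as pd

# Hypothetical toy data with the article's column names.
test = pd.DataFrame({
    "gender": ["female", "male", "female", "male"],
    "reading score": [72, 64, 90, 58],
})

# Manual approach, as above: one boolean mask per gender.
female = test["gender"] == "female"
male = test["gender"] == "male"
fe_avg = test.loc[female, "reading score"].mean()
male_avg = test.loc[male, "reading score"].mean()

# groupby splits by gender, applies mean() to each group, and combines
# the results into one Series indexed by the (sorted) group keys.
avg_by_gender = test.groupby("gender")["reading score"].mean()

print(fe_avg, male_avg)
print(avg_by_gender)
```

The groupby version also scales without changes: adding a third category to the 'gender' column would add a third row to the result, with no new masks to write.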