Pandas groupby is a function you can use on DataFrames to split the object, apply a function, and combine the results. It is useful when you want to group large amounts of data and compute different operations for each group. If you use an aggregation function with groupby, each aggregation returns a single value per group. After forming your groups, you can run one or many aggregations on the grouped data.
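To make split, apply, and combine concrete, here is a minimal sketch on a made-up DataFrame. The `store` and `units` columns are hypothetical and not part of the dataset used below. A single aggregation yields one value per group, and passing a list of function names to `agg` yields one column per function.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate split-apply-combine.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "units": [3, 5, 2, 4, 6],
})

# Split by "store", apply sum to each group, combine into one Series.
total_units = sales.groupby("store")["units"].sum()
print(total_units)

# Several aggregations at once: one row per group, one column per function.
summary = sales.groupby("store")["units"].agg(["sum", "mean", "max"])
print(summary)
```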
The dataset I am using today is the Amazon Top 50 Bestselling Books dataset on Kaggle. It has some nice numeric columns and categories we can work with. After importing it, we can quickly look at one example by using head(1) to grab the first row and .T to transpose it. Here we can see that Genre is a great category column to group by, and that we can aggregate User Rating, Reviews, Price, and Year.
import pandas as pd
df = pd.read_csv("bestsellers_with_categories.csv")
print(df.head(1).T)
>>> 0
>>> Name 10-Day Green Smoothie Cleanse
>>> Author JJ Smith
>>> User Rating 4.7
>>> Reviews 17350
>>> Price 8
>>> Year 2016
>>> Genre Non Fiction
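Transposing is what makes this one-row preview readable: head(1) still returns a DataFrame with one row and many columns, and .T swaps rows and columns so that each column name becomes a row label. The sketch below uses a hypothetical two-row frame (the book names and values are invented) to show the shape change.

```python
import pandas as pd

# Hypothetical two-row frame that borrows a few column names from the dataset.
books = pd.DataFrame({
    "Name": ["Book A", "Book B"],
    "User Rating": [4.7, 4.5],
    "Genre": ["Non Fiction", "Fiction"],
})

first_row = books.head(1)  # still a DataFrame: 1 row x 3 columns
transposed = first_row.T   # column names become the index: 3 rows x 1 column
print(first_row.shape, transposed.shape)
print(transposed)
```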
Now that we have taken a quick look at the columns, we can use groupby to group the data by Genre. Before applying groupby, we can check the Genre values: there are two categories in this dataset, Non Fiction and Fiction, so we will have two groups of data to work with. We could also group by Author or Name (the book title), but we will stick with Genre for now.
df.Genre.unique()
>>> array(['Non Fiction', 'Fiction'], dtype=object)
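group_cols = ['Genre']  # column(s) to group by; e.g. ['Genre', 'Author'] would form finer groups
ex = df.groupby(group_cols)  # a DataFrameGroupBy object; no aggregation has run yet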
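As a minimal sketch of what the grouped object can do next, the block below rebuilds a tiny stand-in for the bestsellers data. Only the column names come from the dataset; the rows are invented. It checks that two distinct Genre values give two groups (by default, groupby drops rows whose key is missing, so counts can differ if Genre has NaN values), then aggregates the numeric columns with named aggregation, which assumes pandas 0.25 or later. The output names such as `avg_rating` are arbitrary choices for this illustration.

```python
import pandas as pd

# Hypothetical rows reusing the dataset's column names; all values are made up.
df = pd.DataFrame({
    "Genre": ["Non Fiction", "Fiction", "Fiction", "Non Fiction"],
    "User Rating": [4.5, 4.75, 4.25, 4.0],
    "Reviews": [100, 200, 300, 50],
    "Price": [8, 20, 10, 6],
    "Year": [2016, 2011, 2018, 2017],
})

group_cols = ["Genre"]
ex = df.groupby(group_cols)

# Two distinct Genre values -> two groups.
print(df["Genre"].nunique(), ex.ngroups)
print(ex.size())  # number of rows in each group

# Named aggregation: output_column=(input_column, function).
summary = ex.agg(
    avg_rating=("User Rating", "mean"),
    total_reviews=("Reviews", "sum"),
    median_price=("Price", "median"),
    latest_year=("Year", "max"),
)
print(summary)
```

Each named aggregation still returns a single value per group, so the result has one row for Fiction and one for Non Fiction.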