I started using groupby with custom aggregations and I want to share what I learned with you.

Pandas groupby is a DataFrame method that splits the data into groups, applies a function to each group, and combines the results. It is useful when you want to group large amounts of data and compute operations on each group. When you use an aggregation function with groupby, each aggregation returns a single value per group. After forming your groups, you can run one or many aggregations on the grouped data.
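To make the split, apply, combine idea concrete, here is a minimal sketch. The `toy` frame and its values are made up for illustration and are not taken from the dataset used in the rest of this post.

```python
import pandas as pd

# Hypothetical toy data, invented for this example
toy = pd.DataFrame({
    "Genre": ["Fiction", "Fiction", "Non Fiction", "Non Fiction"],
    "Price": [10, 20, 8, 12],
    "Reviews": [100, 300, 50, 150],
})

# Split by Genre, apply one aggregation, combine into one value per group
mean_price = toy.groupby("Genre")["Price"].mean()
print(mean_price)

# Several aggregations at once: one value per group per function
summary = toy.groupby("Genre").agg(
    mean_price=("Price", "mean"),
    max_price=("Price", "max"),
    total_reviews=("Reviews", "sum"),
)
print(summary)
```

Each row of `summary` is one group, and each column is one aggregation, which is the "single value for each group per function" behavior described above.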

The dataset I am using today is Amazon Top 50 Bestselling Books on Kaggle. This dataset has some nice numeric columns and categories that we can work with. After loading the dataset, we can quickly look at one example row using head(1) to grab the first row and .T to transpose it. Here we can see that Genre is a good categorical column to group by, and we can aggregate the user ratings, reviews, price, and year.

import pandas as pd

# Load the Kaggle CSV and show the first row as a column
df = pd.read_csv("bestsellers_with_categories.csv")
print(df.head(1).T)

>>>                                          0
>>> Name         10-Day Green Smoothie Cleanse
>>> Author                            JJ Smith
>>> User Rating                            4.7
>>> Reviews                              17350
>>> Price                                    8
>>> Year                                  2016
>>> Genre                          Non Fiction

Now that we have taken a quick look at the columns, we can use groupby to group the data by Genre. Before applying groupby, we can check that this dataset has two Genre categories, Non Fiction and Fiction, meaning we will have two groups of data to work with. We could also group by author or book title, but we will stick with Genre for now.

df.Genre.unique()

>>> array(['Non Fiction', 'Fiction'], dtype=object)

group_cols = ['Genre']
ex = df.groupby(group_cols)
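
The groupby object on its own does not display any results; it only holds the grouping until an aggregation is applied. As a rough sketch of what can come next, the example below inspects the groups and applies one built-in and one custom aggregation. The `toy` frame reuses the dataset's column names with invented values, and `price_range` is a hypothetical helper written for this illustration.

```python
import pandas as pd

# Hypothetical rows that reuse the dataset's column names
toy = pd.DataFrame({
    "Genre": ["Fiction", "Fiction", "Non Fiction", "Non Fiction", "Non Fiction"],
    "User Rating": [4.8, 4.4, 4.7, 4.6, 4.2],
    "Price": [15, 9, 8, 22, 11],
})

group_cols = ["Genre"]
ex = toy.groupby(group_cols)

print(ex.ngroups)  # number of groups
print(ex.size())   # number of rows in each group


def price_range(prices):
    """Custom aggregation: spread between the highest and lowest price."""
    return prices.max() - prices.min()


# Mix a built-in aggregation (by name) with a custom function
result = ex.agg(
    avg_rating=("User Rating", "mean"),
    price_range=("Price", price_range),
)
print(result)
```

A custom aggregation here is just a function that receives one group's column as a Series and returns a single value, so it can be mixed freely with built-in names such as "mean".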

Creating Custom Aggregations to Use with Pandas groupby