How to Use Pandas GroupBy, Counts, and Value Counts

If you’re a data scientist, you likely spend a lot of time cleaning and manipulating data for use in your applications. One of the core libraries for preparing data is the Pandas library for Python.

In a previous post, we explored the background of Pandas and the basic usage of a Pandas DataFrame, the core data structure in Pandas. Check out that post if you want to get up to speed with the basics of Pandas.

In this post, we’ll explore a few of the core methods on Pandas DataFrames. These methods help you segment and review your DataFrames during your analysis.

We’ll cover:

  • Using Pandas groupby to segment your DataFrame into groups.
  • Exploring your Pandas DataFrame with counts and value_counts.

Let’s get started.

Pandas Groupby

Pandas is typically used for exploring and organizing large volumes of tabular data, like a super-powered Excel spreadsheet. Often, you’ll want to split a Pandas DataFrame into subgroups for further analysis.

For example, perhaps you have stock ticker data in a DataFrame, as we explored in the last post. Your Pandas DataFrame might look as follows:

>>> df
          date symbol     open     high      low    close    volume
0   2019-03-01   AMZN  1655.13  1674.26  1651.00  1671.73   4974877
1   2019-03-04   AMZN  1685.00  1709.43  1674.36  1696.17   6167358
2   2019-03-05   AMZN  1702.95  1707.80  1689.01  1692.43   3681522
3   2019-03-06   AMZN  1695.97  1697.75  1668.28  1668.95   3996001
4   2019-03-07   AMZN  1667.37  1669.75  1620.51  1625.95   4957017
5   2019-03-01   AAPL   174.28   175.15   172.89   174.97  25886167
6   2019-03-04   AAPL   175.69   177.75   173.97   175.85  27436203
7   2019-03-05   AAPL   175.94   176.00   174.54   175.53  19737419
8   2019-03-06   AAPL   174.67   175.49   173.94   174.52  20810384
9   2019-03-07   AAPL   173.87   174.44   172.02   172.50  24796374
10  2019-03-01   GOOG  1124.90  1142.97  1124.75  1140.99   1450316
11  2019-03-04   GOOG  1146.99  1158.28  1130.69  1147.80   1446047
12  2019-03-05   GOOG  1150.06  1169.61  1146.19  1162.03   1443174
13  2019-03-06   GOOG  1162.49  1167.57  1155.49  1157.86   1099289
14  2019-03-07   GOOG  1155.72  1156.76  1134.91  1143.30   1166559

Perhaps we want to analyze this stock information on a symbol-by-symbol basis rather than combining Amazon (“AMZN”) data with Google (“GOOG”) data or that of Apple (“AAPL”).

This is where the Pandas groupby method is useful. You can use groupby to chunk up your data into subsets for further analysis.

Basic Pandas Groupby Usage

Let’s walk through some basic usage of groupby to see how it helps.

In your Python interpreter, enter the following commands:

>>> import pandas as pd
>>> import numpy as np
>>> url = 'https://gist.githubusercontent.com/alexdebrie/b3f40efc3dd7664df5a20f5eee85e854/raw/ee3e6feccba2464cbbc2e185fb17961c53d2a7f5/stocks.csv'
>>> df = pd.read_csv(url)
>>> df
          date symbol     open     high      low    close    volume
0   2019-03-01   AMZN  1655.13  1674.26  1651.00  1671.73   4974877
1   2019-03-04   AMZN  1685.00  1709.43  1674.36  1696.17   6167358
2   2019-03-05   AMZN  1702.95  1707.80  1689.01  1692.43   3681522
3   2019-03-06   AMZN  1695.97  1697.75  1668.28  1668.95   3996001
4   2019-03-07   AMZN  1667.37  1669.75  1620.51  1625.95   4957017
5   2019-03-01   AAPL   174.28   175.15   172.89   174.97  25886167
6   2019-03-04   AAPL   175.69   177.75   173.97   175.85  27436203
7   2019-03-05   AAPL   175.94   176.00   174.54   175.53  19737419
8   2019-03-06   AAPL   174.67   175.49   173.94   174.52  20810384
9   2019-03-07   AAPL   173.87   174.44   172.02   172.50  24796374
10  2019-03-01   GOOG  1124.90  1142.97  1124.75  1140.99   1450316
11  2019-03-04   GOOG  1146.99  1158.28  1130.69  1147.80   1446047
12  2019-03-05   GOOG  1150.06  1169.61  1146.19  1162.03   1443174
13  2019-03-06   GOOG  1162.49  1167.57  1155.49  1157.86   1099289
14  2019-03-07   GOOG  1155.72  1156.76  1134.91  1143.30   1166559

In the steps above, we’re importing the Pandas and NumPy libraries, then setting up a basic DataFrame by downloading CSV data from a URL. We print our DataFrame to the console to see what we have.

Now, let’s group our DataFrame using the stock symbol. The easiest and most common way to use groupby is by passing one or more column names. For our example, we’ll use “symbol” as the column name for grouping:

>>> symbols = df.groupby('symbol')
>>> print(symbols.groups)
{'AAPL': Int64Index([5, 6, 7, 8, 9], dtype='int64'),
 'AMZN': Int64Index([0, 1, 2, 3, 4], dtype='int64'),
 'GOOG': Int64Index([10, 11, 12, 13, 14], dtype='int64')}

The printed groups can be a little hard to read. The output above shows that we have three groups: AAPL, AMZN, and GOOG. For each group, it lists the index labels of the rows in the original DataFrame that belong to that group.

The input to groupby is quite flexible. You can choose to group by multiple columns. For example, if we had a year column available, we could group by both stock symbol and year to perform year-over-year analysis on our stock data.
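As a minimal sketch of multi-column grouping, the snippet below rebuilds a few rows of the stock data by hand and derives a year column from the date. The year column and the variable names are assumptions for illustration; the original CSV has no such column.

```python
import pandas as pd

# A few rows copied from the article's stock data (date, symbol, close only).
df = pd.DataFrame({
    "date": ["2019-03-01", "2019-03-04", "2019-03-01", "2019-03-04"],
    "symbol": ["AMZN", "AMZN", "AAPL", "AAPL"],
    "close": [1671.73, 1696.17, 174.97, 175.85],
})

# Assumption: derive a "year" column from the date, since the data has none.
df["year"] = pd.to_datetime(df["date"]).dt.year

# Passing a list of column names groups by each distinct combination of values.
by_symbol_year = df.groupby(["symbol", "year"])

# Each group key is now a (symbol, year) tuple rather than a single value.
group_keys = sorted(by_symbol_year.groups.keys())
print(group_keys)

# Aggregations return a result indexed by both grouping columns.
mean_close = by_symbol_year["close"].mean()
print(mean_close)
```

Because this sample only covers 2019, there is one group per symbol. With several years of data, each symbol would get one group per year.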

Using a Custom Function in Pandas Groupby

In the previous example, we passed a column name to the groupby method. You can also pass your own function. Pandas calls this function once for each index label in the DataFrame, and the value it returns determines which group that row belongs to. This gives you significant flexibility for grouping rows with complex logic.

As an example, imagine we want to group our rows depending on whether the stock price increased on that particular day. We would use the following:

>>> def increased(idx):
...     return df.loc[idx].close > df.loc[idx].open
...
>>> df.groupby(increased).groups
{False: Int64Index([2, 3, 4, 7, 8, 9, 13, 14], dtype='int64'),
 True: Int64Index([0, 1, 5, 6, 10, 11, 12], dtype='int64')}

First, we define a function called increased, which receives a row’s index label. It returns True if the close value for that row in the DataFrame is higher than the open value; otherwise, it returns False.

When we pass that function into the groupby() method, our DataFrame is grouped into two groups based on whether the stock’s closing price was higher than the opening price on the given day.
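The same grouping can also be expressed without a per-row function, because groupby accepts a Series of the same length as the DataFrame and uses its values as group keys. The sketch below assumes a hand-built subset of the stock data and is one possible alternative, not the approach used above.

```python
import pandas as pd

# Six rows copied from the article's data (AMZN and AAPL, March 1 to March 5).
df = pd.DataFrame({
    "symbol": ["AMZN", "AMZN", "AMZN", "AAPL", "AAPL", "AAPL"],
    "open": [1655.13, 1685.00, 1702.95, 174.28, 175.69, 175.94],
    "close": [1671.73, 1696.17, 1692.43, 174.97, 175.85, 175.53],
})

# A boolean Series aligned with df's index: True where the close beat the open.
increased = df["close"] > df["open"]

# groupby takes the Series directly and groups rows by its True/False values.
by_direction = df.groupby(increased)

# size() reports how many rows landed in each group.
counts = by_direction.size()
print(counts)
```

Comparing whole columns at once avoids a separate df.loc lookup for every row, which tends to matter on larger DataFrames.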

Operating on Pandas Groups

After you’ve created your groups using the groupby function, you can perform some handy data manipulation on the resulting groups.

In our example above, we created groups of our stock tickers by symbol. Let’s now find the mean trading volume for each symbol.

>>> symbols['volume'].agg(np.mean)
symbol
AAPL    23733309.4
AMZN     4755355.0
GOOG     1321077.0
Name: volume, dtype: float64

To complete this task, you specify the column on which you want to operate, “volume”, then use Pandas’ agg method to apply NumPy’s mean function. Passing the string 'mean' to agg, or calling .mean() directly on the selected column, gives the same result. The result is the mean volume for each of the three symbols. From this, we can see that AAPL’s mean trading volume is roughly five times AMZN’s and about eighteen times GOOG’s.
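The agg method can also compute several statistics in one call. The following sketch assumes a hand-built DataFrame holding only the symbol and volume columns from the data above, and passes a list of aggregation names.

```python
import pandas as pd

# Symbol and volume columns copied from the article's 15 rows.
df = pd.DataFrame({
    "symbol": ["AMZN"] * 5 + ["AAPL"] * 5 + ["GOOG"] * 5,
    "volume": [
        4974877, 6167358, 3681522, 3996001, 4957017,
        25886167, 27436203, 19737419, 20810384, 24796374,
        1450316, 1446047, 1443174, 1099289, 1166559,
    ],
})

# A list of function names yields one output column per aggregation,
# with one row per symbol.
summary = df.groupby("symbol")["volume"].agg(["mean", "min", "max"])
print(summary)
```

The result is a DataFrame indexed by symbol, so a single statistic can be read with summary.loc["AAPL", "mean"].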

Iteration and Selecting Groups

Iteration is a core programming pattern, and few languages have nicer syntax for iteration than Python. Python’s built-in list comprehensions and generators make iteration a breeze.

Pandas groupby is no different, as it provides excellent support for iteration. You can loop over the groupby result object using a for loop:

>>> for symbol, group in symbols:
...     print(symbol)
...     print(group)
...
AAPL
         date symbol    open    high     low   close    volume
5  2019-03-01   AAPL  174.28  175.15  172.89  174.97  25886167
6  2019-03-04   AAPL  175.69  177.75  173.97  175.85  27436203
7  2019-03-05   AAPL  175.94  176.00  174.54  175.53  19737419
8  2019-03-06   AAPL  174.67  175.49  173.94  174.52  20810384
9  2019-03-07   AAPL  173.87  174.44  172.02  172.50  24796374
AMZN
         date symbol     open     high      low    close   volume
0  2019-03-01   AMZN  1655.13  1674.26  1651.00  1671.73  4974877
1  2019-03-04   AMZN  1685.00  1709.43  1674.36  1696.17  6167358
2  2019-03-05   AMZN  1702.95  1707.80  1689.01  1692.43  3681522
3  2019-03-06   AMZN  1695.97  1697.75  1668.28  1668.95  3996001
4  2019-03-07   AMZN  1667.37  1669.75  1620.51  1625.95  4957017
GOOG
          date symbol     open     high      low    close   volume
10  2019-03-01   GOOG  1124.90  1142.97  1124.75  1140.99  1450316
11  2019-03-04   GOOG  1146.99  1158.28  1130.69  1147.80  1446047
12  2019-03-05   GOOG  1150.06  1169.61  1146.19  1162.03  1443174
13  2019-03-06   GOOG  1162.49  1167.57  1155.49  1157.86  1099289
14  2019-03-07   GOOG  1155.72  1156.76  1134.91  1143.30  1166559

Each iteration on the groupby object returns two values. The first is the identifier of the group, which is the value of the column(s) on which the rows were grouped. The second is the group itself, which is a Pandas DataFrame.
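Since each group is a regular DataFrame, the loop body can run any DataFrame operation on it. As an illustrative sketch, assuming a hand-built subset of the data and a hypothetical best_day dictionary, the loop below records the date of the highest close for each symbol.

```python
import pandas as pd

# Three days of AMZN and AAPL rows copied from the article's data.
df = pd.DataFrame({
    "date": ["2019-03-01", "2019-03-04", "2019-03-05"] * 2,
    "symbol": ["AMZN"] * 3 + ["AAPL"] * 3,
    "close": [1671.73, 1696.17, 1692.43, 174.97, 175.85, 175.53],
})

best_day = {}  # hypothetical name: maps each symbol to its best closing date
for symbol, group in df.groupby("symbol"):
    # group is an ordinary DataFrame holding only this symbol's rows.
    top_row = group.loc[group["close"].idxmax()]
    best_day[symbol] = top_row["date"]

print(best_day)
```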

Pandas get_group method

If you want more flexibility to manipulate a single group, you can use the get_group method to retrieve a single group.

>>> aapl = symbols.get_group('AAPL')
>>> aapl
         date symbol    open    high     low   close    volume
5  2019-03-01   AAPL  174.28  175.15  172.89  174.97  25886167
6  2019-03-04   AAPL  175.69  177.75  173.97  175.85  27436203
7  2019-03-05   AAPL  175.94  176.00  174.54  175.53  19737419
8  2019-03-06   AAPL  174.67  175.49  173.94  174.52  20810384
9  2019-03-07   AAPL  173.87  174.44  172.02  172.50  24796374
>>> type(aapl)
<class 'pandas.core.frame.DataFrame'>

In the example above, we use the Pandas get_group method to retrieve all AAPL rows. To retrieve a particular group, you pass the identifier of the group into the get_group method. This method returns a Pandas DataFrame, which we can manipulate as needed.
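To illustrate manipulating a retrieved group, the sketch below adds a daily price range to the AAPL rows. The range column and the small hand-built DataFrame are assumptions for illustration; copy() is used so the new column goes onto an independent DataFrame rather than the original.

```python
import pandas as pd

# A handful of rows copied from the article's data (high and low prices only).
df = pd.DataFrame({
    "symbol": ["AMZN", "AAPL", "AAPL", "GOOG"],
    "high": [1674.26, 175.15, 177.75, 1142.97],
    "low": [1651.00, 172.89, 173.97, 1124.75],
})

# Retrieve one group and copy it before modifying it.
aapl = df.groupby("symbol").get_group("AAPL").copy()

# Hypothetical derived column: the spread between the day's high and low.
aapl["range"] = aapl["high"] - aapl["low"]
print(aapl)
```

The rows keep their original index labels, which makes it easy to trace them back to the full DataFrame.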

Understanding Your Data’s Shape With Pandas Count and value_counts

If you’re working with a large DataFrame, you’ll need to use various heuristics for understanding the shape of your data. In this section, we’ll look at Pandas count and value_counts, two methods for evaluating your DataFrame.

The count method shows the number of non-null values in each column of your DataFrame. Using our DataFrame from above, we get the following output:

>>> df.count()
date      15
symbol    15
open      15
high      15
low       15
close     15
volume    15
dtype: int64

The output isn’t particularly helpful for us, as each of our 15 rows has a value for every column. However, this can be very useful where your data set is missing a large number of values. Using the count method can help to identify columns that are incomplete. From there, you can decide whether to exclude the columns from your processing or to provide default values where necessary.
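To see count react to gaps, the sketch below uses a hypothetical variation on the stock data in which two volume readings are deliberately replaced with NaN. The missing values are an assumption made for illustration and do not appear in the original CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical variation on the article's data: two volumes are blanked out.
df = pd.DataFrame({
    "symbol": ["AMZN", "AMZN", "AAPL", "AAPL"],
    "close": [1671.73, 1696.17, 174.97, 175.85],
    "volume": [4974877, np.nan, 25886167, np.nan],
})

# count() skips NaN, so the incomplete column reports fewer values.
non_null = df.count()
print(non_null)

# isna().sum() gives the complementary view: missing values per column.
missing = df.isna().sum()
print(missing)
```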

Pandas value_counts method

For our case, the value_counts method is more useful. This method returns the number of times each unique value appears in a particular column. If you have continuous variables, like our numeric columns, you can provide an optional “bins” argument to separate the values into half-open bins.

Let’s use the Pandas value_counts method to view the shape of our “volume” column.

>>> df['volume'].value_counts(bins=4)
(1072952.085, 7683517.5]    10
(20851974.5, 27436203.0]     3
(14267746.0, 20851974.5]     2
(7683517.5, 14267746.0]      0
Name: volume, dtype: int64

In the output above, Pandas has created four separate bins for our volume column and shows us the number of rows that land in each bin.
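Without the bins argument, value_counts tallies each distinct value, which suits categorical or boolean data. As a sketch under that assumption, the snippet below rebuilds the open and close columns by hand and counts how many days closed above the open; the went_up name is hypothetical.

```python
import pandas as pd

# Open and close columns copied from the article's 15 rows.
df = pd.DataFrame({
    "open": [1655.13, 1685.00, 1702.95, 1695.97, 1667.37,
             174.28, 175.69, 175.94, 174.67, 173.87,
             1124.90, 1146.99, 1150.06, 1162.49, 1155.72],
    "close": [1671.73, 1696.17, 1692.43, 1668.95, 1625.95,
              174.97, 175.85, 175.53, 174.52, 172.50,
              1140.99, 1147.80, 1162.03, 1157.86, 1143.30],
})

# Boolean Series: True on days when the close was higher than the open.
went_up = df["close"] > df["open"]

# Raw tallies of each distinct value, most frequent first.
counts = went_up.value_counts()
print(counts)

# normalize=True reports each value's share of the rows instead of a count.
shares = went_up.value_counts(normalize=True)
print(shares)
```

The tallies line up with the two groups produced by the custom grouping function earlier: eight rows where the price did not increase and seven where it did.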

Both count() and value_counts() are great utilities for quickly understanding the shape of your data.

Conclusion

In this post, we learned about groupby, count, and value_counts, three of the main methods in Pandas.

Pandas is a powerful tool for manipulating data once you know the core operations and how to use it.

#python #data-science #pandas
