Pandas DataFrame: A Complete Introduction for Beginners

In this article, you'll learn the basics of Pandas DataFrame from scratch.

Before getting started, let me introduce pandas.

Pandas is a Python library that provides high-performance, easy-to-use data structures, such as Series and DataFrame, along with data analysis tools. (Older releases also shipped a Panel structure, which has since been removed.) A DataFrame has three main components: the data, the rows, and the columns. To use pandas and its data structures, all you have to do is install it and import it. See the pandas documentation for installation guidance and a fuller introduction.

Basic operations that can be applied on a pandas Data Frame are as shown below:

  1. Creating a Data Frame.
  2. Performing operations on Rows and Columns.
  3. Data Selection, addition, deletion.
  4. Working with missing data.
  5. Renaming the Columns or Indices of a DataFrame.

1. Creating a Data Frame.

A pandas DataFrame can be created by loading data from existing external storage, such as a SQL database or a CSV file. It can also be created from Python lists, dictionaries, and so on. One way to create a DataFrame is shown below:

# import the pandas library
import pandas as pd
# Dictionary of key pair values called data
data = {'Name':['Ashika', 'Tanu', 'Ashwin', 'Mohit', 'Sourabh'],
        'Age': [24, 23, 22, 19, 10]}
data

{'Age': [24, 23, 22, 19, 10],  'Name': ['Ashika', 'Tanu', 'Ashwin', 'Mohit', 'Sourabh']}
# Calling the pandas data frame method by passing the dictionary (data) as a parameter
df = pd.DataFrame(data)
df
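
As a rough sketch of the other routes mentioned above, the same kind of table could be built from a list of rows or from CSV text. The variable names are my own, and `io.StringIO` only stands in for a real file path so that the example runs without a file on disk:

```python
import io
import pandas as pd

# Build the table from a list of rows; column names are passed separately
rows = [['Ashika', 24], ['Tanu', 23], ['Ashwin', 22]]
df_from_rows = pd.DataFrame(rows, columns=['Name', 'Age'])

# Build it from CSV text; io.StringIO is a stand-in for a path such as 'people.csv'
csv_text = "Name,Age\nAshika,24\nTanu,23\nAshwin,22\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df_from_rows)
print(df_from_csv)
```

With a real file you would pass its path to `pd.read_csv` instead of the `StringIO` object.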

2. Performing operations on Rows and Columns.

A DataFrame is a two-dimensional data structure: data is stored in rows and columns. Below we perform some operations on rows and columns.

**Selecting a Column:**

To select a particular column, put its name in square brackets after the DataFrame.

# import the pandas library
import pandas as pd
# Dictionary of key pair values called data
data = {'Name':['Ashika', 'Tanu', 'Ashwin', 'Mohit', 'Sourabh'],
        'Age': [24, 23, 22, 19, 10]}
data

{'Age': [24, 23, 22, 19, 10],  'Name': ['Ashika', 'Tanu', 'Ashwin', 'Mohit', 'Sourabh']}
# Calling the pandas data frame method by passing the dictionary (data) as a parameter
df = pd.DataFrame(data)
# Selecting column
df[['Name']]
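
One detail worth a quick illustration is the difference between single and double brackets. This is a minimal sketch on a two-row frame of my own; the variable names are just for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ashika', 'Tanu'], 'Age': [24, 23]})

single = df['Name']          # single brackets: a one-dimensional Series
table = df[['Name']]         # double brackets: a DataFrame with one column
both = df[['Name', 'Age']]   # a list of names selects several columns

print(type(single).__name__)  # Series
print(type(table).__name__)   # DataFrame
```

Which form you want depends on what comes next: a Series is handy for calculations on one column, while a DataFrame keeps the table shape.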

**Selecting a Row:**

A pandas DataFrame provides an indexer called “loc”, which retrieves rows by index label. Rows can also be selected by integer position with “iloc”.

# Calling the pandas data frame method by passing the dictionary (data) as a parameter
df = pd.DataFrame(data)
# Selecting a row
row = df.loc[1]
row

Name    Tanu 
Age       23 
Name: 1, dtype: object

As seen above, to work with “loc” you pass an index label in square brackets. “loc” is label-based, so it accepts whatever labels the index holds, not only integers; here the default index happens to be the integers 0 to 4. In the above example, I wanted to access the “Tanu” row, so I passed the index label 1. Now there's a quick assignment for you guys: use “iloc” and tell me the result.
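
The difference between labels and positions is easier to see when the index is not the default one. Here is a small sketch, assuming a made-up string index 'a', 'b', 'c' that is not part of the example above:

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Ashika', 'Tanu', 'Ashwin'], 'Age': [24, 23, 22]},
    index=['a', 'b', 'c'],  # hypothetical string labels, for illustration only
)

by_label = df.loc['b']     # the row whose index label is 'b'
by_position = df.iloc[1]   # the second row, counting from 0

print(by_label['Name'], by_position['Name'])  # Tanu Tanu
```

Both lines return the same row here, but only because the label 'b' happens to sit at position 1.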

3. Data Selection, addition, deletion.

You can treat a DataFrame semantically like a dictionary of like-indexed Series objects. Getting, setting, and deleting columns works with the same syntax as the analogous dictionary operations:

# import the pandas library
import pandas as pd
# Dictionary of key pair values called data
data = {'Name':['Ashika', 'Tanu', 'Ashwin', 'Mohit', 'Sourabh'],
        'Age': [24, 23, 22, 19, 10]}

# Calling the pandas data frame method by passing the dictionary (data) as a parameter
df = pd.DataFrame(data)
# Selecting the data from the column
df['Age']
0    24 
1    23 
2    22 
3    19 
4    10 
Name: Age, dtype: int64

Columns can be deleted just like dictionary entries, using the del statement.

del df['Age']
df

Data can be added by using the insert function, which inserts a column at a particular position among the existing columns:

df.insert(1, 'name', df['Name'])
df
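
For comparison, here is a short sketch of the dictionary-style way to add a column, next to `insert()` and the non-destructive `drop()`. The column names 'AgeNextYear' and 'Id' and their values are made up for this example:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ashika', 'Tanu'], 'Age': [24, 23]})

# Assigning to a new key appends a column at the end
df['AgeNextYear'] = df['Age'] + 1

# insert() places a column at a chosen position (0 = first)
df.insert(0, 'Id', [101, 102])  # hypothetical id values

# drop() returns a new frame without the column; df itself is unchanged
trimmed = df.drop(columns=['Age'])

print(list(df.columns))       # ['Id', 'Name', 'Age', 'AgeNextYear']
print(list(trimmed.columns))  # ['Id', 'Name', 'AgeNextYear']
```

Unlike `del`, `drop()` leaves the original frame alone, which is convenient when you still need the column later.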

4. Working with missing data.

Missing data occurs often when we work with big data sets. It usually appears as NaN (Not a Number). To detect those values, we can use the “isnull()” method, which checks each cell of a DataFrame for a null value.

Checking for the missing values:

# importing both pandas and numpy libraries
import pandas as pd
import numpy as np

# Dictionary of key pair values called data
data = {'First name': ['Tanu', np.nan],
        'Age': [23, np.nan]}
df = pd.DataFrame(data)
df

# using the isnull() function
df.isnull()

isnull() returns False where a value is present and True where it is missing. Now that we have found the missing values, the next task is to fill them with 0. This can be done as shown below:

df.fillna(0)
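
Note that fillna() returns a new DataFrame and leaves the original untouched. The sketch below shows that, together with two related calls: counting missing cells and dropping incomplete rows. The replacement values 'Unknown' and 0 are arbitrary choices of mine:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First name': ['Tanu', np.nan], 'Age': [23, np.nan]})

# Count the missing cells in each column
missing_per_column = df.isnull().sum()

# fillna() returns a new frame; a dict gives each column its own replacement
filled = df.fillna({'First name': 'Unknown', 'Age': 0})

# dropna() removes every row that has at least one missing value
complete_rows = df.dropna()

print(missing_per_column['Age'])  # 1
print(len(complete_rows))         # 1
```

To keep the filled result, assign it to a variable as above.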

5. Renaming the Columns or Indices of a DataFrame.

To give the columns or the index values of your DataFrame different names, it's best to use the .rename() method. I have purposely misspelled the column names so that the effect of the rename is easy to see.

# import the pandas library
import pandas as pd
# Dictionary of key pair values called data
data = {'NAMe': ['Ashika', 'Tanu', 'Ashwin', 'Mohit', 'Sourabh'],
        'AGe': [24, 23, 22, 19, 10]}

# Calling the pandas data frame method by passing the dictionary (data) as a parameter
df = pd.DataFrame(data)
df

newcols = {
    'NAMe': 'Name',
    'AGe': 'Age'
}
# Use `rename()` to rename your columns
df.rename(columns=newcols, inplace=True)
df

# The values of new index
newindex = {
    0: 'a',
    1: 'b',
    2: 'c',
    3: 'd',
    4: 'e'
}
# Rename your index
df.rename(index=newindex)
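
One thing to watch: without inplace=True, rename() leaves the original DataFrame as it is and hands back a renamed copy. Below is a minimal sketch of that behaviour on a two-row frame of my own; passing `str.lower` as the mapper is just one possible choice of callable:

```python
import pandas as pd

df = pd.DataFrame({'NAMe': ['Ashika', 'Tanu'], 'AGe': [24, 23]})

# Without inplace=True the original is untouched; keep the returned frame
renamed = df.rename(columns={'NAMe': 'Name', 'AGe': 'Age'},
                    index={0: 'a', 1: 'b'})

# A callable works too: here str.lower is applied to every column label
lowered = df.rename(columns=str.lower)

print(list(renamed.columns), list(renamed.index))  # ['Name', 'Age'] ['a', 'b']
print(list(lowered.columns))                       # ['name', 'age']
print(list(df.columns))                            # ['NAMe', 'AGe']
```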

These are some of the most important techniques and methods for working with a pandas DataFrame in Python. If you guys have any doubts about the code, the comment section is all yours. Thank you.

Python For Data Analysis | Build a Data Analysis Library from Scratch | Learn Python in 2019

Immerse yourself in a long, comprehensive project that teaches advanced Python concepts to build an entire library

You’ll learn

  • How to build a Python library similar to pandas
  • How to complete a large, comprehensive project
  • Test-driven development with pytest
  • Environment creation
  • Advanced Python topics such as special methods and property decorators
  • A fully-functioning library that you can use for data analysis

Python Pandas Tutorial - Learn Data Science from Scratch

Complete Python Pandas Data Science Tutorial: Reading CSV/Excel files, Sorting, Filtering, Groupby.

In this video we walk through many of the fundamental concepts needed to use the Python Pandas Data Science Library. We start off by installing pandas and loading in an example CSV. We then look at different ways to read the data: a column, rows, a specific cell, and so on, as well as ways to read data based on conditions. We then move into some more advanced ways to sort and filter data. We look at making conditional changes to our data. We also start doing aggregate stats using the groupby function. We finish the video by talking about how you would work with a very large dataset (many gigabytes).

Data used in this Tutorial: https://github.com/KeithGalli/pandas
Python Pandas Documentation: https://pandas.pydata.org/pandas-docs/stable/

Thanks for watching friends! Happy coding! :)

Tutorial on Data Analysis With Python and Pivot Tables With Pandas

We’ll learn how to do data analysis with Python and make pivot tables with Pandas.

One of the first posts on my blog was about Pivot tables. I’d created a library to pivot tables in my PHP scripts. The library is not very beautiful (it throws a lot of warnings), but it works. These days I’m playing with Python Data Analysis and I’m using Pandas. The purpose of this post is something that I like a lot: learn by doing. So I want to do the same operations that I did eight years ago in the post but now with Pandas. Let’s start.

I’ll start with the same data source that I used almost ten years ago: one simple set of records, with clicks and number of users.

I create a dataframe with this data:

import numpy as np
import pandas as pd

data = pd.DataFrame([
    {'host': 1, 'country': 'fr', 'year': 2010, 'month': 1, 'clicks': 123, 'users': 4},
    {'host': 1, 'country': 'fr', 'year': 2010, 'month': 2, 'clicks': 134, 'users': 5},
    {'host': 1, 'country': 'fr', 'year': 2010, 'month': 3, 'clicks': 341, 'users': 2},
    {'host': 1, 'country': 'es', 'year': 2010, 'month': 1, 'clicks': 113, 'users': 4},
    {'host': 1, 'country': 'es', 'year': 2010, 'month': 2, 'clicks': 234, 'users': 5},
    {'host': 1, 'country': 'es', 'year': 2010, 'month': 3, 'clicks': 421, 'users': 2},
    {'host': 1, 'country': 'es', 'year': 2010, 'month': 4, 'clicks': 22, 'users': 3},
    {'host': 2, 'country': 'es', 'year': 2010, 'month': 1, 'clicks': 111, 'users': 2},
    {'host': 2, 'country': 'es', 'year': 2010, 'month': 2, 'clicks': 2, 'users': 4},
    {'host': 3, 'country': 'es', 'year': 2010, 'month': 3, 'clicks': 34, 'users': 2},
    {'host': 3, 'country': 'es', 'year': 2010, 'month': 4, 'clicks': 1, 'users': 1}
])

Now we want to do a simple pivot operation. We want to pivot on the host:

pd.pivot_table(data,
               index=['host'],
               values=['users', 'clicks'],
               columns=['year', 'month'],
               fill_value=''
              )

We can add totals:

pd.pivot_table(data,
               index=['host'],
               values=['users', 'clicks'],
               columns=['year', 'month'],
               fill_value='',
               aggfunc=np.sum,
               margins=True,
               margins_name='Total'
              )
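
To make the effect of aggfunc and margins easier to see, here is a tiny sketch on a made-up three-row frame (not the data set above). It uses the string 'sum', which pandas accepts in place of np.sum:

```python
import pandas as pd

data = pd.DataFrame([
    {'host': 1, 'month': 1, 'clicks': 10},
    {'host': 1, 'month': 1, 'clicks': 30},
    {'host': 2, 'month': 1, 'clicks': 5},
])

# Default aggregation is the mean of the rows that share a cell
mean_table = pd.pivot_table(data, index='host', columns='month', values='clicks')

# aggfunc='sum' adds them up; margins=True appends a total row and column
sum_table = pd.pivot_table(data, index='host', columns='month', values='clicks',
                           aggfunc='sum', margins=True, margins_name='Total')

print(mean_table.loc[1, 1])              # 20.0
print(sum_table.loc[1, 1])               # 40
print(sum_table.loc['Total', 'Total'])   # 45
```

So when no aggfunc is given, cells that collect several records show their average rather than their total.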

We can also pivot on more than one column. For example, host and country:

pd.pivot_table(data,
               index=['host', 'country'],
               values=['users', 'clicks'],
               columns=['year', 'month'],
               fill_value=''
              )

And also with totals:

pd.pivot_table(data,
               index=['host', 'country'],
               values=['users', 'clicks'],
               columns=['year', 'month'],
               aggfunc=np.sum,
               fill_value='',
               margins=True,
               margins_name='Total'
              )

We can also group the dataframe and calculate subtotals:

# Select several columns with a list; tuple selection no longer works in recent pandas
data.groupby(['host', 'country'])[['clicks', 'users']].sum()

data.groupby(['host', 'country'])[['clicks', 'users']].mean()
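
If you want a different statistic per column in a single call, named aggregation is one option. This is only a sketch on a four-row subset of the records, and the output column names total_clicks and mean_users are my own:

```python
import pandas as pd

data = pd.DataFrame([
    {'host': 1, 'country': 'fr', 'clicks': 123, 'users': 4},
    {'host': 1, 'country': 'fr', 'clicks': 134, 'users': 5},
    {'host': 1, 'country': 'es', 'clicks': 113, 'users': 4},
    {'host': 2, 'country': 'es', 'clicks': 111, 'users': 2},
])

# Named aggregation: each keyword becomes an output column,
# built from a (source column, function) pair
summary = data.groupby(['host', 'country']).agg(
    total_clicks=('clicks', 'sum'),
    mean_users=('users', 'mean'),
)

print(summary.loc[(1, 'fr'), 'total_clicks'])  # 257
print(summary.loc[(1, 'fr'), 'mean_users'])    # 4.5
```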

And, finally, we can mix totals and subtotals.

out = data.groupby('host').apply(lambda sub: sub.pivot_table(
    index=['host', 'country'],
    values=['users', 'clicks'],
    columns=['year', 'month'],
    aggfunc=np.sum,
    margins=True,
    margins_name='SubTotal',
))

out.loc[('', 'Max', '')] = out.max()
out.loc[('', 'Min', '')] = out.min()
out.loc[('', 'Total', '')] = out.sum()

out.index = out.index.droplevel(0)

out.fillna('', inplace=True)

And that’s all! I’ve got a lot to learn yet about data analysis, but Pandas will definitely be a good friend of mine.

You can see the Jupyter notebook on my GitHub account.

Thanks for reading ❤