How to Create DataFrame in Python

In this article, we will learn about DataFrames in Python and how to create them.

The pandas library for Python provides data structures such as Series and DataFrame. In this article, we are going to read about DataFrames.

For beginners, let’s first discuss what a data structure is. A data structure is a way of storing data so that it can be easily accessed and worked with. For example,

  • Storing data so that we can quickly access the last item added gives us a STACK (LIFO).
  • Storing data so that we can quickly access the first item added gives us a QUEUE (FIFO).
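To make the two ideas concrete, here is a minimal sketch using only the Python standard library. The variable names are ours and are only for illustration.

```python
from collections import deque

# A list works as a stack: append() pushes, pop() removes the last item (LIFO).
stack = [1, 2, 3]
last_in = stack.pop()

# A deque works as a queue: append() enqueues, popleft() removes the first item (FIFO).
queue = deque([1, 2, 3])
first_in = queue.popleft()

print(last_in, first_in)
```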

In a similar fashion, the pandas library adds its own data structures to Python.

  • Series
  • DataFrame

In this article, we will learn about the DataFrame.

Data Frame
  • It is a 2-dimensional labeled array: an ordered collection of columns, where each column can store data of a different type.
  • It has two indexes, or two axes: a row index and a column index.
  • A DataFrame is “value-mutable” and “size-mutable”, i.e., we can change both its values and its size.
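These properties can be checked on a small DataFrame. The sketch below uses made-up data, and the column name passed is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Raj", "Raman"], "marks": [75, 89]})

print(df.index)    # row labels (the row axis)
print(df.columns)  # column labels (the column axis)
print(df.dtypes)   # each column has its own data type

df.loc[0, "marks"] = 80          # value-mutable: change an existing value
df["passed"] = df["marks"] > 40  # size-mutable: add a new column
print(df.shape)
```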

Let’s start with creating DataFrames.

There are many ways to create a DataFrame, such as,

1. Creating DataFrame from Series Object

Series is a pandas data structure that represents a 1-dimensional array-like object containing an array of data and an associated array of data labels, called the index. We can create a DataFrame from Series objects, like,

import pandas as pd

# Three Series that share the same index labels
Students = pd.Series(['Raj', 'Raman', 'Rahul'], index=[1, 2, 3])
Marks = pd.Series([75, 89, 90], index=[1, 2, 3])
Contact = pd.Series(['9899', '9560', '9871'], index=[1, 2, 3])

# The dictionary keys become the column names
data = {'Stud': Students, 'MM': Marks, 'Phone': Contact}

df = pd.DataFrame(data)
print(df)

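Note: The Series are aligned on their index labels, so use the same index for all of them. A label that is missing from one Series produces NaN in that column.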
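To see what happens when the indexes differ, here is a small sketch in which one Series has no entry for label 3. The data is illustrative.

```python
import pandas as pd

marks = pd.Series([75, 89, 90], index=[1, 2, 3])
contact = pd.Series(["9899", "9560"], index=[1, 2])  # no entry for label 3

# pandas aligns the Series on their labels and fills the gap with NaN
df = pd.DataFrame({"MM": marks, "Phone": contact})
print(df)
```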

2. Creating DataFrame from 2-D Dictionary

Here, the 2-D dictionary contains lists as its values.

import pandas as pd

dictObj = {
    'EmpCode': ['E01', 'E02', 'E03', 'E04'],
    'EmpName': ['Raj', 'Raman', 'Rahul', 'Rohit'],
    'EmpDept': ['HR', 'Accounts', 'IT', 'HR'],
}

df = pd.DataFrame(dictObj)
print(df)

In the output, pandas generates a default integer index (0 to 3), and the keys of the 2-D dictionary become the column names.

We can also set the index values by passing index to DataFrame(), like

df = pd.DataFrame(dictObj, index=['I', 'II', 'III', 'IV'])

Note: The index must have the same length as the number of rows; otherwise, pandas raises an error.
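The length check can be tried out safely inside a try block. In this sketch we pass three labels for four rows; the error variable is only there so that we can inspect the exception.

```python
import pandas as pd

dict_obj = {"EmpCode": ["E01", "E02", "E03", "E04"]}

try:
    pd.DataFrame(dict_obj, index=["I", "II", "III"])  # 3 labels for 4 rows
    error = None
except ValueError as exc:
    error = exc

print(error)
```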

A 2-D dictionary can also contain dictionaries as its values, i.e., it can be a nested dictionary,

import pandas as pd

yr2018 = {'NoOfArticles': 1200, 'NoOfBlogs': 1000, 'NoOfNews': 700}
yr2019 = {'NoOfArticles': 1500, 'NoOfBlogs': 1500, 'NoOfNews': 900}
yr2020 = {'NoOfArticles': 2000, 'NoOfBlogs': 1800, 'NoOfNews': 1000}

Published = {2018: yr2018, 2019: yr2019, 2020: yr2020}
df = pd.DataFrame(Published)
print(df)

In the above code, we first created 3 dictionaries: yr2018, yr2019 and yr2020. After that, we created a “Published” dictionary which contains the other dictionaries. We can also write the nested dictionary directly, like below.

Published = {
    2018: {'NoOfArticles': 1200, 'NoOfBlogs': 1000, 'NoOfNews': 700},
    2019: {'NoOfArticles': 1500, 'NoOfBlogs': 1500, 'NoOfNews': 900},
    2020: {'NoOfArticles': 2000, 'NoOfBlogs': 1800, 'NoOfNews': 1000},
}
df = pd.DataFrame(Published)
print(df)

When creating a DataFrame from a 2-D nested dictionary:

Columns: outer dictionary keys

Rows: inner dictionary keys
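If we would rather have the outer keys as rows, one option is DataFrame.from_dict() with orient="index". The sketch below uses a shortened version of the dictionary, and the variable names are ours.

```python
import pandas as pd

published = {
    2018: {"NoOfArticles": 1200, "NoOfBlogs": 1000},
    2019: {"NoOfArticles": 1500, "NoOfBlogs": 1500},
}

# Default constructor: outer keys (years) become columns
by_year_columns = pd.DataFrame(published)

# orient="index": outer keys (years) become rows instead
by_year_rows = pd.DataFrame.from_dict(published, orient="index")

print(by_year_columns)
print(by_year_rows)
```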

3. Creating DataFrame from 2-D ndarray (Numpy Array)

import numpy as np
import pandas as pd

# np.array() turns the nested list into a 2-D ndarray
arr = np.array([[11, 12, 13], [14, 15, 16], [17, 18, 19], [20, 21, 22]])
df = pd.DataFrame(arr)
print(df)

In the output, pandas automatically assigns row and column indexes starting from 0. We can also set the column names and row names, like,

df = pd.DataFrame(arr, columns=['One', 'Two', 'Three'], index=['I', 'II', 'III', 'IV'])

The DataFrame then uses these labels for its columns and rows.

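Note: If the rows have different numbers of elements, NumPy cannot build a 2-D array. Recent NumPy versions raise a ValueError unless dtype=object is passed, in which case the array is 1-dimensional and holds one list per element. pandas then creates just a single column in the DataFrame, and the type of that column is object, like,

import numpy as np
import pandas as pd

# dtype=object is needed for rows of unequal length
arr = np.array([[2, 3], [7, 8, 9], [3, 6, 5]], dtype=object)
df = pd.DataFrame(arr)
print(df)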

4. Creating DataFrame from another DataFrame

We can also create a new DataFrame from an existing DataFrame, like

df2=pd.DataFrame(df)  
  
print(df2)  
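When the new DataFrame should be fully independent of the original, the copy() method is a common choice. This is a minimal sketch with made-up data, not the only way to do it.

```python
import pandas as pd

df = pd.DataFrame({"One": [11, 14], "Two": [12, 15]})

df2 = df.copy()         # independent copy of the data and the labels
df2.loc[0, "One"] = 99  # change only the copy

print(df)   # the original still holds 11
print(df2)
```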
Conclusion

We have now learned about DataFrames in Python and how to create them. After reading this article, I hope you are able to create a DataFrame in Python.

Queries related to this article and requests for sample files are always welcome. Thanks for reading!

Top Python Development Companies | Hire Python Developers

After analyzing clients and market requirements, TopDevelopers has come up with a list of the best Python service providers. These top-rated Python developers are widely appreciated for their professionalism in handling diverse projects. When you look for a developer in a hurry, you may forget to check a company's reviews and ratings, but we at TopDevelopers have done a clear analysis of these top-reviewed Python development companies and have picked the best ones for you.

List of Best Python Web Development Companies & Expert Python Programmers.

DataFrames in Python

A DataFrame is a two-dimensional data container, similar to a matrix, but which can contain heterogeneous data, and for which symbolic names may be associated with the rows and columns. In this post, we learn about DataFrames and how they work.

DataFrames are going to be the main tool when working with pandas.

Prerequisites

pandas should be installed on the system. If it is not, you can install it using,

pip install pandas

(if you installed Python directly from https://www.python.org/downloads/)

OR

conda install pandas

(if you have the Anaconda distribution of Python)

DataFrames and how they interact with pandas

DataFrames are the workhorse of pandas and are directly inspired by the R programming language. We can think of a DataFrame as a bunch of Series objects put together to share the same index. Let's use pandas to explore this topic.

import pandas as pd   
import numpy as np 

from numpy.random import randn  
np.random.seed(101)  

We set a seed here so that the random numbers we generate are the same on every run.

Let’s create a dataframe now,

df = pd.DataFrame

If you are using Jupyter Notebook, press Shift+Tab after df = pd.DataFrame to see its docstring and signature.

Check out the docstring and the initial signature for pd.DataFrame. We have a data argument and an index argument, just like Series, but then we have an additional columns argument.

Let's go ahead and create it with some random data, and we'll see what a DataFrame actually looks like. For the data argument, we use randn(5,4); for the index argument, we use a list of characters; and for the columns argument, we use another list of characters.

df = pd.DataFrame(randn(5,4),['A','B','C','D','E'],['W','X','Y','Z']) 
df

So, basically what we have here is a list of columns W, X, Y, Z and corresponding rows A, B, C, D, E. Each of these columns is actually a pandas Series, and they all share a common index. A DataFrame is a bunch of Series that share an index.

Selection and Indexing

Let’s grab data from a DataFrame.

Selecting columns

df['W']  

You can check the type using,

type(df['W'])  

which will give pandas.core.series.Series result.

You can also check,

type(df)  

which will give pandas.core.frame.DataFrame result

If you want to select multiple columns,

df[['W','Z']]  

Creation of New Columns

For creating a new column from the summation of already existing columns, use,

df['new'] = df['W'] + df['Y'] 

Removing Columns

For removing columns, you can just do,

df.drop('new',axis=1)  

Here, you can use Shift+Tab to check what axis actually refers to. axis=0, which is the default, refers to rows, whereas axis=1 refers to columns. So, here we use axis=1 because we want to drop a column.

Note: The ‘new’ column still exists in df, because drop() returns a new DataFrame and leaves the original unchanged. pandas does this so that you do not accidentally lose information. To keep the change, pass inplace=True or assign the result back to df.
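The behaviour of drop() can be checked with a short sketch. We use fixed numbers here instead of randn so that the result is easy to follow; the variable names dropped and still_there are ours.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(5, 4),
                  index=list("ABCDE"), columns=list("WXYZ"))
df["new"] = df["W"] + df["Y"]

dropped = df.drop("new", axis=1)   # returns a new DataFrame; df is unchanged
still_there = "new" in df.columns  # the column is still in df

df = df.drop("new", axis=1)        # assign the result back to keep the change
# In-place alternative: df.drop("new", axis=1, inplace=True)
print(df.columns)
```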

We can also use df.drop('E',axis=0) to drop a row. Try it yourself.

df.drop('E',axis=0)

A Quick Question: Why are the rows 0 and why are the columns 1?

The reference actually comes back to NumPy. DataFrames are essentially index markers on top of a NumPy array. The attribute df.shape (no parentheses) gives the tuple (5, 4) for the original DataFrame. For a two-dimensional matrix, position 0 of the shape holds the number of rows (A, B, C, D, E) and position 1 holds the number of columns (W, X, Y, Z). This is why rows are referred to as axis 0 and columns as axis 1: the numbering is taken directly from the shape, the same as for a NumPy array.
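A quick way to get a feel for the axis numbering is to look at the shape and then sum along each axis. This is a small sketch with fixed numbers rather than the random data used above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(5, 4),
                  index=list("ABCDE"), columns=list("WXYZ"))

n_rows, n_cols = df.shape    # shape is an attribute, not a method

col_totals = df.sum(axis=0)  # works down the rows: one total per column
row_totals = df.sum(axis=1)  # works across the columns: one total per row

print(n_rows, n_cols)
print(col_totals)
print(row_totals)
```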

Selecting rows

There are two ways to select rows in a DataFrame, and both use an indexer attribute with square brackets instead of indexing the DataFrame directly.

Select based on label

df.loc['A'] 

OR

Select based on the position

df.iloc[2]    

Note: Not only are all the columns Series, but the rows are Series as well.
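The two row selectors can be compared side by side. In this sketch, with fixed numbers instead of random data, label 'C' happens to sit at position 2, so both calls return the same row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(5, 4),
                  index=list("ABCDE"), columns=list("WXYZ"))

by_label = df.loc["C"]    # row selected by its label
by_position = df.iloc[2]  # row selected by its integer position

print(type(by_label))     # a row is a Series indexed by the column names
print(by_label)
```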

Selecting subsets of rows and columns

For this use,

df.loc[['A','B'],['W','Y']] 

For selecting a particular value, use,

df.loc['B','Y']  

Conditional Selection

A very important feature of pandas is the ability to perform conditional selection using bracket notation and this is going to be very similar to numpy.

Let’s use a comparison operator,

df > 0  

The result is a DataFrame of boolean values: True where the value at that position is greater than zero and False where it is not. We can pass this boolean DataFrame back to df,

df[df>0]  

Wherever the value is not greater than zero, and so does not satisfy the condition, NaN is returned.

More usefully, instead of getting NaN, we can return only the rows of the DataFrame where a condition on a column is true.

Let's say we want the rows where the value in column W is greater than 0, and from those rows we want to extract column Y. We can also select a set of columns, such as Y and X, after applying the condition.
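The selections described above could look like the sketch below. We use a small DataFrame with fixed, made-up values instead of the random data, so the variable names and numbers are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "W": [2.7, 0.6, -2.0, 0.2, 0.1],
        "X": [0.6, -0.3, 0.7, -0.8, 2.0],
        "Y": [0.9, -0.8, 0.5, -0.9, 2.6],
    },
    index=list("ABCDE"),
)

positive_w = df[df["W"] > 0]           # rows where W > 0, all columns
y_only = df[df["W"] > 0]["Y"]          # same rows, column Y as a Series
y_and_x = df[df["W"] > 0][["Y", "X"]]  # same rows, columns Y and X

# The same selection in one step with loc
y_and_x_loc = df.loc[df["W"] > 0, ["Y", "X"]]
print(y_and_x)
```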

Using multiple conditions

For more than one condition, we combine them with | (or) or & (and), wrapping each condition in parentheses. Remember that we cannot use Python’s and/or keywords here.

df[(df['W']>0) & (df['Y'] > 1)]  
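To see the difference between the operators and the keywords, here is a small sketch with made-up values. The try block is only there to show the error that the and keyword produces.

```python
import pandas as pd

df = pd.DataFrame({"W": [1.0, -1.0, 2.0], "Y": [2.0, 3.0, 0.5]},
                  index=list("ABC"))

both = df[(df["W"] > 0) & (df["Y"] > 1)]    # both conditions must hold
either = df[(df["W"] > 0) | (df["Y"] > 1)]  # at least one condition holds

try:
    df[(df["W"] > 0) and (df["Y"] > 1)]     # Python's "and" does not work here
    error = None
except ValueError as exc:
    error = exc

print(both)
print(either)
print(error)
```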

Resetting the index

In order to reset the index back to the default, which is 0, 1, 2, ..., n-1, we use the method reset_index(). The old index becomes a column, and the index itself becomes numerical. But the change is not retained unless you use inplace=True or assign the result back. pandas uses this inplace argument in many places; just press Shift+Tab (if using Jupyter Notebook) and you will get to see it.

df.reset_index()  
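Here is a minimal sketch of what reset_index() returns, using a tiny DataFrame of our own.

```python
import pandas as pd

df = pd.DataFrame({"W": [1, 2, 3]}, index=["A", "B", "C"])

reset = df.reset_index()  # the old labels move into a column named "index"

print(reset)
print(df.index)           # df itself still has the labels A, B, C
```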

Setting a new index

For setting a new index, first, we have to create a new index. We are using the split() method of a string, which by default splits on whitespace. It’s a quick way to create a list,

newind = 'WB MP KA TN UP'.split()  

Now, put this list as a column of the dataframe.

df['States'] = newind  
df 

If we want to use this States column as the index, we should use,

df.set_index('States')  

Note: set_index() overwrites the old index, so unless we first save the old index as a column, we won't be able to get it back. This is unlike reset_index(), which keeps the old index as a new column.

So, that's set index versus reset index.

Here also, inplace=True plays an important role: without it, or without assigning the result back, df keeps its old index.
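The difference between the two methods can be seen in a short sketch. The three-row DataFrame and the variable names by_state and kept are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"W": [1, 2, 3]}, index=["A", "B", "C"])
df["States"] = "WB MP KA".split()

by_state = df.set_index("States")            # the old labels A, B, C are discarded

# Calling reset_index() first keeps the old labels as an "index" column
kept = df.reset_index().set_index("States")

print(by_state)
print(kept)
```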

I hope you have enjoyed reading about DataFrames thus far. There's more to come in an upcoming article on DataFrames with something more interesting.

Thanks for reading!

Python GUI Programming Projects using Tkinter and Python 3

Description
Learn Hands-On Python Programming By Creating Projects, GUIs and Graphics

Python is a dynamic, modern, object-oriented programming language
It is easy to learn and can be used to do a lot of things both big and small
Python is what is referred to as a high level language
Python is used in the industry for things like embedded software, web development, desktop applications, and even mobile apps!
SQLite allows your applications to become even more powerful by storing, retrieving, and filtering through large data sets easily
If you want to learn to code, Python GUIs are the best way to start!

I designed this programming course to be easily understood by absolute beginners and young people. We start with basic Python programming concepts and reinforce them by developing projects and GUIs.

Why Python?

The Python coding language integrates well with other platforms – and runs on virtually all modern devices. If you’re new to coding, you can easily learn the basics in this fast and powerful coding environment. If you have experience with other computer languages, you’ll find Python simple and straightforward. This OSI-approved open-source language allows free use and distribution – even commercial distribution.

When and how do I start a career as a Python programmer?

An independent third-party survey found that Python is currently the most popular programming language among data scientists worldwide. This claim is substantiated by the Institute of Electrical and Electronics Engineers, which tracks programming languages by popularity. According to them, Python is the second most popular programming language this year for development on the web, after Java.

Python Job Profiles
Software Engineer
Research Analyst
Data Analyst
Data Scientist
Software Developer
Python Salary

The median total pay for Python jobs in California, United States is $74,410, for a professional with one year of experience
Who Uses Python?

This course gives you a solid set of skills in one of today’s top programming languages. Today’s biggest companies (and smartest startups) use Python, including Google, Facebook, Instagram, Amazon, IBM, and NASA. Python is increasingly being used for scientific computations and data analysis
Take this course today and learn the skills you need to rub shoulders with today’s tech industry giants. Have fun, create and control intriguing and interactive Python GUIs, and enjoy a bright future! Best of Luck
Who is the target audience?

Anyone who wants to learn to code
For Complete Programming Beginners
For People New to Python
This course was designed for students with little to no programming experience
People interested in building Projects
Anyone looking to start with Python GUI development
Basic knowledge
Access to a computer
Download Python (FREE)
Should have an interest in programming
Interest in learning Python programming
Install Python 3.6 on your computer
What will you learn
Build Python Graphical User Interfaces(GUI) with Tkinter
Be able to use the in-built Python modules for their own projects
Use programming fundamentals to build a calculator
Use advanced Python concepts to code
Build Your GUI in Python programming
Use programming fundamentals to build a Project
Signup Login & Registration Programs
Quizzes
Assignments
Job Interview Preparation Questions
& Much More