Learn the basics of Python, Numpy, Pandas, Data Visualization, and Exploratory Data Analysis in this course for beginners. This was originally presented as a live course.
By the end of the course, you will be able to build an end-to-end real-world course project and earn a verified certificate of accomplishment. There are no prerequisites for this course.
Learn more and register for a certificate of accomplishment here: http://zerotopandas.com
This full course video includes 6 lectures (all in this video):
• Introduction to Programming with Python
• Next Steps with Python
• Numerical Computing with Numpy
• Analyzing Tabular Data with Pandas
• Visualization with Matplotlib and Seaborn
• Exploratory Data Analysis - A Case Study
💻 Code References
• First steps with Python: https://jovian.ai/aakashns/first-steps-with-python
• Variables and data types: https://jovian.ai/aakashns/python-variables-and-data-types
• Conditional statements and loops: https://jovian.ai/aakashns/python-branching-and-loops
• Functions and scope: https://jovian.ai/aakashns/python-functions-and-scope
• Working with OS & files: https://jovian.ai/aakashns/python-os-and-filesystem
• Numerical computing with Numpy: https://jovian.ai/aakashns/python-numerical-computing-with-numpy
• 100 Numpy exercises: https://jovian.ai/aakashns/100-numpy-exercises
• Analyzing tabular data with Pandas: https://jovian.ai/aakashns/python-pandas-data-analysis
• Matplotlib & Seaborn tutorial: https://jovian.ai/aakashns/python-matplotlib-data-visualization
• Data visualization cheat sheet: https://jovian.ai/aakashns/dataviz-cheatsheet
• EDA on StackOverflow Developer Survey: https://jovian.ai/aakashns/python-eda-stackoverflow-survey
• Opendatasets python package: https://github.com/JovianML/opendatasets
• EDA starter notebook: https://jovian.ai/aakashns/zerotopandas-course-project-starter
⭐️ Course Contents ⭐️
0:00:00 Course Introduction
#python #numpy #data-analysis #pandas #developer
Using data to inform decisions is essential to product management, or anything really. And thankfully, we aren’t short of it. Any online application generates an abundance of data and it’s up to us to collect it and then make sense of it.
Google Data Studio helps us understand the meaning behind data, enabling us to build beautiful visualizations and dashboards that transform data into stories. Data literacy is, or certainly will be, as fundamental a skill as learning to read or write.
Nothing is more powerful than data democracy, where anyone in your organization can regularly make decisions informed by data. As part of enabling this, we need to be able to visualize data in a way that brings it to life and makes it more accessible. I’ve recently been learning how, and wanted to share some of the cool ways you can do this in Google Data Studio.
#google-data-studio #blending-data #dashboard #data-visualization #creating-visualizations #how-to-visualize-data #data-analysis #data-visualisation
Many a time, I have seen beginners in data science skip exploratory data analysis (EDA) and jump straight into building a hypothesis function or model. In my opinion, this should not be the case. We should first perform an EDA, as it connects us with the dataset at an emotional level and, of course, helps in building a good hypothesis function.
EDA is a very crucial step. It gives us a glimpse of what our dataset is all about: its uniqueness, its anomalies, and, finally, a summary of the dataset’s main characteristics. In this post, I will share a very basic guide for performing EDA.
**Step 1: Import your dataset** and have a good look at the data.
In order to perform EDA, we will require the following Python packages.
Packages to import:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from collections import defaultdict
%matplotlib inline
```
Once we have imported the packages successfully, we will move on to importing our dataset. You may already be familiar with the read_csv() function from pandas for reading CSV files.
Import the dataset:
For the purpose of this tutorial, I have used the Loan Prediction dataset from Analytics Vidhya. If you wish to code along, here is the link.
The dataset has been successfully imported. Let’s have a look at the Train dataset.
Fig 1: Overview of the Train dataset
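A minimal sketch of the import step. In practice you would call `pd.read_csv("train.csv")` on the downloaded file; here a tiny inline sample stands in for it so the snippet is self-contained, and the column names are placeholders rather than the actual dataset schema.

```python
import pandas as pd
from io import StringIO

# In practice: train = pd.read_csv("train.csv")
# The inline sample below is an assumption, used only for illustration.
csv_data = StringIO(
    "Loan_ID,Gender,LoanAmount,Loan_Status\n"
    "LP001002,Male,130,Y\n"
    "LP001003,Female,128,N\n"
)
train = pd.read_csv(csv_data)

# A quick first look at the data
print(train.shape)   # (rows, columns)
print(train.head())  # first few rows
```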
#data-science #python #pandas #data-analysis #data-visualization #data analysis
Pandas is one of the predominant data analysis tools and is highly appreciated among data scientists. It provides numerous flexible and versatile functions for efficient data analysis.
In this article, we will go over 3 pandas tricks that I think will make you a happier pandas user. It is best to explain these tricks with examples, so we start by creating a data frame to work on.
The data frame contains daily sales quantities of 3 different stores. We first create a period of 10 days using the date_range function of pandas.
```python
import numpy as np
import pandas as pd

days = pd.date_range("2020-01-01", periods=10, freq="D")
```
The days variable will be used as a column. We also need a sales quantity column, which can be generated by the randint function of numpy. Then, we create a data frame with a column for each of the 3 stores.
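One way the described frame might be built. The exact column layout is an assumption on my part, since the article's original screenshots are not reproduced here:

```python
import numpy as np
import pandas as pd

days = pd.date_range("2020-01-01", periods=10, freq="D")

# One column of random daily sales per store (layout is an assumption)
df = pd.DataFrame({
    "date": days,
    "store_1": np.random.randint(10, 50, size=10),
    "store_2": np.random.randint(10, 50, size=10),
    "store_3": np.random.randint(10, 50, size=10),
})

print(df.head())
```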
#machine-learning #data-science #python #python pandas tricks #efficient data analysis #python pandas tricks for efficient data analysis
Oftentimes, we tend to forget that the pandas library is built on top of the numpy package. In this comprehensive guide, we take full advantage of the fact that all numpy functionalities are also available in pandas.
Incorporating the necessary packages
To be able to make full use of the power of both pandas and numpy, we must import the necessary packages. As is the well-known convention, we rename them appropriately:
```python
import pandas as pd  # pandas renamed as pd
import numpy as np   # numpy renamed as np
```
In case we do not have these packages installed, we can do so through the terminal by typing the following command(s):
```shell
pip install pandas  # try pip3 if necessary
pip install numpy   # try pip3 if necessary
```
Once the packages have been imported and renamed, we must refer to them as pd (for pandas) and np (for numpy); otherwise, a NameError shows up.
Creating DataFrame object
A DataFrame can be created from a list, a dictionary or even a numpy array. We populate a numpy array with random integers and build a DataFrame object out of it:
5 x 3 numpy array filled with random integers
Using the randint( ) function from the random module of numpy, we created a numpy array having 5 rows and 3 columns. The shape is passed in the form of a tuple as the third argument to randint( ). The first and second arguments to randint( ) denote the lower bound and upper bound, respectively, of the range of numbers from which the array is filled. Random numbers are generated between 10 and 49 because the upper bound is exclusive. We now pass the array as an argument to DataFrame( ), resulting in the creation of a DataFrame object:
df is the DataFrame object
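The steps above can be sketched as follows. A fixed seed is added here for reproducibility; it is not part of the original walkthrough:

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # seeding is my addition, for reproducibility

# 5 x 3 array of random integers in [10, 50): 10 inclusive, 50 exclusive
arr = np.random.randint(10, 50, (5, 3))

# Build a DataFrame object from the numpy array
df = pd.DataFrame(arr)
print(df)
```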
To display the content of df, which is nothing but a numpy array, we call upon the values attribute of DataFrame:
Invoking values attribute on df returns the numpy array
The row headers (0, 1, 2, 3, 4) are auto-generated and are in the form of a sequence; so are the column headers (0, 1, 2). To get the row headers, which in this case is an auto-generated sequence, we use the index attribute:
Valid row headers range from 0 to 4 with a step size of 1
To fetch column headers, which also is an auto-generated sequence, we use the columns attribute:
Valid column headers range from 0 to 2 with a step size of 1
Mind you, a sequence goes up to but not including the stop value. Therefore, for both row and column sequences, the stop parameter is 1 more than the last value.
Content of df
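Putting the three attributes together in one sketch (the integer values themselves vary from run to run):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10, 50, (5, 3)))

print(df.values)   # the underlying numpy array
print(df.index)    # RangeIndex(start=0, stop=5, step=1)
print(df.columns)  # RangeIndex(start=0, stop=3, step=1)
```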
Since the values attribute of a DataFrame object is a numpy array, we can index and/or slice it in the same way we would index and/or slice a numpy array. The general form is:
```python
df.values[row_index, column_index]                 # indexing
df.values[row_start:row_stop, col_start:col_stop]  # slicing
```
Display all columns of the second row (row index = 1)
Display all columns of the last row (row index = 4); a single value inside [ ] denotes all columns of the row index passed in
Display all rows of the second column (column index = 1)
All the slices that we see above are numpy arrays:
The type( ) function confirms our claim
We can also access a particular element of the DataFrame:
Specifying the row index as well as the column index gives the element at their intersecting point
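The accesses described above can be sketched as follows (the data is randomly generated here, since the article's screenshots are not reproduced):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10, 50, (5, 3)))

row_2 = df.values[1]       # all columns of the second row
row_last = df.values[4]    # all columns of the last row
col_2 = df.values[:, 1]    # all rows of the second column

print(type(row_2))         # <class 'numpy.ndarray'>

element = df.values[2, 1]  # element at row index 2, column index 1
print(element)
```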
Assigning manual row headers and column headers
Creating a DataFrame object from a numpy array built using random integers between 10 and 50
The row and column headers are auto-generated. We can come up with our own headers as well:
Row labels range from R1 to R5. Column labels range from C1 to C3
Explicit indexing works on DataFrame objects
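A sketch of assigning the custom labels described above, followed by an explicit (label-based) lookup with .loc:

```python
import numpy as np
import pandas as pd

# Row labels R1..R5 and column labels C1..C3, as described above
df = pd.DataFrame(
    np.random.randint(10, 50, (5, 3)),
    index=["R1", "R2", "R3", "R4", "R5"],
    columns=["C1", "C2", "C3"],
)

# Explicit indexing works on DataFrame objects
print(df.loc["R1", "C2"])
```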
#numpy #data-analysis #pandas #data-science #python #data analysis
If you accumulate data on which you base your decision-making as an organization, you most probably need to think about your data architecture and consider possible best practices. Gaining a competitive edge, remaining customer-centric to the greatest extent possible, and streamlining processes to get on-the-button outcomes can all be traced back to an organization’s capacity to build a future-ready data architecture.
In what follows, we offer a short overview of the overarching capabilities of data architecture. These include user-centricity, elasticity, robustness, and the capacity to ensure the seamless flow of data at all times. Added to these are automation enablement, plus security and data governance considerations. These points form our checklist for what we perceive to be an anticipatory analytics ecosystem.
#big data #data science #big data analytics #data analysis #data architecture #data transformation #data platform #data strategy #cloud data platform #data acquisition