Code Camp

Data Analysis with Python Course - Numpy, Pandas, Data Visualization

Learn the basics of Python, Numpy, Pandas, Data Visualization, and Exploratory Data Analysis in this course for beginners. This was originally presented as a live course.

By the end of the course, you will be able to build an end-to-end real-world course project and earn a verified certificate of accomplishment. There are no prerequisites for this course.

Learn more and register for a certificate of accomplishment here:

This full course video includes 6 lectures:
• Introduction to Programming with Python
• Next Steps with Python
• Numerical Computing with Numpy
• Analyzing Tabular Data with Pandas
• Visualization with Matplotlib and Seaborn
• Exploratory Data Analysis - A Case Study

💻 Code References
• First steps with Python:
• Variables and data types:
• Conditional statements and loops:
• Functions and scope:
• Working with OS & files:
• Numerical computing with Numpy:
• 100 Numpy exercises:
• Analyzing tabular data with Pandas:
• Matplotlib & Seaborn tutorial:
• Data visualization cheat sheet:
• EDA on StackOverflow Developer Survey:
• Opendatasets python package:
• EDA starter notebook:

⭐️ Course Contents ⭐️
0:00:00 Course Introduction

Lecture 1

  • 0:01:42 Python Programming Fundamentals
  • 0:02:40 Course Curriculum
  • 0:05:24 Notebook - First Steps with Python and Jupyter
  • 0:08:30 Performing Arithmetic Operations with Python
  • 0:11:34 Solving Multi-step problems using variables
  • 0:20:17 Combining conditions with Logical operators
  • 0:22:22 Adding text using Markdown
  • 0:23:50 Saving and Uploading to Jovian
  • 0:26:38 Variables and Datatypes in Python
  • 0:31:28 Built-in Data types in Python
  • 1:07:19 Further Reading

Lecture 2

  • 1:08:46 Branching, Loops, and Functions
  • 1:09:02 Notebook - Branching using conditional statements and loops in Python
  • 1:09:24 Branching with if, else, elif
  • 1:15:25 Non-Boolean conditions
  • 1:19:00 Iteration with while loops
  • 1:28:57 Iteration with for loops
  • 1:36:27 Functions and scope in Python
  • 1:36:53 Creating and using functions
  • 1:42:24 Writing great functions in Python
  • 1:45:38 Local variables and scope
  • 2:08:19 Documenting functions using Docstrings
  • 2:11:40 Exercise - Data Analysis for Vacation Planning

Lecture 3

  • 2:17:17 Numerical Computing with Numpy
  • 2:18:00 Notebook - Numerical Computing with Numpy
  • 2:26:09 From Python Lists to Numpy Arrays
  • 2:29:09 Operating on Numpy Arrays
  • 2:34:33 Multidimensional Numpy Arrays
  • 3:03:41 Array Indexing and Slicing
  • 3:17:49 Exercises and Further Reading
  • 3:20:50 Assignment 2 - Numpy Array Operations
  • 3:29:16 100 Numpy Exercises
  • 3:31:25 Reading from and Writing to Files using Python

Lecture 4

  • 4:02:59 Analyzing Tabular Data with Pandas
  • 4:03:58 Notebook - Analyzing Tabular Data with Pandas
  • 4:16:33 Retrieving Data from a Data Frame
  • 4:32:00 Analyzing Data from Data Frames
  • 4:36:27 Querying and Sorting Rows
  • 5:01:45 Grouping and Aggregation
  • 5:11:26 Merging Data from Multiple Sources
  • 5:26:00 Basic Plotting with Pandas
  • 5:38:27 Assignment 3 - Pandas Practice

Lecture 5

  • 5:52:48 Visualization with Matplotlib and Seaborn
  • 5:54:04 Notebook - Data Visualization with Matplotlib and Seaborn
  • 6:06:43 Line Charts
  • 6:11:27 Improving Default Styles with Seaborn
  • 6:16:51 Scatter Plots
  • 6:28:14 Histogram
  • 6:38:47 Bar Chart
  • 6:50:00 Heatmap
  • 6:57:08 Displaying Images with Matplotlib
  • 7:03:37 Plotting multiple charts in a grid
  • 7:15:42 References and further reading
  • 7:20:17 Course Project - Exploratory Data Analysis

Lecture 6

  • 7:49:56 Exploratory Data Analysis - A Case Study
  • 7:50:55 Notebook - Exploratory Data Analysis - A Case Study
  • 8:04:36 Data Preparation and Cleaning
  • 8:19:37 Exploratory Analysis and Visualization
  • 8:54:02 Asking and Answering Questions
  • 9:22:57 Inferences and Conclusions
  • 9:25:00 References and Future Work
  • 9:29:41 Setting up and running Locally
  • 9:34:21 Project Guidelines
  • 9:45:00 Course Recap
  • 9:48:01 What to do next?
  • 9:49:10 Certificate of Accomplishment
  • 9:50:11 What to do after this course?
  • 9:52:16 Jovian Platform

#python #numpy #data-analysis #pandas #developer

Sid Schuppe


How To Blend Data in Google Data Studio For Better Data Analysis

Using data to inform decisions is essential to product management, or anything really. And thankfully, we aren’t short of it. Any online application generates an abundance of data and it’s up to us to collect it and then make sense of it.

Google Data Studio helps us understand the meaning behind data, enabling us to build beautiful visualizations and dashboards that transform data into stories. Data literacy is becoming as fundamental a skill as reading or writing, if it isn't already.

Nothing is more powerful than data democracy, where anyone in your organization can regularly make decisions informed with data. As part of enabling this, we need to be able to visualize data in a way that brings it to life and makes it more accessible. I’ve recently been learning how to do this and wanted to share some of the cool ways you can do this in Google Data Studio.

#google-data-studio #blending-data #dashboard #data-visualization #creating-visualizations #how-to-visualize-data #data-analysis #data-visualisation

Tia Gottlieb


An introduction to exploratory data analysis in python

Many a time, I have seen beginners in data science skip exploratory data analysis (EDA) and jump straight into building a hypothesis function or model. In my opinion, this should not be the case. We should first perform an EDA, as it will connect us with the dataset at an emotional level and, yes, of course, will help in building a good hypothesis function.

EDA is a very crucial step. It gives us a glimpse of what our data set is all about, its uniqueness, its anomalies and finally it summarizes the main characteristics of the dataset for us. In this post, I will share a very basic guide for performing EDA.

**Step 1: Import your data set** and have a good look at the data.

In order to perform EDA, we will require the following python packages.

Packages to import:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from collections import defaultdict
%matplotlib inline

Once we have imported the packages successfully, we will move on to importing our dataset. You may already be familiar with the read_csv() function from pandas for reading CSV files.

Import the dataset:

For the purpose of this tutorial, I have used the Loan Prediction dataset from Analytics Vidhya. If you wish to code along, here is the link.
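Since the dataset link is not included above, here is a minimal sketch of the import step. The sample rows and column names are illustrative stand-ins for the real Loan Prediction CSV; with the downloaded file you would call pd.read_csv() on its path directly:

```python
import pandas as pd
from io import StringIO

# Simulated fragment of the Train CSV; column names are illustrative
# stand-ins for the actual Loan Prediction dataset.
sample = StringIO(
    "Loan_ID,Gender,ApplicantIncome,Loan_Status\n"
    "LP001002,Male,5849,Y\n"
    "LP001003,Male,4583,N\n"
)
train_df = pd.read_csv(sample)   # with the real file: pd.read_csv("train.csv")
print(train_df.shape)            # (rows, columns) of the loaded frame
```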

The dataset has been successfully imported. Let’s have a look at the Train dataset.


Fig 1: Overview of the Train dataset

#data-science #python #pandas #data-analysis #data-visualization #data analysis

Paula Hall


3 Python Pandas Tricks for Efficient Data Analysis

Explained with examples.

Pandas is one of the predominant data analysis tools and is highly appreciated among data scientists. It provides numerous flexible and versatile functions to perform efficient data analysis.

In this article, we will go over 3 pandas tricks that I think will make you a happier pandas user. It is easier to explain these tricks with examples, so we start by creating a data frame to work on.

The data frame contains daily sales quantities of 3 different stores. We first create a period of 10 days using the date_range function of pandas.

import numpy as np
import pandas as pd

days = pd.date_range("2020-01-01", periods=10, freq="D")

The days variable will be used as a column. We also need a sales quantity column which can be generated by the randint function of numpy. Then, we create a data frame with 3 columns for each store.
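The construction step itself is not shown above; the following is one possible sketch, in which the store names, the sales range, and the long layout (one row per store per day) are assumptions:

```python
import numpy as np
import pandas as pd

days = pd.date_range("2020-01-01", periods=10, freq="D")

# One row per store per day; store names and value range are assumptions.
frames = [
    pd.DataFrame({
        "date": days,
        "store": store,
        "sales": np.random.randint(100, 200, size=10),
    })
    for store in ["A", "B", "C"]
]
df = pd.concat(frames, ignore_index=True)
print(df.shape)   # 30 rows (10 days x 3 stores), 3 columns
```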

#machine-learning #data-science #python #python pandas tricks #efficient data analysis #python pandas tricks for efficient data analysis

Beginner’s Guide to Data Analysis using numpy and pandas

Oftentimes, we tend to forget that the pandas library is built on top of the numpy package. In this comprehensive guide, we take advantage of the fact that many numpy functionalities are also available through pandas.

Incorporating the necessary packages

To be able to make full use of the power of both pandas and numpy, we must import the necessary packages. As is the well-known convention, we rename them appropriately:

pandas renamed as pd; numpy renamed as np
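Reconstructed from the caption above, the import-and-rename step is simply:

```python
import numpy as np    # numpy renamed as np, by convention
import pandas as pd   # pandas renamed as pd, by convention
```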

In case we do not have these packages installed, we can do so through the terminal by typing the following command(s):

pip install pandas    # try pip3 if necessary
pip install numpy     # try pip3 if necessary

Once the packages have been imported and renamed, we have to refer to them as pd (for pandas) and np (for numpy); using the original names will raise a NameError.

Creating DataFrame object

A DataFrame can be created from a list, a dictionary or even a numpy array. We populate a numpy array with random integers and build a DataFrame object out of it:

5 x 3 numpy array filled with random integers

Using the randint( ) function from the random module of numpy, we created a numpy array having 5 rows and 3 columns. The shape is passed in the form of a tuple as the third argument to randint( ). The first and second arguments to randint( ) denote the lower bound and the upper bound, respectively, of the range of numbers used to fill the array. Random numbers are generated from 10 to 49 because randint( ) is exclusive of the upper bound. We now pass the array as an argument to DataFrame( ), resulting in the creation of a DataFrame object:
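The code from the screenshots can be reconstructed from the description above; the variable names here are assumptions:

```python
import numpy as np
import pandas as pd

# 5 x 3 array of random integers; the shape is the third argument,
# passed as a tuple. Values run from 10 up to (but not including) 50.
arr = np.random.randint(10, 50, (5, 3))
df = pd.DataFrame(arr)
print(df)
```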

df is the DataFrame object

To display the underlying numpy array of df, we call upon the values attribute of DataFrame:

Invoking values attribute on df returns the numpy array

The row headers (0, 1, 2, 3, 4) are auto-generated and are in the form of a sequence; so are the column headers (0, 1, 2). To get the row headers, which in this case is an auto-generated sequence, we use the index attribute:

Valid row headers range from 0 to 4 with a step size of 1

To fetch column headers, which also is an auto-generated sequence, we use the columns attribute:

Valid column headers range from 0 to 2 with a step size of 1

Mind you, a sequence goes up to but not including the stop value. Therefore, for both row and column sequences, the stop parameter is 1 more than the last value.
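From the descriptions above, the three attribute lookups can be sketched as follows (the variable name is an assumption):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10, 50, (5, 3)))

print(df.values)    # the underlying numpy array
print(df.index)     # RangeIndex(start=0, stop=5, step=1)
print(df.columns)   # RangeIndex(start=0, stop=3, step=1)
```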

Content of df

Since the underlying data of a DataFrame is a numpy array, we can index and/or slice df.values the same way we would index and/or slice a numpy array. The general form is:

df.values[row_index, column_index]  # indexing
df.values[row_start:row_stop, col_start:col_stop]  # slicing

Display all columns of second row

Display all columns of last row (row index = 4). A single value within [ ], like the one shown above, denotes all columns of the row index passed inside

Display all rows of second column (column index = 1)

All the slices that we see above are numpy arrays:

The type( ) function confirms our claim

We can also access a particular element of the DataFrame:

Specifying the row index as well as the column index gives the element at their intersecting point
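Putting the indexing and slicing examples above together in one sketch (variable names are assumptions):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10, 50, (5, 3)))

row1 = df.values[1]       # all columns of the second row
last = df.values[4]       # all columns of the last row
col1 = df.values[:, 1]    # all rows of the second column
elem = df.values[1, 2]    # element at row index 1, column index 2

print(type(row1))         # every such slice is a numpy array
```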

Assigning manual row headers and column headers

Creating a DataFrame object from a numpy array built using random integers between 10 to 50

The row and column headers are auto-generated. We can come up with our own headers as well:

Row labels range from R1 to R5. Column labels range from C1 to C3

Explicit indexing works on DataFrame objects
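A sketch of the labelled construction, using the row and column labels from the caption above:

```python
import numpy as np
import pandas as pd

# Manual headers via the index and columns arguments of DataFrame( ).
df = pd.DataFrame(
    np.random.randint(10, 50, (5, 3)),
    index=["R1", "R2", "R3", "R4", "R5"],
    columns=["C1", "C2", "C3"],
)
print(df.loc["R2", "C3"])   # explicit label-based indexing with .loc
```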

#numpy #data-analysis #pandas #data-science #python #data analysis

iOS App Dev


Your Data Architecture: Simple Best Practices for Your Data Strategy

If you accumulate data on which you base your decision-making as an organization, you should probably think about your data architecture and possible best practices.

If you accumulate data on which you base your decision-making as an organization, you most probably need to think about your data architecture and consider possible best practices. Gaining a competitive edge, remaining customer-centric to the greatest extent possible, and streamlining processes to get on-the-button outcomes can all be traced back to an organization’s capacity to build a future-ready data architecture.

In what follows, we offer a short overview of the overarching capabilities of data architecture. These include user-centricity, elasticity, robustness, and the capacity to ensure the seamless flow of data at all times. Added to these are automation enablement, plus security and data governance considerations. These points form our checklist for what we perceive to be an anticipatory analytics ecosystem.

#big data #data science #big data analytics #data analysis #data architecture #data transformation #data platform #data strategy #cloud data platform #data acquisition