Arne Denesik

1604145120

Practical EDA Guide with Pandas

Pandas is a widely used data analysis library that provides numerous functions and methods for working with tabular data. Its rich selection of easy-to-use functions makes exploratory data analysis (EDA) fairly straightforward.

In this post, we will explore the student performance dataset available on Kaggle. The dataset contains some personal information about students and their performance on certain tests.

Let’s start by reading the dataset into a pandas dataframe.

import numpy as np
import pandas as pd

df = pd.read_csv("/content/StudentsPerformance.csv")
df.shape  # returns (1000, 8)
df.head()

There are 5 categorical features and scores of 3 different tests. The goal is to check how these features affect the test scores.

We can start by checking the distribution of test scores. The plot function of pandas can be used to create a kernel density estimate (KDE) plot; this plot type requires SciPy to be installed.

df['reading score'].plot(kind='kde', figsize=(10,6), title='Distribution of Reading Score')
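Beyond the distribution, one simple way to check how a categorical feature relates to a test score is to compare group means. The following is a minimal sketch, not the author's code: the rows are made up for illustration, and the column name 'test preparation course' is an assumption about the dataset ('reading score' is the column used above).

```python
import pandas as pd

# Illustrative rows only; these values are NOT taken from the Kaggle file.
# 'test preparation course' is an assumed column name.
df = pd.DataFrame({
    "test preparation course": ["none", "completed", "completed", "none"],
    "reading score": [72, 90, 95, 57],
})

# Average reading score within each category of the feature.
mean_reading = df.groupby("test preparation course")["reading score"].mean()
print(mean_reading)
```

With the real dataframe, the same groupby call can be repeated for each categorical column and each of the three score columns.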

#data-analysis #data-science #machine-learning #artificial-intelligence #pandas

Practice Problems: How To Join DataFrames in Pandas

Hey - Nick here! This page is a free excerpt from my course Python for Finance.

It’s now time for some practice problems! See below for details on how to proceed.

Course Repository & Practice Problems

All of the code for this course’s practice problems can be found in this GitHub repository.

There are two options that you can use to complete the practice problems:

  • Open them in your browser with a platform called Binder using this link (recommended)
  • Download the repository to your local computer and open them in a Jupyter Notebook using Anaconda (a bit more tedious)

Note that Binder can take up to a minute to load the repository, so please be patient.

Within that repository, there is a folder called starter-files and a folder called finished-files. You should open the appropriate practice problems within the starter-files folder and only consult the corresponding file in the finished-files folder if you get stuck.

The repository is public, which means that you can suggest changes using a pull request later in this course if you’d like.

#dataframes #pandas #practice problems: how to join dataframes in pandas #how to join dataframes in pandas #practice #/pandas/issues.

Practice Problems: How To Use Pandas DataFrames' GroupBy Method

#pandas #groupby methods #pandas dataframe #example #practice problems: how to use pandas dataframes' groupby method #practice problems

Mya Lynch

1598789880

A Complete Guide To Bamboolib - GUI Tool for Analyzing Pandas

Analyzing and visualizing data is one of the most important and time-consuming steps in a data project. It takes a lot of time to understand what the data contains and what it is telling us. We use various Python libraries and functions to visualize patterns and anomalies and to get familiar with the dataset.

Bamboolib is a GUI for pandas DataFrames that lets anyone work with Python in Jupyter Notebook or JupyterLab. It is a highly interactive library for analyzing, visualizing, and manipulating data. Even people with a non-technical background can use it to draw insights from data, because it does not require any coding experience.

Bamboolib is used by more than 100 companies and lets data analysts work with Python without writing code. It is not open source, so you need to buy a license to use it, but it offers a 14-day free trial so that you can explore it fully and see how useful it is for you.


In this article, we will explore different uses of bamboolib and see how it saves time and effort. We will explore different functions that bamboolib provides and also export the code used for that functionality.

Implementation of Bamboolib:

To explore bamboolib, we first need to register on their website for the 14-day free trial. After registering, you will receive an email with the activation key at the registered email address. Like any other Python library, bamboolib is installed with pip install bamboolib.

  1. Importing required libraries

We will need to import pandas for loading the dataset and bamboolib for visualizing the dataset.

import bamboolib as bam
import pandas as pd

  2. Loading the dataset

We will be using a car design dataset here, which contains different attributes related to automobile manufacturing companies. You can download this dataset from Kaggle. We will use pandas to load this dataset.

df = pd.read_csv('car_design.csv')

  3. Analyzing the dataset

This is the main step, where we will analyze and visualize the dataset using bamboolib.

bam.show(df)

#developers corner #automating eda #data analytics #eda #pandas #plotly #python pandas #visualization

Exploratory Data Analysis: Dataprep.eda vs Pandas-Profiling

Use the right tool for Exploratory Data Analysis (EDA)

#data-analysis #eda #python #dataprep.eda #pandas #pandas-profiling

Udit Vashisht

1586702221

Python Pandas Objects - Pandas Series and Pandas Dataframe

In this post, we will learn about pandas’ data structures/objects. Pandas provides two types of data structures:

Pandas Series

A Pandas Series is a one-dimensional indexed array that can hold data types such as integer, string, boolean, float, Python object, etc. A Series has a single data type (dtype) at a time; mixed values are stored under the generic object dtype. The axis labels of the data are called the index of the series. The labels need not be unique, but they must be of a hashable type. The index of the series can consist of integers, strings, or even time-series data. In general, a Pandas Series is like a column of an Excel sheet, with the row index being the index of the series.
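The points above (a single dtype, and index labels that are hashable but not necessarily unique) can be sketched as follows. The values and labels are made up for illustration.

```python
import pandas as pd

# Index labels must be hashable but may repeat; "a" appears twice here.
s = pd.Series([10, 20, 30], index=["a", "b", "a"], dtype="int64")
print(s.dtype)

# Selecting a repeated label returns every matching row as a Series.
both_a = s.loc["a"]
print(both_a.tolist())

# Mixing value types in one Series falls back to the generic object dtype.
mixed = pd.Series([1, "x", True])
print(mixed.dtype)
```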

Pandas Dataframe

The Pandas DataFrame is the primary data structure of pandas. It is a two-dimensional, size-mutable array with both flexible row indices and flexible column names. In general, it is just like an Excel sheet or an SQL table. It can also be seen as a Python dict-like container for Series objects.
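As a small sketch of the "dict-like container for Series" view, the example below builds a DataFrame from two Series and then adds a column to show that it is size-mutable. The names, labels, and values are hypothetical.

```python
import pandas as pd

# Two Series that share the same row labels.
name = pd.Series(["Ann", "Bob"], index=["r1", "r2"])
score = pd.Series([88, 92], index=["r1", "r2"])

# A DataFrame can be built from a dict mapping column names to Series.
df = pd.DataFrame({"name": name, "score": score})

# Size-mutable: a new column can be added after construction.
df["passed"] = df["score"] >= 90

# Selecting one column gives back a Series.
col = df["score"]
print(type(col).__name__)
print(df.shape)
```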

#python #python-pandas #pandas-dataframe #pandas-series #pandas-tutorial