Nina Diana



Pandas EDA Libraries you Need in 2021

EDA (Exploratory Data Analysis) is one of the first steps performed on a given dataset. It helps us understand our data and gives us an idea of the manipulation and cleaning we might have to do. EDA can take anywhere from a few lines to a few hundred lines of code. In this tutorial, we will look at libraries that help us perform EDA in a few lines.


We will use the Titanic dataset provided by Kaggle. Using pandas' describe() method, we get the output below:


Screenshot by Author

As you can see, the Age column has missing values. The libraries below are basically describe() on steroids.
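To make the missing-value hint concrete, here is a minimal sketch on a tiny hand-made frame. The rows are illustrative and are not taken from the Titanic data; the point is only that describe() reports a count for Age that is lower than the number of rows.

```python
import numpy as np
import pandas as pd

# Illustrative rows only; these are not the real Titanic records.
df_demo = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05],
})

summary = df_demo.describe()

# "count" ignores NaN, so a count below len(df_demo) signals missing values.
missing_age = len(df_demo) - int(summary.loc["count", "Age"])

print(summary)
print("Missing Age values:", missing_age)
```

Comparing the count row against the total number of rows is the quickest way to spot incomplete columns before reaching for a full report.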

1. Pandas-Profiling


Screencast of EDA Report Generated by Pandas Profiling

Install and Usage

First, we will install the library:

pip install pandas-profiling

Next, we will import the library and generate the report

import pandas_profiling

prof_report = pandas_profiling.ProfileReport(df, title='Titanic Report')

To display it inside the notebook


To generate it as an HTML file
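Both steps map onto methods of the report object. The following is a minimal sketch, assuming the pandas-profiling API of the time (`to_notebook_iframe()` for inline display and `to_file()` for HTML export). The helper name `export_profile` and the file name `titanic_report.html` are hypothetical and chosen only for illustration.

```python
def export_profile(df, html_path="titanic_report.html", in_notebook=False):
    """Build a profiling report, then show it inline or write it to HTML.

    `export_profile` and `html_path` are hypothetical names for this sketch.
    """
    # Imported here so the helper can be defined even where
    # pandas-profiling is not installed.
    from pandas_profiling import ProfileReport

    report = ProfileReport(df, title="Titanic Report")
    if in_notebook:
        # Renders the report inside a Jupyter notebook cell.
        report.to_notebook_iframe()
    else:
        # Writes a standalone HTML file to html_path.
        report.to_file(html_path)
    return report
```

In a notebook you would call `export_profile(df, in_notebook=True)`; in a script, `export_profile(df)` writes the HTML file instead.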


#data-analysis #data-science #python-libraries #python

Kasey  Turcotte



Playing with Pandas library

The techniques for Reshaping, Grouping, and Pivoting the data

In just a decade, Python has changed the field through its popularity and efficiency. It has become a reliable foundation for Data Science, which comprises:

· Data Gathering

· Data Cleaning

· Machine Learning models

· Visualization of Data

Pandas is a fundamental third-party library for Python that covers much of this ground. It is open source, easy to use, and efficient, and it provides many tools for data analysis in Python.

Pandas behaves like an in-memory, NoSQL-style data store that supports basic SQL-like constructs, statistical methods, and plotting. Because it is built on NumPy, with performance-critical parts written in Cython, it runs quickly and accesses memory efficiently.

→ Pandas has advanced features for carrying out operations on groups of data within a DataFrame.

→ DataFrame: a labeled 2D data structure made up of rows and columns.

So, in this article, we'll take a quick look at some methods for grouping, reshaping, and pivoting data.
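As a quick preview of those three operations, here is a minimal sketch using `groupby`, `pivot_table`, and `melt`. The `sales` table and its column names are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical sales records used only for illustration.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100, 150, 80, 120],
})

# Grouping: total revenue per region.
totals = sales.groupby("region")["revenue"].sum()

# Pivoting: one row per region, one column per quarter.
wide = sales.pivot_table(
    index="region", columns="quarter", values="revenue", aggfunc="sum"
)

# Reshaping: melt the wide table back into long form.
long = wide.reset_index().melt(
    id_vars="region", var_name="quarter", value_name="revenue"
)

print(totals)
print(wide)
print(long)
```

Pivoting and melting are inverse views of the same data: the wide form is easier to read, while the long form is usually easier to group and plot.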

#pandas #data-science #python #artificial-intelligence #playing with pandas library #pandas library

Exploratory Data Analysis: Dataprep.eda vs Pandas-Profiling

Use the right tool for Exploratory Data Analysis (EDA)

#data-analysis #eda #python #dataprep.eda #pandas #pandas-profiling

Udit Vashisht


Python Pandas Objects - Pandas Series and Pandas Dataframe

In this post, we will learn about pandas' data structures/objects. Pandas provides two types of data structures:

Pandas Series

A Pandas Series is a one-dimensional indexed array that can hold data types such as integer, string, boolean, float, Python object, etc. A Pandas Series can hold only one data type at a time. The axis labels of the data are called the index of the series. The labels need not be unique but must be of a hashable type. The index of the series can be integer, string, or even time-series data. In general, a Pandas Series is like a column of an Excel sheet, with the row labels serving as the index of the series.
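A minimal sketch of these properties, using made-up values: the Series below has a string index in which the label "a" appears twice, which pandas allows.

```python
import pandas as pd

# Labels must be hashable but need not be unique; "a" appears twice here.
s = pd.Series([10, 20, 30], index=["a", "b", "a"])

print(s.dtype)            # a single dtype applies to the whole Series
print(s.index.is_unique)  # False, because "a" is repeated
print(s["a"])             # a repeated label selects every matching entry
```

Duplicate labels are legal, but label-based lookups on them return several rows, so it is worth checking `index.is_unique` before relying on one-to-one lookups.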

Pandas Dataframe

The DataFrame is the primary data structure of pandas. It is a two-dimensional, size-mutable table with flexible row indices and flexible column names. In general, it is just like an Excel sheet or an SQL table. It can also be seen as a dict-like container for Series objects.
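The dict-of-Series view can be sketched as follows. The names and values are illustrative only.

```python
import pandas as pd

# Two Series sharing an index become the columns of a DataFrame.
ages = pd.Series([29, 41], index=["alice", "bob"])
cities = pd.Series(["Oslo", "Lima"], index=["alice", "bob"])
people = pd.DataFrame({"age": ages, "city": cities})

# Dict-like access: a column name returns the underlying Series.
age_column = people["age"]

# Size-mutable: a column can be added after construction.
people["age_next_year"] = people["age"] + 1

print(people)
```

Each column keeps its own dtype, which is how a DataFrame can mix numbers and strings even though a single Series cannot.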

#python #python-pandas #pandas-dataframe #pandas-series #pandas-tutorial

Alex Riley



Best Web App Ideas To Make Money In 2021 - Application Startup Guide

Some Popular Web App Ideas for 2021

Are you looking for the best web application business ideas that make money in 2021?

There are lots of simple web app ideas, but not all of them make money.


#trending web app ideas 2021 #trending web application ideas 2021 #web application ideas 2021 #web app ideas 2021 #new web app ideas 2021 #evergreen web app ideas 2021

Mya  Lynch



A Complete Guide To Bamboolib - GUI Tool for Analyzing Pandas

Analyzing and visualizing data is the most important and time-consuming part of the process. We need to invest a lot of time to understand clearly what the data is about and what it is trying to tell us. We use different Python libraries and functions to visualize the patterns and anomalies in a dataset in order to get familiar with it.

Bamboolib is a GUI for pandas DataFrames that enables anyone to work with Python in Jupyter Notebook or JupyterLab. It is a highly interactive library for analyzing, visualizing, and manipulating data. Even a person with a non-technical background can use it to draw insights from data because it does not require any coding experience.

Bamboolib is used by more than 100 companies, and it allows data analysts to work with Python without writing code. Bamboolib is not open source, which means that you need to buy it in order to use it, but it provides a 14-day free trial so that you can fully explore it and see how it can be useful for you.

In this article, we will explore different uses of bamboolib and see how it saves time and effort. We will explore different functions that bamboolib provides and also export the code used for that functionality.

Implementation of Bamboolib:

To explore bamboolib, we first need to register on their website for a 14-day free trial. After registering, you will receive an email with the activation key at your registered email address. Like any other Python library, bamboolib is installed with pip install bamboolib.

  1. Importing required libraries

We will need to import pandas for loading the dataset and bamboolib for visualizing the dataset.

import bamboolib as bam

import pandas as pd

  2. Loading the dataset

We will be using a car design dataset here, which contains different attributes related to automobile manufacturing companies. You can download this dataset from Kaggle. We will use pandas to load this dataset.

df = pd.read_csv('car_design.csv')

  3. Analyzing the dataset

This is the main step where we will analyze and visualize the dataset using bamboolib.

#developers corner #automating eda #data analytics #eda #pandas #plotly #python pandas #visualization