Exploratory data analysis (EDA) is an approach to analyzing data and summarizing its main characteristics. One often spends a lot of time on EDA to get an understanding of the data.

EDA involves many steps, including statistical tests, visualization of the data with different kinds of plots, and more. Some of these steps are discussed below:

  • **Data Quality Check:** Can be done with pandas functions and attributes such as describe(), info(), and dtypes. It is used to find the number of features, their data types, duplicate values, missing values, etc.
  • **Statistical Test:** Statistical tests such as Pearson correlation, Spearman correlation, and the Kendall test are used to measure the correlation between features. They can be implemented in Python using a _stats_ module (for example, scipy.stats).
  • **Quantitative Test:** Quantitative summaries are used to find the spread of numerical features and the counts of categorical features. They can be computed in Python using functions from the pandas library.
  • **Visualization:** Visualizing features is essential to understanding the data. Graphical techniques such as bar plots and pie charts are used for categorical features, whereas scatter plots and histograms are used for numerical features.
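
To make the manual steps above concrete, here is a minimal sketch of the first three (quality check, correlation, quantitative summary) using pandas alone. The small DataFrame and its column names (age, fare, sex) are made up for illustration and are not taken from any real dataset. Correlations are computed with DataFrame.corr rather than a separate stats library, which is a simplification.

```python
import pandas as pd

# Hypothetical toy data, invented for this example only.
df = pd.DataFrame({
    "age": [22, 38, 26, 35, 35, None],
    "fare": [7.25, 71.28, 7.92, 53.10, 53.10, 8.46],
    "sex": ["male", "female", "female", "female", "female", "male"],
})

# Data quality check: data types, missing values, duplicate rows.
print(df.dtypes)
missing = df.isna().sum()             # missing values per column
n_duplicates = df.duplicated().sum()  # rows identical to an earlier row
print(missing)
print("duplicate rows:", n_duplicates)

# Statistical test: correlation between the numerical features.
numeric = df[["age", "fare"]]
pearson = numeric.corr(method="pearson")
spearman = numeric.corr(method="spearman")
print(pearson)
print(spearman)

# Quantitative summary: spread of numerical features,
# counts of a categorical feature.
summary = df.describe()
sex_counts = df["sex"].value_counts()
print(summary)
print(sex_counts)
```

Even for three columns this takes a couple of dozen lines, and visualization is not yet covered, which is the overhead the rest of this article aims to remove.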

Performing the tasks mentioned above requires several lines of code. This is where the open-source **pandas-profiling** library comes into play: it can perform all of these tasks with just 1 line of code. The result of EDA with pandas-profiling can be displayed in a Jupyter notebook or exported to an HTML page.

Installation:

The pandas-profiling library can be installed in any of the following ways:

  • Install the pandas-profiling library directly from the GitHub repo, from within a Jupyter notebook:
! pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip
  • Install the pandas-profiling library using pip:
pip install pandas-profiling
  • Install the pandas-profiling library from the conda-forge channel using conda:
conda install -c conda-forge pandas-profiling

Import the package:

To use the pandas-profiling library for EDA, we first need to import the required libraries:

import pandas as pd
import numpy as np
from pandas_profiling import ProfileReport

Do EDA with 1 line of Python code:

profile = ProfileReport(pd.read_csv('titanic.csv'), explorative=True)

Yes, that's it: you are done with exploratory data analysis. The results can be viewed directly in a Jupyter notebook or Google Colab, or the report can be saved as an HTML file and opened in a web browser.

# View the result in a Jupyter notebook or Google Colab
profile.to_widgets()

# Save the pandas-profiling result to an HTML file
profile.to_file("EDA.html")

