Exploratory data analysis (EDA) is an approach to analyzing data and summarizing its main characteristics. Analysts typically spend a lot of time on EDA to build an understanding of the data.
EDA involves many steps, including statistical tests and visualization of the data with different kinds of plots. A common first step is to inspect the dataset with pandas:
Functions such as describe() and info(), and attributes such as dtypes, are used to inspect a dataset's features, their data types, duplicate values, missing values, etc. Performing these tasks by hand takes several lines of code. This is where the open-source **pandas-profiling** library comes into play: it can perform all of these tasks with just one line of code. The result of EDA with pandas-profiling can be displayed in a Jupyter notebook or exported to an HTML page.
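For comparison, the manual checks named above can be sketched in plain pandas. This is only a minimal illustration: the small DataFrame and its column names are made up for the example and stand in for a real dataset.

```python
import pandas as pd

# Hypothetical toy data (not from the article); one duplicated row, one missing age.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Bob", "Cid"],
    "age": [22.0, 35.0, 35.0, None],
    "fare": [7.25, 71.28, 71.28, 8.05],
})

# Summary statistics (count, mean, std, min, quartiles, max) for numeric columns.
summary = df.describe()
print(summary)

# Column names, non-null counts, and dtypes, printed to stdout.
df.info()

# dtypes is an attribute, not a method, so it takes no parentheses.
column_types = df.dtypes
print(column_types)

# Number of rows that exactly repeat an earlier row.
n_duplicates = int(df.duplicated().sum())
print("duplicate rows:", n_duplicates)

# Number of missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)
```

Each check is a separate call whose output has to be read separately, which is the repetition a profiling report is meant to remove.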
Ways to install the pandas-profiling library (from GitHub inside a notebook, with pip, or with conda):
! pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip
pip install pandas-profiling
conda install -c conda-forge pandas-profiling
To use the pandas-profiling library for EDA, we import the required libraries and build the report from a DataFrame:
import pandas as pd
import numpy as np
from pandas_profiling import ProfileReport
profile = ProfileReport(pd.read_csv('titanic.csv'), explorative=True)
Yes, that's it: the exploratory data analysis is done. The results can be viewed directly in Jupyter Notebook or Google Colab, or saved as an HTML file and opened in a web browser.
# To view the result in Jupyter Notebook or Google Colab
profile.to_widgets()
# To save the results of pandas-profiling to an HTML file
profile.to_file("EDA.html")