Data scientists widely use EDA to understand datasets before making decisions and cleaning data. EDA reveals crucial information about the data, such as hidden patterns, outliers, variance, covariance, and correlations between features. This information is essential for designing hypotheses and building better-performing models.
Figure showing the process flow from data collection to decision making.
Generally, EDA falls into two categories:
The figure below subdivides EDA further, both by the data being analyzed and by the method used, which is either numerical or graphical.
This article covers various ways of performing EDA using the Titanic dataset taken from Kaggle.
The Titanic dataset is downloaded from Kaggle to a local drive and then loaded into a pandas DataFrame using the read_csv() function.
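A minimal sketch of this loading step follows. The helper name `load_titanic` is hypothetical, and the in-memory CSV is only a stand-in so the example runs without the Kaggle file. Its column names follow the Titanic dataset, but the rows are illustrative placeholders, not a faithful extract.

```python
import io

import pandas as pd


def load_titanic(source):
    """Load Titanic data from a file path or file-like object into a DataFrame.

    Hypothetical helper: a thin wrapper around pandas.read_csv.
    """
    return pd.read_csv(source)


# Stand-in for the downloaded file so the sketch is runnable anywhere.
# These rows are illustrative placeholders, not the real dataset.
sample_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Sex,Age\n"
    "1,0,3,male,22\n"
    "2,1,1,female,38\n"
    "3,1,3,female,\n"
)
df = load_titanic(sample_csv)

# First checks after loading: size, column types, and missing values.
print(df.shape)
print(df.dtypes)
print(df.isna().sum())
```

With the real download, the call would look like `load_titanic("train.csv")`, assuming the file was saved under that name in the working directory.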