Data scientists widely use exploratory data analysis (EDA) to understand datasets in support of decision-making and data cleaning. EDA reveals crucial information about the data, such as hidden patterns, outliers, variance, covariance, and correlations between features. This information is essential for designing hypotheses and building better-performing models.

Figure showing the process flow from data collection to decision making.

Generally, EDA falls into two categories:

  • Univariate analysis examines one feature at a time, for example by summarizing it and finding patterns in its values.
  • Multivariate analysis examines the relationship between two or more features, using cross-tabulation or statistics.
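
The two categories can be sketched with pandas as follows. This is a minimal illustration, not the article's own code: the rows are made up for the example (they are not taken from the real Titanic data), and the column names Survived, Sex, Age and Fare are assumed to match the Kaggle file.

```python
import pandas as pd

# Made-up rows for illustration only; NOT real Titanic records.
# Column names are assumed to match the Kaggle Titanic file.
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, 27.0, 54.0],
    "Fare": [7.25, 71.28, 7.92, 8.05, 11.13, 51.86],
})

# Univariate analysis: summarize one feature at a time.
age_summary = toy["Age"].describe()      # count, mean, std, min, quartiles, max
sex_counts = toy["Sex"].value_counts()   # frequency of each category

# Multivariate analysis: relate two or more features.
sex_by_survival = pd.crosstab(toy["Sex"], toy["Survived"])  # cross-tabulation
age_fare_corr = toy[["Age", "Fare"]].corr()                 # Pearson correlation matrix

print(age_summary)
print(sex_counts)
print(sex_by_survival)
print(age_fare_corr)
```

describe() and value_counts() each look at a single column, while crosstab() and corr() compare columns against each other, which is the distinction the two bullets draw.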

The figure below shows how EDA is further subdivided by the data being analyzed and by the method used, numerical or graphical.

This article covers various ways of performing EDA using the Titanic dataset taken from Kaggle.

Getting data

The Titanic dataset is downloaded from Kaggle to a local drive and then loaded into a pandas DataFrame using the read_csv() function.
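
A minimal sketch of this loading step is shown below. The file name train.csv and the helper name load_titanic are assumptions made for illustration; adjust the path to wherever the Kaggle download was saved.

```python
from pathlib import Path

import pandas as pd


def load_titanic(csv_source="train.csv"):
    """Load a Titanic CSV into a DataFrame.

    load_titanic is a hypothetical helper for this example, and
    "train.csv" is an assumed file name for the Kaggle download.
    """
    return pd.read_csv(csv_source)


if __name__ == "__main__":
    path = Path("train.csv")  # assumed location of the downloaded file
    if path.exists():
        df = load_titanic(path)
        print(df.shape)   # (number of rows, number of columns)
        print(df.head())  # first five rows
        df.info()         # column dtypes and non-null counts
    else:
        print(f"{path} not found; download the dataset from Kaggle first.")
```

Checking shape, head() and info() right after loading is a quick way to confirm that the file was parsed as expected before starting any analysis.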

