Skewness and Kurtosis in data science

“Reality is partial to symmetry and slight anachronisms”

Jorge Luis Borges

Introduction

Data skewness is one of the common challenges that data scientists face in real-world case studies. Apart from certain business scenarios, most real-world analyses assume that the data follow some predefined distribution, and that is rarely the case before the data have gone through a cleaning process. In this article, we will discuss the terminology and intuition behind departures from a symmetrical data distribution and how they can be evaluated using different mathematical metrics.

Table of contents

Definition of Skewness
Types of skewness (Right skewness and Left skewness)
Some important scenarios in the normal distribution
Kurtosis
Types of Kurtosis
Approaches to follow when the data is skewed

Definition

Skewness is a measure of the asymmetry of a data distribution: the further the value is from zero, the less symmetric the distribution.

A distribution or data set is said to be symmetric if it looks the same to the left and to the right of its center point.
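To make the definition concrete, here is a minimal sketch of one common way to put a number on asymmetry: the moment coefficient of skewness, which divides the third central moment by the second central moment raised to the power 1.5. The article does not say which formula it has in mind, so this choice (and the population form used here) is an assumption, and `sample_skewness` is a hypothetical helper name, not a library function.

```python
def sample_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form).

    Assumption: this is one common definition; other variants apply
    a small-sample bias correction.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = sum(values) / n
    # Second and third central moments
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


# Illustrative (made-up) data sets
symmetric = [1, 2, 3, 4, 5]               # mirror image around 3
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]  # one large value on the right

print(sample_skewness(symmetric))     # zero for perfectly symmetric data
print(sample_skewness(right_tailed))  # positive for a long right tail
```

A value of zero corresponds to a symmetric data set, a positive value to a longer right tail, and a negative value to a longer left tail.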

Types of skewness

Skewness is generally classified into two broad categories:

· Right skewness or Positive skewness

· Left skewness or Negative skewness

Right skewness

A right-skewed distribution has a long tail extending to the right on the number line. The few unusually large values in that tail pull the mean upward, so the mean is typically greater than the median.
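The effect of a right tail on the mean can be seen in a small sketch. The income figures below are made up purely for illustration; the only point is that a single large value moves the mean much more than the median.

```python
import statistics

# Hypothetical yearly incomes in thousands; the last value forms the right tail
incomes = [28, 30, 31, 33, 35, 36, 38, 40, 42, 250]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

# Same data without the extreme value, for comparison
trimmed = incomes[:-1]
trimmed_mean = statistics.mean(trimmed)
trimmed_median = statistics.median(trimmed)

print("with tail:    mean =", mean_income, "median =", median_income)
print("without tail: mean =", trimmed_mean, "median =", trimmed_median)
```

With the extreme value included, the mean (56.3) sits well above the median (35.5); once it is removed, the two are almost equal. This is the usual pattern for right-skewed data, although it is a tendency rather than a strict rule.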
