I recently ran into a scenario that pushed me to learn the difference between the Pearson and Spearman correlation coefficients. I felt it was a piece of information that many people in the data science community on Medium could use. I’ll explain the difference between the two thoroughly, along with the scenarios where each one is suitable. Read on!

Contents of this post:

  1. Definition of Correlation
  2. Comparative analysis between Pearson and Spearman correlation coefficients

Definition of Correlation

Correlation is the degree to which two variables are linearly related. Measuring it is an important step in bivariate data analysis. In the broadest sense, correlation is any statistical relationship, whether causal or not, between two random variables in bivariate data.

An important rule to remember is that correlation doesn’t imply causation.

Let’s look at two examples to understand what this actually means.

  1. The consumption of ice-cream increases during the summer months. There is a strong correlation between summer temperatures and the sales of ice-cream units. In this particular example, there is also a causal relationship, as hot summers do push ice-cream sales up.
  2. Ice-cream sales also have a strong correlation with shark attacks. As we can see very clearly here, the shark attacks are most definitely not caused by ice-cream. So, there is no causation here.

Hence, correlation doesn’t ALWAYS imply causation!
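To make the ice-cream and shark example concrete, here is a minimal sketch with synthetic data. Every number in it is made up for illustration, and the variable names (`temperature`, `ice_cream_sales`, `shark_attacks`) and the `residuals` helper are assumptions of this sketch. Temperature drives both series, so they end up correlated with each other even though neither causes the other. Once the straight-line effect of temperature is removed from each, the correlation largely disappears.

```python
import numpy as np

# Synthetic data for illustration only: all values below are invented.
rng = np.random.default_rng(0)
n = 2000
temperature = rng.uniform(15, 35, size=n)  # hypothetical daily temperature
ice_cream_sales = 10 * temperature + rng.normal(0, 20, size=n)
shark_attacks = 0.2 * temperature + rng.normal(0, 1, size=n)


def residuals(y, x):
    """Return what is left of y after subtracting a straight-line fit on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)


# Correlation between the two series as observed.
r_raw = np.corrcoef(ice_cream_sales, shark_attacks)[0, 1]

# Correlation after the shared driver (temperature) is removed from both.
r_adjusted = np.corrcoef(
    residuals(ice_cream_sales, temperature),
    residuals(shark_attacks, temperature),
)[0, 1]

print(f"ice-cream vs shark attacks: {r_raw:.2f}")
print(f"after removing temperature: {r_adjusted:.2f}")
```

The first number should come out clearly positive and the second close to zero, which is what you would expect when a third variable explains the apparent link.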

What is the Correlation Coefficient?

The correlation coefficient is a statistical measure of the strength of the relationship between the relative movements of two variables. The values range between -1.0 and 1.0. A correlation of -1.0 shows a perfect negative correlation, while a correlation of 1.0 shows a perfect positive correlation. A correlation of 0.0 shows no linear relationship between the movement of the two variables.
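As a quick illustration of the two extremes, the sketch below uses NumPy’s `np.corrcoef` on small made-up arrays: one that rises in lockstep with `x` and one that falls in lockstep with it. The data and variable names are assumptions chosen for the example.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

y_up = 2 * x + 1      # rises in lockstep with x
y_down = -3 * x + 10  # falls in lockstep with x

# np.corrcoef returns a 2x2 correlation matrix; entry [0, 1] is the
# correlation between the first and second argument.
r_up = np.corrcoef(x, y_up)[0, 1]
r_down = np.corrcoef(x, y_down)[0, 1]

print(r_up)    # approximately 1.0 (perfect positive correlation)
print(r_down)  # approximately -1.0 (perfect negative correlation)
```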


2 Important Correlation Coefficients — Pearson & Spearman

1. Pearson Correlation Coefficient

**Wikipedia Definition:** In statistics, the Pearson correlation coefficient also referred to as Pearson’s _r_ or the bivariate correlation is a statistic that measures the linear correlation between two variables X and Y. It has a value between +1 and −1. A value of +1 is a total positive linear correlation, 0 is no linear correlation, and −1 is a total negative linear correlation.
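For reference, the sample version of Pearson’s _r_ for n paired observations (x_i, y_i) is commonly written as below, where x̄ and ȳ are the sample means. The notation is an assumption of this sketch and is not part of the quoted definition.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
          \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator measures how the two variables move together around their means, and the denominator rescales that quantity so the result always falls between −1 and +1.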

_Important Inference to keep in mind:_ The Pearson correlation can evaluate ONLY a linear relationship between two continuous variables. (A relationship is linear only when a change in one variable is associated with a proportional change in the other variable.)

_Example use case:_ We can use the Pearson correlation to evaluate whether an increase in age is associated with an increase in blood pressure.

Below is an example of how the Pearson correlation coefficient (r) varies with the **strength and the direction of the relationship** between the two variables. Note that when no linear relationship could be established (refer to graphs in the third column), the Pearson coefficient yields a value of zero.
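The zero-value case can be reproduced in a few lines. The sketch below computes Pearson’s r directly from its textbook definition (the `pearson_r` helper and the sample values are assumptions for illustration) and compares a straight line with a parabola. In the parabola, y is completely determined by x, yet r is zero because the relationship is not linear.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation computed directly from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (the numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (used in the denominator).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x for x in xs]     # a straight line through the origin
parabola = [x ** 2 for x in xs]  # determined by x, but not linear

print(pearson_r(xs, linear))    # 1.0
print(pearson_r(xs, parabola))  # 0.0
```

So a Pearson value of zero says only that there is no linear relationship. It does not rule out a relationship of some other shape.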

