People often assume that “numerical calculations are exact, but graphs are rough,” yet this is misleading. I believed it myself before I started learning data analytics.

If you are new to data science or one of its subfields, this is a good first step toward understanding why data visualization matters alongside statistical results.

Anscombe’s Quartet is the classic example used to demonstrate the importance of data visualization. It was constructed by the statistician Francis Anscombe in 1973 to show both the importance of plotting data before analyzing it and the effect of outliers on statistical properties. It comprises four data sets, each consisting of eleven (x, y) points. The key observation is that all four share nearly identical descriptive statistics (mean, variance, standard deviation, etc.) yet look very different when graphed. Each plot reveals a different behavior that the summary statistics alone do not show.

The four data sets

Applying the standard statistical formulas to each of the four data sets gives approximately the same results every time:

Average value of x = 9

Average value of y ≈ 7.50

Variance of x = 11

Variance of y ≈ 4.12

Correlation coefficient ≈ 0.816

Linear regression equation: y = 0.5x + 3
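To check the figures above yourself, here is a minimal Python sketch that computes them for all four data sets. The (x, y) values are the ones from Anscombe’s 1973 paper, typed in here because the original table was an image. The names `X_SHARED`, `QUARTET`, `summarize` and `SUMMARIES` are made up for this illustration, and the variance is assumed to be the sample variance (dividing by n - 1), which is what gives 11 for x.

```python
from statistics import mean, variance

# Values from Anscombe's 1973 paper; data sets I-III share the same x column.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the summary statistics for one (x, y) data set (hypothetical helper)."""
    n = len(x)
    mean_x, mean_y = mean(x), mean(y)
    var_x, var_y = variance(x), variance(y)  # sample variance, divides by n - 1
    # Sample covariance between x and y.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    slope = cov / var_x  # least-squares slope of y on x
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": var_x,
        "var_y": var_y,
        "r": cov / (var_x * var_y) ** 0.5,  # Pearson correlation coefficient
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


SUMMARIES = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}

for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

The four printed rows should agree to about two decimal places, even though the four scatter plots look nothing alike. That is the whole point of the quartet: the numbers alone cannot tell these data sets apart.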

Anscombe’s Quartet — An Importance of Data Visualization