Statistics for Data Science: A Beginner's Guide to Descriptive Statistics in Python. A working example of central tendency, dispersion, standard deviation, and correlation using Excel, Python, and a real-world business problem

Statistics is a much-needed component of Data Science. Why statistics? In simple terms, before any model can be developed, we need to be sure of the data we are working with. The general idea is that **“it is advisable to use clean data”** for any **machine learning algorithm**; otherwise the model ends up producing results contrary to what is expected of it. Think of a painter who starts a portrait with a palette that is not clean enough. Will that produce a painting that goes to auction, or one that stays in a corner of the house?

But wait. Statistics is not that simple. Multiple underlying assumptions need to be validated before any methodology can be used. We are all familiar with **bell curves**, the curves used to show the normal distribution of data. If you are familiar with the fundamentals of statistics, you will know that the normal distribution is a probability function that describes how the values of a variable are distributed. *The X-axis represents the data points, whereas the Y-axis shows the probability density at the given point*. Confused? I will make it clear later on. The general idea is that, as beginners, we need not bother ourselves with every component of statistics, but should rather focus on the more useful ones.
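To make the two axes of the bell curve concrete, here is a minimal sketch that evaluates the normal probability density at a few points. The function name `normal_pdf` and the choice of a standard normal (mean 0, standard deviation 1) are assumptions made for illustration, not something taken from a particular dataset.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Probability density of a normal distribution at x (hypothetical helper)."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)


# X-axis: the data points. Y-axis: the density at each of those points.
xs = [-3, -2, -1, 0, 1, 2, 3]
densities = [normal_pdf(x) for x in xs]

for x, d in zip(xs, densities):
    print(f"x = {x:+d}  density = {d:.4f}")
```

The density is highest at the mean and falls off symmetrically on both sides, which is what gives the curve its bell shape.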

Practice your Data Science skills in Python by learning from, and then trying, all the hands-on, interactive projects that I have posted for you.