What Is Chi-square Goodness Of Fit Test?

As a data science engineer, you need the sample you draw from your data to be reliable, clean, and well tested for use in building machine learning models.

**So how do you do that?**

Well, we have multiple statistical techniques, such as descriptive statistics, where we measure the data's central value and how the data is spread around the mean or median. Is it normally distributed, or is there a skew in the spread? Please refer to my previous article on the same for more clarity.
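To make the idea concrete, here is a minimal sketch of those descriptive measures using only Python's standard library. The `describe` helper and the `sample` values are hypothetical, made up purely for illustration, and the skewness shown is the simple moment-based coefficient (third central moment divided by the variance raised to the power 1.5), which is one of several common definitions.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    stdev = statistics.stdev(values)  # sample standard deviation

    # Population central moments, used for the moment-based skewness.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0

    return {"mean": mean, "median": median, "stdev": stdev, "skewness": skewness}


# Hypothetical sample with one large value pulling the mean to the right.
sample = [2, 3, 3, 4, 4, 4, 5, 5, 6, 15]
summary = describe(sample)

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

When the mean sits above the median and the skewness is positive, as it should here, the data likely has a long right tail rather than a symmetric, normal-looking shape.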

The first thing we do is visualize the data using various data visualization techniques, both to make some early sense of any skewness or discrepancies and to identify any relationships between the variables in the data set.

Data has so much to say, and we data engineers give it a voice to express and describe itself using descriptive statistical techniques.

But to make a prediction, or to infer something beyond the given data, such as a hidden probability, we rely on inferential statistical techniques.

*Inferential statistics are concerned with making inferences based on relations found in the sample, to relations in the population. Inferential statistics help us decide, for example, whether the differences between groups that we see in our data are strong enough to provide support for our hypothesis that group differences exist in general, in the entire population.*
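As a rough illustration of asking whether a group difference is "strong enough", the sketch below runs a simple permutation test on the difference in means. This is a stand-in chosen for its simplicity, not the chi-square test itself, and the `permutation_test` helper, its parameters, and the two groups of numbers are all hypothetical.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))

    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0

    for _ in range(n_permutations):
        # Reassign group labels at random, as if the groups did not differ.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1

    # Add one to both counts so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical measurements from two groups in a sample.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4]
group_b = [4.2, 4.0, 4.5, 4.8, 4.1, 4.4]

p_value = permutation_test(group_a, group_b)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value suggests that a difference this large would rarely appear if the group labels were arbitrary, which is the kind of support for a hypothesis about the population that the definition above describes.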

