The biggest challenge for data scientists is probably something that sounds mundane but is very important for any analysis: cleaning dirty data. When you think of dirty data, you are probably thinking of inaccurate or malformed data. But the truth is, missing data is actually the most common form of dirty data. Imagine trying to do a customer segmentation analysis when 50% of the customer records have no address. It would be hard or impossible to do, since the results would be biased, showing no customers in certain areas.
There are three kinds of missing data: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR).
In this article, I’ll go over each type of missing data with examples, and share how to handle missing data with imputation.
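To make the idea concrete before going further, here is a minimal sketch in plain Python of two things mentioned above: measuring how much of a field is missing, and filling gaps by imputation. The records, the field names (`address`, `annual_spend`), and the helper functions `missing_fraction` and `impute_mean` are all made up for illustration, and mean imputation is just one simple strategy, not necessarily the right one for your data.

```python
from statistics import mean

# Hypothetical customer records (illustrative only); None marks a missing value.
customers = [
    {"id": 1, "address": "12 Oak St", "annual_spend": 120.0},
    {"id": 2, "address": None, "annual_spend": 80.0},
    {"id": 3, "address": "7 Elm Ave", "annual_spend": None},
    {"id": 4, "address": None, "annual_spend": 100.0},
]


def missing_fraction(rows, field):
    """Return the share of rows whose `field` is None."""
    return sum(row[field] is None for row in rows) / len(rows)


def impute_mean(rows, field):
    """Return copies of the rows with None in `field` replaced by the observed mean."""
    observed = [row[field] for row in rows if row[field] is not None]
    fill = mean(observed)
    return [{**row, field: fill if row[field] is None else row[field]} for row in rows]


address_missing = missing_fraction(customers, "address")
imputed = impute_mean(customers, "annual_spend")

print(address_missing)             # 0.5 (half the records have no address)
print(imputed[2]["annual_spend"])  # 100.0 (mean of 120.0, 80.0 and 100.0)
```

Note that the sketch imputes the numeric field, not the address: a mean only makes sense for numbers, and a missing address cannot be averaged into existence. Mean imputation also shrinks the variance of the filled column, so treat it as a starting point.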