
In the last blog, we discussed the importance of the data cleaning process in a data science project and ways of cleaning the data to convert a raw dataset into a usable form. Here, we are going to talk about how to identify and treat the missing values in the data step by step.

Real-world data almost always has missing values. This can happen for many reasons, such as data entry errors or data collection problems. Whatever the reason, it is important to handle missing data, because statistical results based on a dataset with non-random missing values could be biased. Also, many ML algorithms do not support data with missing values.

How to identify missing values?

We can check for null values in a dataset with the pandas isnull() function.
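As a minimal sketch of that check, the snippet below counts nulls per column. The DataFrame and its column names (age, city, income) are hypothetical toy data made up for illustration, not the dataset used in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (an assumption for illustration only).
df = pd.DataFrame({
    "age": [25, np.nan, 47, 51],
    "city": ["Pune", "Delhi", None, "Mumbai"],
    "income": [52000.0, 61000.0, np.nan, np.nan],
})

# isnull() returns a boolean DataFrame of the same shape;
# sum() then counts the True values (nulls) in each column.
null_counts = df.isnull().sum()
print(null_counts)

# mean() on the same boolean frame gives the share of nulls per column.
null_share = df.isnull().mean()
print(null_share)
```

Looking at the share of nulls as well as the raw count can help when columns have very different numbers of valid rows.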

But sometimes it is not this simple to identify missing values. One needs to use domain knowledge and look at the data description to understand the variables. For instance, in the dataset below, isnull() does not show any null values.
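To make the idea concrete, here is a hedged sketch of one common way missing values can hide: a placeholder such as 0 entered where no reading was taken. The data, the column names, and the choice of which columns cannot legitimately be 0 are all assumptions for illustration; in practice that choice has to come from the data description and domain knowledge.

```python
import numpy as np
import pandas as pd

# Hypothetical records where 0 was typed in when a reading was not taken.
df = pd.DataFrame({
    "glucose": [148, 0, 183, 89],
    "blood_pressure": [72, 66, 0, 0],
    "pregnancies": [6, 1, 0, 2],  # here 0 is a legitimate value
})

# isnull() finds nothing, because 0 is an ordinary number to pandas.
missing_before = df.isnull().sum()
print(missing_before)

# Summary statistics can hint at the problem: a minimum of 0 for
# glucose or blood pressure is not physically plausible.
print(df.describe().loc["min"])

# Assumption from domain knowledge: these columns cannot really be 0,
# so we treat 0 as missing there and leave "pregnancies" untouched.
placeholder_cols = ["glucose", "blood_pressure"]
df[placeholder_cols] = df[placeholder_cols].replace(0, np.nan)

missing_after = df.isnull().sum()
print(missing_after)
```

Replacing the placeholder only in selected columns matters here, because a blanket replacement would also wipe out the valid zeros in the pregnancies column.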


How to Deal with Missing Values in Your Dataset