7 Ways to Handle Missing Values in Machine Learning

Real-world data often has many missing values, which can result from data corruption or a failure to record the data. Handling missing data is an important part of dataset preprocessing because many machine learning algorithms do not support missing values.

This article covers 7 ways to handle missing values in a dataset:

Deleting rows with missing values
Imputing missing values for continuous variables
Imputing missing values for categorical variables
Other imputation methods
Using algorithms that support missing values
Predicting missing values
Imputation using a deep learning library: Datawig

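import pandas as pd
import missingno as msno  # library for visualizing missing values

data = pd.read_csv("train.csv")
msno.matrix(data)  # plot a matrix showing where values are missing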

Delete Rows with Missing Values:

Missing values can be handled by deleting the rows or columns that contain null values. If more than half of the values in a column are null, the entire column can be dropped. Rows that have a null value in one or more columns can also be dropped.
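The deletion rules described above can be sketched with pandas roughly as follows. This is a minimal illustration on a small made-up DataFrame: the column names and values are assumptions for the example and do not come from train.csv, and the 50% threshold simply follows the rule of thumb stated above.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; the columns and values are hypothetical.
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 41.0],
    "cabin": [np.nan, np.nan, np.nan, "C85"],
    "fare": [7.25, 71.28, 8.05, 53.10],
})

# Fraction of null values in each column.
null_fraction = df.isnull().mean()

# Drop columns where more than half of the values are null.
cols_to_drop = null_fraction[null_fraction > 0.5].index
reduced = df.drop(columns=cols_to_drop)

# Drop any remaining rows that contain at least one null value.
cleaned = reduced.dropna(axis=0, how="any")
```

Comparing df.shape with cleaned.shape shows how many rows and columns the deletion discards, which is worth checking before committing to this approach.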

Pros:

A model trained on data from which all missing values have been removed can be robust.

Cons:

A lot of information can be lost.
Works poorly if missing values make up a large share of the complete dataset.

towardsdatascience.com