Real-world data often has many missing values, caused for example by data corruption or a failure to record data. Handling missing data is an important preprocessing step because many machine learning algorithms do not support missing values.

This article covers 7 ways to handle missing values in the dataset:

  1. Deleting Rows with missing values
  2. Impute missing values for continuous variables
  3. Impute missing values for categorical variables
  4. Other Imputation Methods
  5. Using Algorithms that support missing values
  6. Prediction of missing values
  7. Imputation using Deep Learning Library — Datawig
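
The pattern of missing values in a dataset can be visualized with the missingno library:

import pandas as pd
import missingno as msno  # missing-data visualization library

data = pd.read_csv("train.csv")
msno.matrix(data)  # draws a matrix showing where values are missing in each column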

Delete Rows with Missing Values:

Missing values can be handled by deleting the rows or columns that contain null values. If more than half of the rows in a column are null, the entire column can be dropped. Rows that have a null value in one or more columns can also be dropped.
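A minimal sketch of this deletion strategy with pandas is shown here. The toy DataFrame and its column names are hypothetical stand-ins for a real dataset, and the 50% threshold simply mirrors the "more than half" rule of thumb described above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a real dataset
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 41.0],
    "cabin": [np.nan, np.nan, np.nan, "C85"],
    "fare": [7.25, 71.28, 8.05, 53.10],
})

# Share of null values per column (0.0 to 1.0)
col_null_share = df.isna().mean()

# Keep only columns where at most half of the rows are null
reduced = df.loc[:, col_null_share <= 0.5]

# Drop any remaining rows that have one or more null values
cleaned = reduced.dropna(axis=0, how="any")
```

Dropping sparse columns first tends to preserve more rows, since a mostly empty column would otherwise cause nearly every row to be removed.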

Pros:

  • Training on data with all missing values removed can produce a robust model.

Cons:

  • A lot of information can be lost.
  • Works poorly if the percentage of missing values is large relative to the complete dataset.
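Before deleting rows, it can help to estimate how much data would be lost. The helper below is a hypothetical illustration (the name `row_loss_fraction` is not from any library) that computes the fraction of rows containing at least one missing value.

```python
import numpy as np
import pandas as pd

def row_loss_fraction(df: pd.DataFrame) -> float:
    """Return the fraction of rows that contain at least one missing value."""
    return float(df.isna().any(axis=1).mean())

# Hypothetical toy data: two of the four rows contain a missing value
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": ["x", "y", None, "z"],
})

loss = row_loss_fraction(toy)
```

If this fraction is large, row deletion would discard much of the dataset, and one of the imputation approaches may be a better fit.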


7 Ways to Handle Missing Values in Machine Learning