Missing values can cause problems when modeling classification and regression prediction problems with machine learning algorithms.

A common approach is to replace missing values with a calculated statistic, such as the mean of the column. This allows the dataset to be modeled as usual, but it gives the model no indication that the row originally contained missing values.
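As a minimal sketch of mean imputation, the snippet below uses scikit-learn's SimpleImputer on a tiny made-up array. The data and variable names are assumptions for illustration only and are not taken from the tutorial's dataset.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data invented for illustration; np.nan marks a missing value.
X = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
    [5.0, 30.0],
])

# Replace each missing value with the mean of the observed values in its column.
imputer = SimpleImputer(strategy="mean")
X_imputed = imputer.fit_transform(X)

print(X_imputed)
```

After the transform, nothing in X_imputed records which cells were filled in, which is the loss of information described above.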

One approach to address this issue is to include additional binary flag input features that indicate whether a row or a column contained a missing value that was imputed. This additional information may or may not be helpful to the model in predicting the target value.
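One way a row-level flag might be built is sketched below with NumPy: the flag is computed before imputation, appended as an extra column, and the data is then imputed as usual. The toy array and the names row_flag and X_final are hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data invented for illustration; np.nan marks a missing value.
X = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
    [5.0, 30.0],
])

# 1 if the row has at least one missing value, otherwise 0.
row_flag = np.isnan(X).any(axis=1).astype(int).reshape(-1, 1)

# Append the flag as a new input feature, then impute the remaining gaps.
X_with_flag = np.hstack((X, row_flag))
X_final = SimpleImputer(strategy="mean").fit_transform(X_with_flag)

print(row_flag.ravel())
print(X_final)
```

The flag column contains no missing values, so the imputer leaves it unchanged.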

In this tutorial, you will discover how to add binary flags for missing values for modeling.

After completing this tutorial, you will know:

  • How to load and evaluate models with statistical imputation on a classification dataset with missing values.
  • How to add a flag that indicates if a row has one or more missing values and evaluate models with this new feature.
  • How to add a flag for each input variable that has missing values and evaluate models with these new features.
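The last of these steps can be sketched with scikit-learn's SimpleImputer, whose add_indicator=True option appends one binary column for each input variable that had missing values when the imputer was fit. The example below is an assumption-laden illustration: it uses synthetic data with values removed at random and a logistic regression model, not the tutorial's dataset or model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic classification data (an assumption; not the tutorial's dataset).
X, y = make_classification(n_samples=200, n_features=5, random_state=1)

# Remove roughly 10 percent of the values at random to simulate missing data.
rng = np.random.default_rng(1)
X[rng.random(X.shape) < 0.1] = np.nan

# Mean imputation plus one binary flag per column that contained missing values.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="mean", add_indicator=True)),
    ("model", LogisticRegression(max_iter=1000)),
])

# Imputation is fit inside each fold, which avoids leaking test-fold statistics.
scores = cross_val_score(pipeline, X, y, scoring="accuracy", cv=5)
print("Mean accuracy: %.3f" % scores.mean())
```

Setting add_indicator=False in the same pipeline gives a baseline without flags, so the two scores can be compared to judge whether the flags help on a given dataset.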



Add Binary Flags for Missing Values for Machine Learning