I’m sure that every data scientist or ML practitioner has faced the challenge of missing values in their dataset. Handling them is a common data cleaning step, but frankly, a very overlooked and neglected one. However, an effective missing value strategy can have a significant impact on your model’s performance.
The reason missing values occur is often specific to the problem domain. However, most of the time they arise from the following scenarios:
You should deal with missing values because many ML algorithms require numeric input and can’t operate on missing values. If you try to run such an algorithm on data that contains them, it will respond with an error (this is what most scikit-learn estimators do). However, some algorithms, such as XGBoost, handle missing values natively: at each split, XGBoost learns which branch missing values should follow, based on training loss reduction.
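To make the scikit-learn behaviour concrete, here is a minimal sketch. It assumes NumPy and scikit-learn are installed, and the tiny dataset is made up purely for illustration. It shows a `LinearRegression` refusing to fit on data containing `NaN`, and then one simple way around that (mean imputation with `SimpleImputer`), which is only one of several possible strategies.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression

# Toy data (made up for illustration); np.nan marks the missing entries.
X = np.array([
    [1.0, 2.0],
    [np.nan, 3.0],
    [4.0, np.nan],
    [5.0, 6.0],
])
y = np.array([1.0, 2.0, 3.0, 4.0])

# Fitting directly on data that contains NaN is rejected by this estimator.
try:
    LinearRegression().fit(X, y)
    fit_failed = False
except ValueError as err:
    fit_failed = True
    print("fit failed:", err)

# Replace each NaN with the mean of the observed values in its column.
imputer = SimpleImputer(strategy="mean")
X_imputed = imputer.fit_transform(X)

# With no missing values left, the same estimator fits without complaint.
model = LinearRegression().fit(X_imputed, y)
print(X_imputed)
```

Mean imputation is used here only because it is the simplest option to show; whether it is appropriate depends on why the values are missing in the first place.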