Handling missing values is a key step in exploratory data analysis (EDA) for most data science projects, whether you're developing machine learning models or business analytics. Many libraries, including most scikit-learn estimators, cannot fit a model on data that contains missing values. Because datasets are often large, knowing a few tricks for choosing good imputation values is a massive advantage on the way to becoming a unicorn data scientist. In this article, we will review three short pieces of open-source Python code that can be combined to handle missing values.

For this article, we will analyze three sample Kaggle datasets (flowers, Titanic, and house prices), which you can find here.

Introduction

There are many scenarios that involve missing values. In Python, a missing number is commonly represented as NaN, which is short for “not a number”. The classical method consists of detecting the cells with missing values and counting them in each column with this command:

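# Count the missing values in each column of the DataFrame `data`
missing_val_count_by_column = data.isnull().sum()

# Show only the columns that contain at least one missing value
print(missing_val_count_by_column[missing_val_count_by_column > 0])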
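To make the command above easy to try without downloading anything, here is a minimal, self-contained sketch. The `toy` DataFrame and the helper name `count_missing_by_column` are assumptions made for illustration; they are not part of the Kaggle datasets or of any library.

```python
import numpy as np
import pandas as pd


def count_missing_by_column(df: pd.DataFrame) -> pd.Series:
    """Return the number of missing values per column, keeping only columns that have any."""
    counts = df.isnull().sum()
    return counts[counts > 0]


# Hypothetical toy data standing in for a real dataset
toy = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, np.nan],
    "fare": [7.25, 71.28, np.nan, 8.05],
    "sex": ["male", "female", "female", "male"],
})

# Only "age" and "fare" contain NaN, so "sex" is filtered out of the result
missing = count_missing_by_column(toy)
print(missing)
```

Wrapping the two lines in a small function makes it simple to run the same check again after each imputation step and confirm that no missing values remain.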


Handling Missing Values : the exclusive pythonic guide