Real-world data often contains many missing values, typically because of data corruption or a failure to record the data. Handling missing data is an important preprocessing step, because many machine learning algorithms do not support missing values.
This article covers 7 ways to handle missing values in the dataset:
import pandas as pd
import missingno as msno
data = pd.read_csv("train.csv")
msno.matrix(data)
Missing values can be handled by deleting the rows or columns that contain null values. If more than half of a column's rows are null, the entire column can be dropped. Rows that have a null value in one or more columns can also be dropped.
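The deletion approach described above can be sketched with pandas as follows. This is a minimal illustration rather than a definitive recipe: the toy DataFrame and its column names (`age`, `cabin`, `fare`) are made up for the example, and the 50% cutoff simply mirrors the "more than half" rule of thumb from the text.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names and values are illustrative only.
df = pd.DataFrame({
    "age":   [22.0, np.nan, 35.0, 41.0],
    "cabin": [np.nan, np.nan, np.nan, "C85"],
    "fare":  [7.25, 71.28, 8.05, 53.10],
})

# Fraction of missing values in each column.
null_fraction = df.isnull().mean()

# Drop every column in which more than half of the rows are null.
cols_to_drop = null_fraction[null_fraction > 0.5].index
reduced = df.drop(columns=cols_to_drop)

# Then drop any remaining row that still has at least one null value.
cleaned = reduced.dropna(axis=0, how="any")

print(cleaned)
```

Dropping sparse columns first and rows second tends to preserve more rows, since a mostly empty column would otherwise cause nearly every row to be discarded.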
Pros:
Cons: