In this article, I want to share my approach to solving a house price forecasting competition from Kaggle. This is a classic multivariable regression problem that data scientists face very often in their jobs, which is why I think it is worth diving deeper into.

First of all, we load our data into a Pandas DataFrame and then perform data cleaning and data preprocessing, consisting of:

  • Replacing categorical variables with numbers using the Pandas “replace” function
  • Filling empty “NaN” values with the Pandas “fillna(0)” function
  • Deleting columns for low-correlation variables with the Pandas “drop()” function. In this particular case, I disregarded variables with an absolute correlation below 0.4.
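The three steps above can be sketched roughly as follows. This is a minimal illustration, not the exact code used for the competition: the tiny DataFrame, the column names, and the quality-to-number mapping are made up for the example, and I am assuming the 0.4 threshold is applied to each variable's correlation with the target column (called SalePrice here).

```python
import numpy as np
import pandas as pd

# Tiny made-up frame standing in for the competition's training data
df = pd.DataFrame({
    "SalePrice":   [200000, 150000, 320000, 250000, 180000, 400000],
    "GrLivArea":   [1500, 1100, 2400, 1800, 1300, 3000],
    "ExterQual":   ["Gd", "TA", "Ex", "Gd", "TA", "Ex"],
    "LotFrontage": [60.0, np.nan, 85.0, 70.0, np.nan, 100.0],
    "MoSold":      [6, 2, 5, 11, 9, 3],
})

# 1. Replace categorical labels with numbers (this mapping is an assumption)
quality_codes = {"TA": 3, "Gd": 4, "Ex": 5}
df["ExterQual"] = df["ExterQual"].replace(quality_codes).astype(int)

# 2. Fill missing values with 0
df = df.fillna(0)

# 3. Drop columns whose absolute correlation with the target is below 0.4
target_corr = df.corr()["SalePrice"]
weak_columns = target_corr[target_corr.abs() < 0.4].index
df = df.drop(columns=weak_columns)

print(list(df.columns))
```

In this toy data only MoSold falls below the threshold, so it is the only column removed. On the real dataset you would build one mapping per categorical column and inspect the correlations before dropping anything.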

It is not my intention in this article to go into detail about the data cleaning process. I will assume you know how to do that and will focus on advanced regression techniques. Just remember to apply every preprocessing step in place (or assign the result back to your variable) so that your DataFrame is actually changed.
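To see why that reminder matters, here is a small sketch with a made-up single-column frame. Most Pandas methods such as “fillna” return a new object and leave the original untouched unless you reassign the result or pass inplace=True.

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing value
df = pd.DataFrame({"LotFrontage": [60.0, np.nan, 85.0]})

df.fillna(0)  # returns a new frame; df itself is unchanged
still_missing = int(df["LotFrontage"].isna().sum())

df = df.fillna(0)  # reassigning keeps the result
# Equivalent alternative: df.fillna(0, inplace=True)
missing_after = int(df["LotFrontage"].isna().sum())

print(still_missing, missing_after)
```

Either style works; the point is simply that calling the method without keeping its result changes nothing.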


Forecasting House Prices: Kaggle Competition with Python code