# Kaggle House Prices Prediction with Linear Regression and Gradient Boosting

This notebook achieved a score of 0.12, placing within the top 25% of the Kaggle House Prices competition.

My **[Kaggle Notebook link is here](https://www.kaggle.com/paulrohan2020/eda-and-simple-linear-regression-for-house-price)**

### I intended this notebook to be published as a blog post on Linear Regression, the Gradient Descent function, and some EDA, so the first 50–60% of it mainly discusses the theory around those topics from a beginner's perspective.

```python
import math

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import mean_squared_log_error, mean_squared_error
from xgboost import XGBRegressor
```

The evaluation criterion for this Kaggle competition is RMSLE — “Submissions are evaluated on Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sales price. (Taking logs means that errors in predicting expensive houses and cheap houses will affect the result equally.)”
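To see why taking logs equalizes errors across price ranges, here is a minimal sketch (with made-up prices, not from the competition data) showing that a 10% overprediction on a cheap house and on an expensive house produce identical log-scale errors:

```python
import math

# Illustrative prices (assumed values, not from the competition data):
# both predictions overshoot the true price by exactly 10%.
cheap_true, cheap_pred = 100_000, 110_000
expensive_true, expensive_pred = 1_000_000, 1_100_000

cheap_log_err = abs(math.log(cheap_pred) - math.log(cheap_true))
expensive_log_err = abs(math.log(expensive_pred) - math.log(expensive_true))

# Both equal log(1.1) ≈ 0.0953, even though the absolute dollar errors
# differ by a factor of ten (10,000 vs 100,000).
print(cheap_log_err, expensive_log_err)
```

This is exactly the property the quoted evaluation description refers to: the metric penalizes relative error, not absolute error.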

The Root Mean Squared Log Error (RMSLE) can be defined with a slight modification of sklearn’s mean_squared_log_error function, which is itself a modification of the familiar Mean Squared Error (MSE) metric.

```python
def root_mean_squared_log_error(y_validations, y_predicted):
    # Guard against mismatched input lengths
    if len(y_predicted) != len(y_validations):
        return 'error: mismatch in number of data points between x and y'
    # Take the log of each value, then compute the RMSE of the logs
    y_predict_modified = [math.log(i) for i in y_predicted]
    y_validations_modified = [math.log(i) for i in y_validations]
    return mean_squared_error(y_validations_modified, y_predict_modified, squared=False)
```

```python
df_test.head()
```

First, I get a basic description of the data, to look quickly at the overall dataset and pull out some simple, easy-to-understand information. This is just to get a basic feel for the data. The `describe()` function returns various summary statistics, excluding NaN values.

```python
df_train.describe()
```

## Data Science Projects | Data Science | Machine Learning | Python

Practice your Data Science skills in Python by working through the hands-on, interactive projects I have posted for you.
