A simple hands-on exercise with Scikit-learn. In this post I explore housing prices in California; the dataset is available on GitHub.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
housing = pd.read_csv('housing.csv')
housing.head()
housing.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 10 columns):
longitude 20640 non-null float64
latitude 20640 non-null float64
housing_median_age 20640 non-null float64
total_rooms 20640 non-null float64
total_bedrooms 20433 non-null float64
population 20640 non-null float64
households 20640 non-null float64
median_income 20640 non-null float64
median_house_value 20640 non-null float64
ocean_proximity 20640 non-null object
dtypes: float64(9), object(1)
memory usage: 1.5+ MB
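One thing worth noting in the output above: total_bedrooms has 20433 non-null values out of 20640 entries, so 207 districts are missing that feature. Here is a minimal sketch of checking this directly. The `toy` frame and its values are made up for illustration and stand in for `housing`; `isnull().sum()` is the standard pandas way to count missing entries per column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for housing.csv; the values are made up.
toy = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0, 1274.0],
    "total_bedrooms": [129.0, np.nan, 190.0, np.nan],
})

# Count missing entries in each column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# The same arithmetic applied to the info() output above:
# total entries minus non-null entries for total_bedrooms.
missing_bedrooms = 20640 - 20433
print(missing_bedrooms)
```

On the real data, `housing.isnull().sum()` should report the same gap for total_bedrooms.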
housing['ocean_proximity'].value_counts()
<1H OCEAN 9136
INLAND 6551
NEAR OCEAN 2658
NEAR BAY 2290
ISLAND 5
Name: ocean_proximity, dtype: int64
housing.describe()
From the histograms plotted below, we can see that slightly over 800 districts have a median_house_value of about $100,000.
housing.hist(bins=50, figsize=(20,15))
plt.show()
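Reading a count off a histogram by eye is approximate, so it can help to compute the bin counts numerically. The sketch below is an assumption-laden illustration: `values` is a synthetic series standing in for `housing['median_house_value']`, and `target` and `bin_index` are names I introduce here. It uses `np.histogram` with 50 equal-width bins, which matches what `bins=50` does in the plot.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for housing['median_house_value']; the values are synthetic.
rng = np.random.default_rng(0)
values = pd.Series(rng.uniform(15_000, 500_000, size=1_000),
                   name="median_house_value")

# 50 equal-width bins over the data range, as with hist(bins=50).
counts, edges = np.histogram(values, bins=50)

# Locate the bin that contains $100,000 and report how many rows fall in it.
target = 100_000
bin_index = np.searchsorted(edges, target, side="right") - 1
low, high = edges[bin_index], edges[bin_index + 1]
print(f"bin [{low:.0f}, {high:.0f}) holds {counts[bin_index]} rows")
```

With the real dataset, replacing `values` with `housing['median_house_value'].dropna()` would give the exact number of districts in the bin around $100,000.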