Analysis, Price Modeling and Prediction: AirBnB Data for Seattle

A detailed overview of AirBnB’s Seattle data analysis using Data Engineering & Machine Learning techniques.

Business Understanding

For AirBnB users and hosts in Seattle, I will analyze and answer business questions in the following areas:

  • Price Analysis
  • Listing Count Analysis
  • Busiest Time Analysis
  • Occupancy Rate and Reviews Analysis
  • Modeling for Price Prediction

Questions and answers are covered below.

Data Understanding

Here I will perform Exploratory Data Analysis (EDA) on the data provided by Inside Airbnb on Kaggle. The data can be downloaded from Kaggle as a zip file containing 3 CSV files: listing.csv, calendar.csv, and reviews.csv.
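Because the download arrives as a single zip file, the three tables can be loaded in one pass. This is an illustrative sketch, not the author's loading code: `load_csvs_from_zip` is a hypothetical helper, and it assumes the CSV files sit inside the archive (optionally in a subfolder).

```python
import zipfile

import pandas as pd


def load_csvs_from_zip(zip_path):
    """Read every .csv member of a zip archive into a dict of DataFrames.

    Keys are the member file names without folder or '.csv' extension,
    e.g. 'calendar' for 'calendar.csv'. zip_path may be a path or a
    binary file-like object.
    """
    frames = {}
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            # Skip non-CSV members and macOS metadata folders
            if not member.lower().endswith('.csv') or member.startswith('__MACOSX'):
                continue
            name = member.rsplit('/', 1)[-1][:-4]  # drop folder and '.csv'
            with archive.open(member) as handle:
                frames[name] = pd.read_csv(handle)
    return frames
```

For example, `data = load_csvs_from_zip('seattle.zip')` would return a dict with `data['calendar']` and `data['reviews']`, where `'seattle.zip'` stands in for whatever name the downloaded archive actually has.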

Overview of listing.csv

Read the CSV file using pandas:

import pandas as pd

# Read listing.csv and print its shape (rows, columns)
listing_seattle = pd.read_csv('listings_seattle.csv')
print('Shape of listing csv is', listing_seattle.shape)
listing_seattle.sample(5)  # display 5 rows at random

Basic checks and high-level data analysis

Look at the data and run some sanity checks: the percentage of missing values per column, whether the listing ids are unique throughout the dataset, a summary of the numeric columns, etc.

  • Percentage of missing values in each column
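A minimal pandas sketch of this check; the helper name `missing_percentage` is hypothetical and not taken from the original analysis. It returns the share of missing cells per column, which is the quantity a bar chart of missing values plots.

```python
import pandas as pd


def missing_percentage(df):
    """Return the percentage of missing values per column, highest first."""
    # isnull() marks NaN/None cells as True; the column mean of a boolean
    # frame is the fraction of missing cells, scaled here to a percentage.
    return (df.isnull().mean() * 100).sort_values(ascending=False)


# Usage on the listings frame loaded earlier (plotting needs matplotlib):
# missing = missing_percentage(listing_seattle)
# missing.plot.bar(figsize=(20, 6))
```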

(Bar chart: percentage of missing values per column)

The bar chart above shows which columns have the fewest missing values. Columns like license and square_feet have more than 95% of their data missing, so we will drop them.
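One way to express this rule in code, assuming a 95% cutoff applied to every column rather than dropping license and square_feet by name. `drop_sparse_columns` and its `threshold` parameter are illustrative, not the author's code.

```python
import pandas as pd


def drop_sparse_columns(df, threshold=95.0):
    """Drop columns whose percentage of missing values exceeds `threshold`."""
    missing_pct = df.isnull().mean() * 100  # percent missing per column
    sparse = missing_pct[missing_pct > threshold].index
    return df.drop(columns=sparse)
```

Applying a threshold instead of a hard-coded list also catches any other column that crosses the same cutoff; the trade-off is that the threshold itself becomes a modeling choice worth reporting.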

Are the ids unique for each row?
# True if every listing id appears exactly once
len(listing_seattle['id'].unique()) == len(listing_seattle)
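The same check can also be read off pandas directly: `Series.is_unique` answers the yes/no question, and `Series.duplicated` counts how many rows repeat an earlier id. The wrapper `check_unique_ids` is a hypothetical helper for illustration.

```python
import pandas as pd


def check_unique_ids(df, column='id'):
    """Return (is_unique, number_of_duplicate_rows) for an id column."""
    ids = df[column]
    # duplicated() flags every repeat after the first occurrence of a value
    return ids.is_unique, int(ids.duplicated().sum())
```

The duplicate count is useful when the answer is "no", because it shows how many rows would need deduplicating.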

Description of all numeric features
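A sketch of how such a summary can be produced with `DataFrame.describe`, assuming the listings are already in a DataFrame; `describe_numeric` is a hypothetical name. Only columns with a numeric dtype are included, so any figure stored as text (for example, a currency string with a '$' prefix) would need converting before it appears here.

```python
import pandas as pd


def describe_numeric(df):
    """Summary statistics (count, mean, std, min, quartiles, max) for numeric columns."""
    # select_dtypes keeps only int/float columns; .T puts one feature per row,
    # which is easier to scan when the frame has many columns.
    return df.select_dtypes(include='number').describe().T
```

For example, `describe_numeric(listing_seattle)` would return one row per numeric feature.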

