For Airbnb users and hosts in Seattle, I will analyze the data and answer several business-related questions.
The questions and their answers are covered below.
Here I will perform Exploratory Data Analysis (EDA) on the Seattle data provided by Inside Airbnb on Kaggle. You can download the data from here (zip file); it contains 3 CSV files: listing.csv, calendar.csv, and reviews.csv.
Read the CSV file with pandas as shown below:
import pandas as pd
# Read listing.csv and print its shape
listing_seattle = pd.read_csv('listings_seattle.csv')
print('Shape of listing csv is', listing_seattle.shape)
listing_seattle.sample(5)  # display 5 rows at random
Take a look at the data and run some sanity checks: the percentage of missing values per column, whether the listing ids are unique across the dataset, a summary of the numerical columns, and so on.
Percentage of missing values per column
The bar chart above shows which columns have the fewest missing values. Columns such as license and square_feet have more than 95% of their data missing, so we will drop them.
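A minimal sketch of the missing-value check and the drop, assuming pandas and NumPy and using a tiny made-up DataFrame in place of listing_seattle (the rows, the `threshold` variable, and the names `missing_pct` and `df_clean` are illustrative, not from the original analysis). The share of nulls per column is `isnull().mean()`, scaled to a percentage; any column above 95% is then dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for listing_seattle; only the column names echo the article
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "price": ["$85.00", "$150.00", np.nan, "$975.00"],
    "square_feet": [np.nan, np.nan, np.nan, np.nan],
    "license": [np.nan, np.nan, np.nan, np.nan],
})

# Percentage of missing values per column, highest first
missing_pct = df.isnull().mean().mul(100).sort_values(ascending=False)
print(missing_pct)

# missing_pct.plot(kind="bar") draws these values as a bar chart (requires matplotlib)

# Drop every column with more than 95% of its values missing
threshold = 95
cols_to_drop = missing_pct[missing_pct > threshold].index
df_clean = df.drop(columns=cols_to_drop)
print(list(df_clean.columns))
```

Deriving the drop list from the computed percentages, rather than hard-coding column names, keeps the step valid if a refreshed data export changes which columns are sparse.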
Are the ids unique for each row?
len(listing_seattle['id'].unique()) == len(listing_seattle)
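If that comparison ever returns False, it helps to see which ids repeat. A small sketch on a toy DataFrame (the values are made up): `Series.is_unique` gives the same answer as comparing the unique count with the row count, and `Series.duplicated(keep=False)` flags every row whose id occurs more than once.

```python
import pandas as pd

# Toy data with one repeated id (values are illustrative only)
df = pd.DataFrame({"id": [101, 102, 102, 103], "name": ["a", "b", "c", "d"]})

# Equivalent to len(df["id"].unique()) == len(df)
ids_unique = df["id"].is_unique
print(ids_unique)

# Show every row whose id appears more than once
dupes = df[df["id"].duplicated(keep=False)]
print(dupes)
```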
Summary statistics for all numeric features
listing_seattle.describe()
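By default `describe()` summarizes only the numeric columns. A small sketch on made-up listings-like data (the column names and values are illustrative assumptions) of two variations that can help with a wide table: transposing the output so each feature becomes a row, and calling `describe()` on a text column to get its count, number of unique values, most frequent value, and that value's frequency.

```python
import pandas as pd

# Made-up listings-like data; column names are illustrative
df = pd.DataFrame({
    "accommodates": [2, 4, 6, 4],
    "bedrooms": [1.0, 2.0, 3.0, 2.0],
    "room_type": ["Private room", "Entire home/apt", "Entire home/apt", "Shared room"],
})

# Numeric columns only; transpose so each feature is one row
numeric_summary = df.describe().T
print(numeric_summary[["count", "mean", "min", "max"]])

# A text column yields count, unique, top and freq instead
room_summary = df["room_type"].describe()
print(room_summary)
```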