SQL DATA ANALYSIS | Torronto AirBnbs - Most Revenue Generating / Opportunity for Cleaning

In this data analysis project we will try to find out the most revenue creating airbnbs in torronto using sql alone.

Github Repo
Pre requisites

We can download dataset from airbnb insider website., ,There are 3 csv files to be downloaded : Listings, Calendar and Reviews.
Tools Needed : Python/Notepad++,Sql Workbench.

Now there are ways to import data to SQL Workbench from a csv file. 
1) One is a premade python script that create pandas dataframe and send it to sql workbench .
2) Using SQLWorkbench wizard to import tables from a  csv file.
3) Use CSV lint plugin to convert CSV file into a sql create table command which we can run on Workbench.

Using one of above 3 methods do the importing of all 3 tables into Workbench.

Top AirBnB Owners in Torronto

Now let's get to work. Let's find out who are the top 10 airbnb revenue generators in Torronto. For that first we need to clean up the price column. We can actually find the price column in listings table but now we need to clean that since it is not a    number because it has that dollar sign in front of it which is actually making it a string , which is not good for calculations and averages and sums and    stuff like that. so let's go ahead and get that cleaned up. what i did was replacting the dollar sign in the price column with nothing and i cast that entire column unsigned which will basically turn it into an integer. This cleaned up column is renamed as price_modified.

CAST(REPLACE(Price,'$','') AS UNSIGNED) AS price_modified

The issue is that in listing table we don't have room availability column to indicate how many rooms are booked out. But we have columns like availability_30,availability_60,availability_365 etc. So we are going to basically find    the availability of rooms in next year, so choose availability_365 column and substract from 365 to get the no of rooms booked out for next year. We name that column as booked_365.

(365 - availability_365) AS booked_365

Next we will be calculating the revenue which can be achieved by taking the price times no of booked rooms divided by beds in room.We will name this column as revenue_365.

CAST(REPLACE(Price,'$','') AS UNSIGNED)*(365 - availability_365) / beds AS revenue_365

Ordering the result by revenue_365 column will give us below result which is the top 10 owners in Torronto.

SELECT host_name,host_id,
CAST(REPLACE(Price,'$','') AS UNSIGNED) AS price_modified,
(365 - availability_365) AS booked_365 , 
CAST(REPLACE(Price,'$','') AS UNSIGNED)*(365 - availability_365) / beds AS revenue_365
FROM listings 
ORDER BY revenue_365 

Starting a Cleaning Service Company and Getting AirBnB owners as Customers

Maybe i don't actually want to start an airbnb.  Maybe i just want to start an airbnb servicing company specifically in  cleaning. If i could maybe create like an awesome list of potential customers for my airbnb cleaning business based off of the idea of i'm going to find the hosts  who have the most complaints about their listing being dirty and those people are probably likely to be open to my pitch.

To do this we can take the reviews table and do an inner join on the listings where the listing id from reviews match with the listing id from listing table and comments containing dirty keyword.

COUNT(*) AS num_dirty_reviews FROM reviews INNER JOIN 
listings ON reviews.listing_id =
WHERE comments LIKE "%dirty%"

Grouping the result with host_id,host_name and ordering by num_dirty_reviews will give us the final result. This list can be used to call those airbnb hosts to ask if they would like your cleaning service. Voila! Business is boooming.

SELECT host_id, host_name, 
COUNT(*) AS num_dirty_reviews FROM reviews INNER JOIN 
listings ON reviews.listing_id =
WHERE comments LIKE "%dirty%"
GROUP BY host_id, host_name ORDER BY num_dirty_reviews DESC;


1.50 GEEK