So you want to try a bite of the Big Apple? You want to wake up in a city that doesn’t sleep? And you don’t care if it’s in Chinatown or on Riverside. Well, you’re hardly the first. With over 8 million residents and 60 million annual visitors, the city has seen its fair share of people pass through the gates of JFK or LGA.

Of these 60 million annual visitors, many will probably stay at one of Airbnb’s roughly 50k rentals. This poses an interesting problem: how can we use NYC’s available datasets to accurately price Airbnb rentals throughout the year? We will tackle the problem iteratively, building a first model and then improving it incrementally.

The end result is a web app that prices rentals in real time: Pad Pricer.


Step 1: Scraping Airbnb for Monthly Listings

As one may have guessed, the first step in pricing Airbnb rentals is to get a listing of rentals from Airbnb. Luckily for us, Airbnb is more than happy to provide this data (as monthly files). To get to the point where we can predict a rental price, though, we’ll need a three-step approach:

  1. Get links for all NYC rentals (from the past 50 months)
  2. Download the 50 links into 50 pandas DataFrames
  3. Compile the 50 DataFrames into a single cleaned one
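
Steps 2 and 3 above can be sketched roughly as follows. This is a minimal illustration rather than the pipeline actually used here: the helper names `load_month` and `compile_listings`, the `id` and `price` column names, and the cleaning rules (stripping `$` and `,` from prices, dropping duplicate rows) are all assumptions. In-memory CSV text stands in for the downloaded files so the example runs without network access.

```python
import io

import pandas as pd


def load_month(source, month: int, year: int) -> pd.DataFrame:
    """Read one monthly listings file and tag each row with its snapshot date."""
    # pd.read_csv accepts a URL, a local path, or a file-like object.
    frame = pd.read_csv(source)
    frame["month"] = month
    frame["year"] = year
    return frame


def compile_listings(frames: list) -> pd.DataFrame:
    """Stack the monthly frames and apply a small, illustrative cleaning pass."""
    combined = pd.concat(frames, ignore_index=True)
    # Assumption: price arrives as text such as "$1,250.00".
    combined["price"] = (
        combined["price"].astype(str).str.replace(r"[$,]", "", regex=True).astype(float)
    )
    combined = combined.dropna(subset=["price"])
    # Assumption: a listing should appear at most once per monthly snapshot.
    return combined.drop_duplicates(subset=["id", "month", "year"]).reset_index(drop=True)


# Two tiny made-up "monthly files" standing in for the real downloads.
may = io.StringIO('id,price\n1,"$100.00"\n2,"$1,250.00"\n')
june = io.StringIO('id,price\n1,"$110.00"\n1,"$110.00"\n')

frames = [load_month(may, 5, 2020), load_month(june, 6, 2020)]
listings = compile_listings(frames)
print(listings)
```

In a real run, each `source` would be one of the 50 URLs, and the month and year would come from the link table built in the next step.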


Step 1.1: Getting Airbnb Links for NYC Rentals

To get all of the monthly NYC rental files from Airbnb’s data store, we’ll use BeautifulSoup to scrape the site’s index page for their links, then store each URL, along with its month and year, in a pandas DataFrame. The month and year are derived from the datetime string associated with each URL.
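
A minimal sketch of this scraping step is shown below. The URL layout (a `YYYY-MM-DD` date segment in the path, a `new-york-city` slug, and a `listings.csv.gz` file name), the `example.com` host, and the function name `extract_listing_links` are assumptions made for illustration; the real index page may be structured differently. The function takes the page's HTML as a string, so it can be tried on a small sample without hitting the network.

```python
from datetime import datetime
import re

import pandas as pd
from bs4 import BeautifulSoup

# Assumed URL shape: .../new-york-city/2020-06-08/data/listings.csv.gz
DATE_IN_URL = re.compile(r"/(\d{4}-\d{2}-\d{2})/")


def extract_listing_links(index_html: str, city_slug: str = "new-york-city") -> pd.DataFrame:
    """Collect listing-file URLs for one city, with the month and year of each."""
    soup = BeautifulSoup(index_html, "html.parser")
    rows = []
    for anchor in soup.find_all("a", href=True):
        url = anchor["href"]
        # Keep only the detailed listings file for the requested city.
        if city_slug not in url or not url.endswith("listings.csv.gz"):
            continue
        match = DATE_IN_URL.search(url)
        if match is None:
            continue  # no date in the URL, so month and year cannot be derived
        scraped = datetime.strptime(match.group(1), "%Y-%m-%d")
        rows.append({"url": url, "month": scraped.month, "year": scraped.year})
    links = pd.DataFrame(rows, columns=["url", "month", "year"])
    return links.drop_duplicates(subset="url").sort_values(["year", "month"]).reset_index(drop=True)


# Made-up sample of an index page: two NYC files, one other city, one unrelated file.
sample_html = """
<table>
  <tr><td><a href="http://example.com/ny/new-york-city/2020-06-08/data/listings.csv.gz">listings.csv.gz</a></td></tr>
  <tr><td><a href="http://example.com/ny/new-york-city/2020-05-06/data/listings.csv.gz">listings.csv.gz</a></td></tr>
  <tr><td><a href="http://example.com/france/paris/2020-06-08/data/listings.csv.gz">listings.csv.gz</a></td></tr>
  <tr><td><a href="http://example.com/ny/new-york-city/2020-06-08/visualisations/listings.csv">listings.csv</a></td></tr>
</table>
"""

links = extract_listing_links(sample_html)
print(links)
```

To run this against the live site, the HTML string would first be fetched with an HTTP client such as `requests` and then passed in unchanged.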


A New Yorker’s Guide to Airbnb Pricing