Scraping Yelp data with Python and Beautiful Soup

We are going to see how to scrape Yelp data with Python and BeautifulSoup in a simple and elegant manner.

The aim of this article is to get you started on real-world problem solving while keeping it super simple, so you get familiar with the tools and get practical results as fast as possible.

So the first thing we need is to make sure we have Python 3 installed. If not, download and install Python 3 before you proceed.

Then install Beautiful Soup with:

pip3 install beautifulsoup4

We will also need the requests, lxml, and soupsieve libraries: requests to fetch the page, lxml to parse its HTML, and soupsieve to use CSS selectors. Install them with:

pip3 install requests soupsieve lxml
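Before writing any scraping code, it can help to confirm that everything installed correctly. Here is a small sketch that uses the standard library's importlib.metadata (available from Python 3.8) to report which of the four packages pip cannot find. The helper name missing_packages is our own illustration, not part of any of these libraries.

```python
import sys
from importlib.metadata import version, PackageNotFoundError

# Distribution names as pip knows them (note: "beautifulsoup4", not "bs4")
REQUIRED = ("beautifulsoup4", "requests", "lxml", "soupsieve")


def missing_packages(names=REQUIRED):
    """Return the names from `names` that are not installed (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if not installed
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    print("Missing packages:", missing_packages() or "none")
```

If anything shows up as missing, rerun the pip3 command for it before moving on.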

Once they are installed, open an editor and type in:

## -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

Now let’s go to the Yelp San Francisco restaurants listing page and inspect the data we can get.

This is how it looks:

[Screenshot: Yelp's San Francisco restaurants listing page]

Back to our code now. Let’s try to fetch this page while pretending to be a browser, by sending a browser User-Agent header like this:

## -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

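# Send a browser User-Agent so the request looks like it comes from Safari
headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url='https://www.yelp.com/search?cflt=restaurants&find_loc=San Francisco, CA'
response=requests.get(url,headers=headers)
# response.text holds the page's HTML; printing response alone only shows <Response [200]>
print(response.text)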

Save this as yelp_bs.py.

Then run it:

python3 yelp_bs.py

You will see the whole HTML page printed out.
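If you are curious about what we actually send to Yelp, requests can build a request without sending it. This sketch assumes you pass the query as a params dict instead of writing it into the URL by hand; requests then encodes the space and comma in "San Francisco, CA" for you. It also prints the default User-Agent that requests would use if we did not pretend to be a browser.

```python
import requests

# Same browser-like User-Agent as in yelp_bs.py
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
# Query string as a dict; requests URL-encodes the values
params = {'cflt': 'restaurants', 'find_loc': 'San Francisco, CA'}

# Build the request but do not send it, so we can inspect it offline
prepared = requests.Request('GET', 'https://www.yelp.com/search',
                            headers=headers, params=params).prepare()

print(prepared.url)                          # the fully encoded search URL
print(prepared.headers['User-Agent'])        # our browser-like header
print(requests.utils.default_user_agent())   # what requests sends if we don't override it
```

To actually fetch the page this way, call requests.get('https://www.yelp.com/search', headers=headers, params=params).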

Now, let’s use CSS selectors to get to the data we want. To do that, let’s go back to Chrome and open the Inspect tool.

[Screenshot: the listing page open in Chrome's Inspect tool]

We notice that all the individual rows of data are contained in an element whose class includes the word ‘container’, along with other gibberish before and after it. This is good enough for us to scrape it. We can get BeautifulSoup to select every element that has this word anywhere in its class attribute by using the *= ("contains") attribute selector, like this:
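To see why the *= selector is handy, here is a tiny, self-contained sketch on made-up markup (the class names below are hypothetical, not Yelp's real ones). A plain class selector like .container only matches an element whose class is exactly "container", while [class*=container] matches the word anywhere in the class attribute, which also means unrelated elements can slip in. It uses Python's built-in html.parser so that only beautifulsoup4 (and its soupsieve dependency) is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating generated class names; not Yelp's real HTML
html = """
<div class="list">
  <div class="container__09f24 card"><h4>Cafe A</h4></div>
  <div class="arrange-container_x"><span>Sponsored</span></div>
  <div class="footer"><h4>About</h4></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# ".container" needs a class token exactly equal to "container"
exact = soup.select(".container")
# "[class*=container]" matches "container" anywhere in the class attribute
partial = soup.select("[class*=container]")

print(exact)                                # no element has the bare class "container"
print([el.get("class") for el in partial])  # both elements whose classes contain the word
```

Note that the "Sponsored" element matches too, even though it is not a restaurant row; that is why the next step checks for a title before scraping.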

## -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url='https://www.yelp.com/search?cflt=restaurants&find_loc=San Francisco, CA'
response=requests.get(url,headers=headers)
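# Parse the HTML with the lxml parser
soup=BeautifulSoup(response.content,'lxml')
# Select every element whose class attribute contains "container"
for item in soup.select('[class*=container]'):
    print(item)
    print('')  # blank line between containers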

This prints the HTML of every matching container, including the ones that hold the restaurant data.

We can now pick out the elements inside these rows that contain the data we want. We notice that the restaurant name is inside an h4 tag. We select this, but we also do all other selections under this protective umbrella, because the container class selected above might also be used for things other than the data we want. So, to be sure, we check that there is an h4 tag in there before we scrape any other pieces of data.

## -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url='https://www.yelp.com/search?cflt=restaurants&find_loc=San Francisco, CA'
response=requests.get(url,headers=headers)
soup=BeautifulSoup(response.content,'lxml')
for item in soup.select('[class*=container]'):
    # Only treat this container as a restaurant row if it has an <h4> title
    title = item.find('h4')
    if title:
        name = title.get_text(strip=True)
        print(name)
        print('------------------')

If you run it, it will print out all the names.

[Screenshot: the restaurant names printed in the terminal]

Bingo! We got the names.
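One thing to watch out for: since [class*=container] can also match containers nested inside other containers, the loop above may print the same name more than once. A minimal sketch of one way around this, assuming the same h4 convention as above, is to remember which h4 elements we have already seen. The function extract_names and the sample markup are hypothetical illustrations, parsed with the built-in html.parser.

```python
from bs4 import BeautifulSoup


def extract_names(html):
    """Return h4 titles from container-like elements, in page order, each element once."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    seen = set()  # id() of h4 elements already collected
    for item in soup.select("[class*=container]"):
        title = item.find("h4")
        if title is None:
            continue  # not a restaurant row
        if id(title) in seen:
            continue  # same h4 reached again through an outer container
        seen.add(id(title))
        names.append(title.get_text(strip=True))
    return names


# Hypothetical nested markup: the outer container also "contains" Cafe A's h4
sample = """
<div class="outer-container">
  <div class="container__a"><h4>Cafe A</h4></div>
  <div class="container__b"><h4>Bistro B</h4></div>
</div>
"""
print(extract_names(sample))  # ['Cafe A', 'Bistro B']
```

We track the h4 elements themselves rather than their text, so two different restaurants that happen to share a name are both kept.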
