Introduction

In this tutorial, I will show you how to build a web scraper with Python libraries. Web scraping is a useful skill, and some data-related job listings ask for it. It also lets you demonstrate that you can collect your own data instead of downloading a pre-existing dataset from the Internet.

By the end of this article, you should be familiar with:

  • What web scraping is
  • Why Python is a good fit for web scraping
  • Two Python libraries: BeautifulSoup and Selenium
  • Pulling data from a website and storing it in a dataframe

Let’s get started!

Note: Scraping data that is publicly available and not hidden behind a login is generally permitted, but this can depend on the website's terms of service and on the laws that apply to you. All data scraped in this tutorial is publicly available.

What is web scraping?

Web scraping is an automated process of gathering data from the Internet.

Picture this: You are a data science enthusiast, and are looking to find some of the best online courses. You visit an online course review site, and scroll through hundreds of courses, trying to find the most suitable one for you.

Sounds exhausting, doesn’t it?

Instead of spending hours doing this manually, you can run a few lines of code to pull all this data and store it in a dataframe.

Then, you can proceed to sort or query this data to find a course that matches your expectations.
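To make the idea concrete, here is a minimal sketch of that workflow: parse a page, collect one record per course, then sort and filter the records. Everything in it is an assumption made for illustration: the HTML snippet, the class name "course", and the "data-rating" attribute are hypothetical and do not come from a real site. It also uses Python's built-in html.parser and a plain list of dictionaries as stand-ins for BeautifulSoup and a dataframe, so it runs without installing anything.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a course review page.
SAMPLE_HTML = """
<ul>
  <li class="course" data-rating="4.7">Intro to Data Science</li>
  <li class="course" data-rating="4.2">Statistics Basics</li>
  <li class="course" data-rating="4.9">Machine Learning Foundations</li>
</ul>
"""


class CourseParser(HTMLParser):
    """Collects the name and rating of every <li class="course"> element."""

    def __init__(self):
        super().__init__()
        self.courses = []     # one dict per course found
        self._rating = None   # rating of the <li> currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "course":
            self._rating = float(attrs["data-rating"])

    def handle_data(self, data):
        # Text inside a course <li> is treated as the course name.
        if self._rating is not None and data.strip():
            self.courses.append({"name": data.strip(), "rating": self._rating})

    def handle_endtag(self, tag):
        if tag == "li":
            self._rating = None


parser = CourseParser()
parser.feed(SAMPLE_HTML)

# Sort by rating (highest first), then keep only courses rated 4.5 or above.
ranked = sorted(parser.courses, key=lambda c: c["rating"], reverse=True)
top_courses = [c for c in ranked if c["rating"] >= 4.5]

for course in top_courses:
    print(f'{course["name"]}: {course["rating"]}')
```

On a real site, the sample string would be replaced by the HTML of the downloaded page, and the tag and attribute names would have to match that page's actual markup.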

This saves a lot of time and effort. Even if you are not familiar with programming, learning to automate repetitive tasks with code is a great skill to have.

That is exactly what we will be doing in this tutorial. Each step will be explained, and all the code will be provided.

You will be able to use the same code and techniques for other web scraping tasks in the future.


Build a Web Scraper with Python in 8 Minutes