Web Scraping using BeautifulSoup- COVID-19 Data

Web scraping means extracting large amounts of unstructured data from websites and storing it in a structured format, in a file or database of your choice. We’ll see how it’s done in this post.

So how do you scrape data from the web?

Have you ever copied and pasted information from websites?

If yes, I would say you’ve already performed web scraping in a way. But you can’t really copy and paste a hundred times or more, can you?

So let’s see how Python helps us do the same with one of its packages, BeautifulSoup.

  • Step 1- Find a website that contains the information you require.

Some websites, like Twitter and Facebook, provide APIs for easy connectivity and access to their data. Others don’t, so you’ll have to write code that navigates the site and extracts its content.

Remember, not every website is cool with you scraping their content. So make sure you’re aware of the website’s terms and conditions.

You can check what a website permits crawlers to access by appending ‘/robots.txt’ to its root URL.

The robots.txt file implements the Robots Exclusion Protocol.
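As a rough sketch of how such a check could be automated, Python’s standard library ships urllib.robotparser. The rules and the example.com URLs below are made up for illustration; a real site’s robots.txt will differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, invented for this example.
sample_rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() takes the file's lines, so no network access is needed here.
parser.parse(sample_rules.splitlines())

# can_fetch(user_agent, url) reports whether the rules allow the URL.
allowed_page = parser.can_fetch("*", "https://example.com/wiki/Some_page")
blocked_page = parser.can_fetch("*", "https://example.com/private/data")
print(allowed_page, blocked_page)
```

For a live site you would instead call parser.set_url() with the site’s robots.txt address and then parser.read() before asking can_fetch().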

We’ll scrape the number of COVID-19 cases for each country from https://en.wikipedia.org/wiki/COVID-19_pandemic.

  • Step 2- Inspect the website.

You need to know the site’s structure to extract the information you’re interested in. Find out which HTML tags contain the data that needs to be scraped.

Right-click on the page and then click Inspect.

To understand and inspect the content, you need to know a few commonly used HTML tags.

<h1> to <h6>: headings
<p>: paragraphs
<a>: hyperlinks
<table>: tables | <tr>: table rows | <td>: table cells

These tags can further have attributes like class, id, src, title, etc.
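To see tags and attributes side by side, here is a minimal sketch that uses the standard library’s html.parser module (not BeautifulSoup) to list every opening tag in a snippet together with its attributes. The TagLister class and the HTML snippet are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Collects each opening tag and its attributes (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.seen.append((tag, dict(attrs)))


# Made-up HTML snippet with a heading, a paragraph and a hyperlink.
snippet = (
    '<h1 id="title">Cases</h1>'
    '<p class="intro">See <a href="/wiki/Data" title="Data">data</a>.</p>'
)

lister = TagLister()
lister.feed(snippet)
for tag, attributes in lister.seen:
    print(tag, attributes)
```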

Inspecting the website mentioned earlier, highlighted in pink are the tags we’ll be extracting data from.

  • Step 3- Get the site’s HTML code in your Python script.

We’ll use the requests library to send an HTTP request to the website. The server will respond with HTML content of the page.

import requests 
response = requests.get("https://en.wikipedia.org/wiki/COVID-19_pandemic")

Let’s check if the request was successful or not.

response.status_code

Output: 200

A status code starting with 2 generally indicates success, while codes starting with 4 or 5 indicate an error.
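The rule of thumb above can be written as a small helper. This is only a sketch; describe_status is a hypothetical name and is not part of the requests library.

```python
def describe_status(code: int) -> str:
    """Map an HTTP status code to its broad class (illustrative helper)."""
    if 100 <= code < 200:
        return "informational"
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirection"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "unknown"


for code in (200, 404, 503):
    print(code, describe_status(code))
```

In a real script you can also call response.raise_for_status(), which raises an exception for 4xx and 5xx responses, instead of checking the code by hand.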

response.content

The response obtained will look similar to the HTML content you inspected.

  • Step 4- Parse HTML data with BeautifulSoup

The HTML content looks complex and confusing due to nested tags and multiple attributes. We now need BeautifulSoup to simplify our task.

BeautifulSoup is a Python package for parsing HTML and XML documents. It creates parse trees and makes extracting data easy.

Let’s first import the BeautifulSoup package and create its object, ‘soup’.

from bs4 import BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')
print(soup.prettify())  # prettify() returns a string; print it to see the indented markup

The prettify() function helps us view the manner in which the tags are nested.

Within a table, <th> tags hold the table headers.
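To make the parsing step concrete, here is a small sketch that runs BeautifulSoup on an inline HTML table rather than on the live Wikipedia page. The table contents, the country names and the ‘wikitable’ class are assumptions made for illustration; the real page’s markup may differ and should be checked with Inspect.

```python
from bs4 import BeautifulSoup

# Made-up table standing in for the page's HTML; the data is not real.
html = """
<table class="wikitable">
  <tr><th>Country</th><th>Cases</th></tr>
  <tr><td>Exampleland</td><td>1,234</td></tr>
  <tr><td>Samplestan</td><td>567</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate the table by tag name and class attribute.
table = soup.find("table", class_="wikitable")

# Header cells live in <th> tags.
headers = [th.get_text(strip=True) for th in table.find_all("th")]

# Each <tr> is a row; keep only rows that contain <td> data cells.
rows = []
for tr in table.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:
        rows.append(cells)

print(headers)
print(rows)
```

The same find() and find_all() calls can be applied to the soup object built from response.content once you know which table holds the case counts.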
