With so many websites available today, analysing the data they contain is difficult without an efficient method. This is where web scraping comes into play. Web scraping automates the extraction of data from websites in a fast and efficient manner. BeautifulSoup is one of several Python libraries available for web scraping.
BeautifulSoup is a library for parsing HTML and XML documents. It builds a parse tree from the document, which lets us extract and analyse data.
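To make the idea of a parse tree concrete, here is a minimal sketch that parses a small hand-written HTML string and reads a few values out of it. The sample markup, the `state` class name, and the variable names are assumptions made up for illustration; they do not come from any real page.

```python
from bs4 import BeautifulSoup

# A tiny hand-written HTML document (illustrative only, not from a real site).
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <h1>States</h1>
    <ul>
      <li class="state">Kerala</li>
      <li class="state">Punjab</li>
    </ul>
  </body>
</html>
"""

# "html.parser" is Python's built-in parser, so no extra parser package is needed.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

page_title = soup.title.string        # text inside the <title> tag
heading = soup.find("h1").get_text()  # text of the first <h1> element
# Collect the text of every <li> element that has class="state".
states = [li.get_text() for li in soup.find_all("li", class_="state")]

print(page_title)
print(heading)
print(states)
```

The same `find` and `find_all` calls work on HTML downloaded from a website, although real pages usually need more specific selectors than this toy example.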
Installing Beautiful Soup
BeautifulSoup can be installed using the Python package manager pip.
pip install beautifulsoup4
It can also be installed using conda, the Anaconda package manager.
conda install beautifulsoup4
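A quick way to check that the installation worked is to import the package and print its version. This is only a sanity check; the version printed depends on what was installed on your machine.

```python
import bs4

# The package is installed as "beautifulsoup4" but imported as "bs4".
installed_version = bs4.__version__
print("Beautiful Soup version:", installed_version)
```

If the import fails with `ModuleNotFoundError`, the package was probably installed into a different Python environment than the one running the script.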
Let’s look at an example that scrapes Wikipedia to extract the names of the states of India.
To access the HTML content of a webpage:
1. Import the requests and BeautifulSoup libraries.
2. Provide the URL of the website to scrape.
3. Get the HTML data by performing an HTTP request to the specified URL and store the response in an object.
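# Import the libraries used to fetch and parse the page
from bs4 import BeautifulSoup
import requests

# Page to scrape
url = 'https://en.wikipedia.org/wiki/States_and_union_territories_of_India'
# Send an HTTP GET request and store the response
response = requests.get(url)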
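Once the response is stored, a natural next step is to confirm that the request succeeded and hand the HTML to BeautifulSoup. The sketch below shows one way to do that. The helper names `fetch_html` and `page_heading`, the timeout value, and the User-Agent string are all hypothetical choices for illustration, and the sketch stops short of extracting the state names because that depends on the page's current markup.

```python
from bs4 import BeautifulSoup
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its HTML text (hypothetical helper)."""
    # Some sites reject requests that lack a descriptive User-Agent.
    # This string is a made-up placeholder; replace it with your own.
    headers = {"User-Agent": "example-scraper/0.1 (learning project)"}
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.text


def page_heading(html):
    """Return the text of the first <h1> in the HTML, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    h1 = soup.find("h1")
    return h1.get_text(strip=True) if h1 else None
```

With these helpers, `page_heading(fetch_html(url))` would fetch the page at `url` and return its main heading. Keeping the download and the parsing in separate functions means the parsing step can be tried on a saved or hand-written HTML string without touching the network.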