Beautiful Soup is a Python package for parsing HTML and XML documents, including documents with malformed markup such as unclosed tags (the name comes from "tag soup"). It creates a parse tree for each parsed page that can be used to extract data from HTML, which makes it useful for web scraping. Here we will use Beautiful Soup 4.
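As a rough illustration of the parse tree, the sketch below feeds Beautiful Soup a fragment with unclosed tags and then navigates the result. The sample markup and variable names are made up for this example, and it assumes Python's built-in `html.parser` backend; other parsers may repair the same markup differently.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: neither <p> nor <b> is ever closed.
broken_html = "<p>Hello <b>world"

# "html.parser" ships with Python, so no extra parser package is needed here.
soup = BeautifulSoup(broken_html, "html.parser")

# The parse tree can be navigated even though the input was malformed.
bold_text = soup.b.get_text()        # text inside the <b> element
paragraph_text = soup.p.get_text()   # all text inside the <p> element

print(bold_text)
print(paragraph_text)
print(soup.prettify())               # indented view of the repaired tree
```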
Web scraping is a technique for extracting large amounts of data from websites and saving it to a local file on your computer or to a database in tabular (spreadsheet-like) format.
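To make the "extract and save in table format" idea concrete, here is a minimal sketch that pulls the rows out of an HTML table and writes them as CSV. The sample HTML and the helper names `table_to_rows` and `write_csv` are assumptions for illustration only; real pages usually need more specific selectors.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>19.50</td></tr>
</table>
"""


def table_to_rows(html):
    """Return each table row as a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = tr.find_all(["th", "td"])
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


def write_csv(rows, fileobj):
    """Write the rows to any text file object in CSV format."""
    csv.writer(fileobj).writerows(rows)


rows = table_to_rows(SAMPLE_HTML)

# An in-memory buffer keeps the sketch free of side effects.
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

Passing a handle from `open("products.csv", "w", newline="")` instead of the `io.StringIO` buffer would write the same rows to a local file; the file name is only an example.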
There are mainly two ways to extract data from a website:
2. Access the HTML of the web page and extract useful information from it. This technique is called web scraping, web harvesting, or web data extraction.
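The HTML-based approach can be sketched roughly as follows: download the page with `requests`, then let Beautiful Soup pull out the pieces of interest (here, links). The helper names `fetch_html` and `extract_links`, the sample markup, and the example URLs are all hypothetical; this is one possible shape under those assumptions, not a definitive implementation.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML text (needs network access)."""
    # Imported lazily so the parsing helper below also works offline.
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def extract_links(html, base_url):
    """Return (link text, absolute URL) pairs for every <a> with an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), urljoin(base_url, a["href"]))
        for a in soup.find_all("a", href=True)
    ]


# Hypothetical markup used in place of a live download.
SAMPLE = (
    '<html><body><a href="/docs">Docs</a> '
    '<a href="https://example.org/about">About</a>'
    "<a>no href</a></body></html>"
)

links = extract_links(SAMPLE, "https://example.com")
print(links)
```

With network access, `extract_links(fetch_html(url), url)` would apply the same extraction to a live page, subject to that site's terms of use.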
This table summarizes the advantages and disadvantages of each parser library.
To install the library in your Python environment, use the **_pip_** command. Also install the supporting packages: lxml, html5lib, requests, etc.
pip install lxml
pip install html5lib
pip install beautifulsoup4
pip install requests
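Once the packages are installed, a quick check like the one below can confirm which parser backends Beautiful Soup is able to use. The helper name `available_parsers` and the test markup are assumptions for this sketch; `html.parser` is part of the standard library, while `lxml` and `html5lib` are only found if the installs above succeeded.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical fragment; any small piece of markup would do.
TEST_HTML = "<p>Unclosed paragraph"


def available_parsers(candidates=("html.parser", "lxml", "html5lib")):
    """Return the parser names that Beautiful Soup can load in this environment."""
    found = []
    for name in candidates:
        try:
            BeautifulSoup(TEST_HTML, name)
        except FeatureNotFound:
            # Raised by Beautiful Soup when the requested parser is not installed.
            continue
        found.append(name)
    return found


parsers = available_parsers()
print(parsers)
```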