What is Web Scraping?
Web scraping is an automated method for extracting large amounts of data from websites. Most of that data is unstructured: it is embedded in HTML written for display rather than for analysis.
Web scraping collects this unstructured data and stores it in a structured form.
Other terms used for web scraping include web crawling, web data extraction, and web harvesting.
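To make "unstructured to structured" concrete, here is a minimal sketch using only Python's standard library. The HTML snippet, the "item" class name, and the `ItemParser` / `extract_records` names are made up for illustration; a real page will have different markup, and a dedicated scraping library may be more convenient.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a real page.
SAMPLE_HTML = """
<div class="item"><h2>First Title</h2><p>First description.</p></div>
<div class="item"><h2>Second Title</h2><p>Second description.</p></div>
"""


class ItemParser(HTMLParser):
    """Collect one dict per <div class="item"> block."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None  # field currently being filled, if any

    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("class", "item") in attrs:
            # A new item starts: open a fresh record.
            self.records.append({"title": "", "description": ""})
        elif tag == "h2":
            self._field = "title"
        elif tag == "p":
            self._field = "description"

    def handle_endtag(self, tag):
        if tag in ("h2", "p"):
            self._field = None

    def handle_data(self, data):
        # Only keep text that sits inside a tracked field of an open record.
        if self._field and self.records:
            self.records[-1][self._field] += data.strip()


def extract_records(html):
    """Turn the raw HTML string into a list of structured records."""
    parser = ItemParser()
    parser.feed(html)
    parser.close()
    return parser.records


records = extract_records(SAMPLE_HTML)
for record in records:
    print(record)
```

The page text goes in as one long string and comes out as a list of dictionaries with named fields, which is the shape you need before saving to a table or database.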
After hearing all that, the next question on anyone’s mind would be:
Is Web Scraping legal?
In general, web scraping by itself is not illegal; after all, you can scrape a website for educational purposes.
As a general rule of thumb, data scraped from sites that do not allow web scraping should not be used for commercial purposes, since doing so can violate the site’s terms of use or applicable law.
To see whether a website allows web scraping, look at the website’s “robots.txt” file.
You can find this file by appending “/robots.txt” to the root URL of the site that you want to scrape.
e.g. “https://…YOUR URL HERE…./robots.txt”
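The robots.txt check can also be done in code. The sketch below uses Python's standard `urllib.robotparser`. The `robots_url` and `is_allowed` helpers, the example.com URLs, and the sample rules are assumptions made for illustration, and the rules are parsed from a string so the example runs without a network connection.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Build the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # robots.txt lives at the site root, so drop the path and query.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def is_allowed(robots_txt: str, page_url: str, user_agent: str = "*") -> bool:
    """Check whether the given robots.txt text permits fetching page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


# Made-up rules standing in for a real site's robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(robots_url("https://example.com/anime/list?page=2"))
print(is_allowed(SAMPLE_ROBOTS, "https://example.com/anime/list"))
print(is_allowed(SAMPLE_ROBOTS, "https://example.com/private/data"))
```

Against a live site you would call `set_url(...)` and `read()` on the parser instead of `parse(...)`. Keep in mind that robots.txt states the site's crawling rules; it does not replace reading the site's terms of use.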
Now, with that out of the way, let’s get into it.
In this post we’ll be going through:
1. How to inspect & analyze a web page for web scraping
2. How to get started with web scraping
3. How to collect unstructured data in a structured format
By the end of this post, we’ll have an anime database covering every anime currently available on the website, with the following fields for each entry:
1. Anime Title
2. Description
3. Current/Latest Season
4. Episodes Aired
5. Status
6. Initial Air Date
7. Genre
8. Sub/Dub
9. Series/Movie
10. URL
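One way to pin down the structured format is to define the record up front. The sketch below maps the ten fields above onto a Python dataclass and writes the records as CSV. The `AnimeRecord` and `to_csv` names, the field types, and every value in the sample row are placeholders chosen for illustration, not data from any real website.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class AnimeRecord:
    """One row of the database; field names mirror the list above."""
    title: str
    description: str
    latest_season: str
    episodes_aired: int
    status: str
    initial_air_date: str
    genre: str
    sub_dub: str
    series_or_movie: str
    url: str


def to_csv(records):
    """Serialize records to CSV text with one header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(AnimeRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Placeholder values only; a scraper would fill these in from the page.
sample = AnimeRecord(
    title="Example Title",
    description="Placeholder description.",
    latest_season="Season 1",
    episodes_aired=12,
    status="Ongoing",
    initial_air_date="2020-01-01",
    genre="Adventure",
    sub_dub="Sub",
    series_or_movie="Series",
    url="https://example.com/anime/example-title",
)

csv_text = to_csv([sample])
print(csv_text)
```

Fixing the column names early means every scraped page has to be mapped onto the same ten fields, which makes missing or malformed values easier to spot.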