Introduction

What is Web Scraping?

Web scraping is an automated method for extracting large amounts of data from websites. Most of the data on websites is unstructured.

Web scraping helps collect this unstructured data and store it in a structured form.

Related terms you may come across are web crawling, web data extraction and web harvesting.

After hearing all that, the next question on anyone’s mind would be:

Is Web Scraping legal?

In general, web scraping by itself is not illegal. After all, you can scrape a website for educational purposes.

As a general rule of thumb, data scraped from a site that does not allow web scraping should not be used for commercial purposes, as doing so may violate the site’s terms of use or the law.

To know whether a website allows web scraping or not, you can look at the website’s “robots.txt” file.

You can find this file by appending “/robots.txt” to the root URL of the website that you want to scrape.

e.g. “https://…YOUR URL HERE…./robots.txt”
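
The same check can be done from Python. The snippet below is a minimal sketch using the standard library's urllib.robotparser. The robots.txt content and the example.com URLs are made up for illustration, and is_allowed is a hypothetical helper name, not something from the site we scrape later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for page in ("https://example.com/anime/list",
                 "https://example.com/private/page"):
        print(page, is_allowed(SAMPLE_ROBOTS_TXT, page))
```

For a live website you could instead call parser.set_url(...) with the address of its robots.txt file and then parser.read() to download it before calling can_fetch.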

Now, with that out of the way, let’s get into it.

Overview

In this post, we’ll be going through:

1. How to inspect & analyze a web page for web scraping

2. How to get started with web scraping

3. How to collect unstructured data in a structured format

By the end of this post, we’ll have an anime database covering all the anime currently available on the website. For each anime, it includes:

1. Anime Title

2. Description

3. Current/Latest Season

4. Episodes Aired

5. Status

6. Initial Air Date

7. Genre

8. Sub/Dub

9. Series/Movie

10. URL
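
To make "structured format" concrete, the ten fields above could be modelled as one record per anime and written out as CSV rows. This is only a sketch under assumptions: the class name AnimeRecord, the field names, the field types and the sample values are all hypothetical, and the storage format actually used may differ.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class AnimeRecord:
    # Field names and types are assumptions that mirror the list above.
    title: str
    description: str
    season: str
    episodes_aired: int
    status: str
    initial_air_date: str
    genre: str
    sub_dub: str
    series_movie: str
    url: str


def records_to_csv(records) -> str:
    """Serialize records to CSV text with one header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[f.name for f in fields(AnimeRecord)]
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Placeholder values only; this is not real scraped data.
example = AnimeRecord(
    title="Example Title",
    description="Placeholder description",
    season="Season 1",
    episodes_aired=12,
    status="Ongoing",
    initial_air_date="2020-01-01",
    genre="Adventure",
    sub_dub="Sub",
    series_movie="Series",
    url="https://example.com/anime/example-title",
)

if __name__ == "__main__":
    print(records_to_csv([example]))
```

Keeping every scraped page in the same record shape is what turns the unstructured HTML into rows that can be filtered, sorted or loaded into a database later.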


Creating Anime Database With Web Scraping-Introduction