Web Scraping using Python and Selenium.

Web scraping has been used to extract data from websites almost since the World Wide Web was born. In the early days, scraping was mainly done on static pages, that is, pages with known elements, tags, and data.

More recently, however, advances in web development have made the task a bit more difficult. In this article, we’ll explore how to go about scraping data when newer technology and other factors get in the way of standard scraping.

Traditional Data Scraping

Because most websites produce pages meant for human readers rather than automated ones, web scraping mainly consisted of programmatically digesting a web page’s markup (think right-click, View Source), then detecting static patterns in that markup that would allow the program to “read” various pieces of information and save them to a file or a database.
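
The read-the-markup, find-the-pattern, save-the-result cycle can be sketched with nothing but the Python standard library. This is only an illustration under stated assumptions: the markup in PAGE_SOURCE, the class names "name" and "price", and the ItemParser class are all made up for the example, and a real page would need its own patterns.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page source; a real scraper would download this first.
PAGE_SOURCE = """
<ul>
  <li class="item"><span class="name">Widget</span> <span class="price">9.99</span></li>
  <li class="item"><span class="name">Gadget</span> <span class="price">24.50</span></li>
</ul>
"""


class ItemParser(HTMLParser):
    """Collect the text of <span class="name"> and <span class="price"> tags."""

    def __init__(self):
        super().__init__()
        self.records = []    # finished (name, price) pairs
        self._field = None   # field the parser is currently inside, if any
        self._current = {}   # fields collected for the current <li>

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current:
            # End of one item: store it and start a fresh record.
            self.records.append(
                (self._current.get("name"), self._current.get("price"))
            )
            self._current = {}


parser = ItemParser()
parser.feed(PAGE_SOURCE)

# Save the extracted records as CSV (to an in-memory buffer here;
# an open file object would work the same way).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "price"])
writer.writerows(parser.records)
csv_text = buffer.getvalue()
print(csv_text)
```

The approach only works as long as the page keeps the same static structure, which is exactly the limitation the rest of the article runs into.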

When the target was report data, it could often be retrieved by passing form variables or URL parameters to the report page. For example:

https://www.myreportdata.com?month=12&year=2004&clientid=24823
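
A URL like the one above can be assembled programmatically, which is what made this style of scraping easy to automate. The following is a minimal sketch using the standard library; the function name build_report_url is hypothetical, and the parameter names are simply taken from the example URL.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.myreportdata.com"


def build_report_url(month, year, client_id):
    """Return the report URL for one month, year, and client."""
    query = urlencode({"month": month, "year": year, "clientid": client_id})
    return f"{BASE_URL}?{query}"


# The URL from the example above.
url = build_report_url(12, 2004, 24823)
print(url)

# One URL per month of 2004 for the same client.
yearly_urls = [build_report_url(m, 2004, 24823) for m in range(1, 13)]
```

Each generated URL could then be fetched and parsed in a loop, with no need to touch the form itself.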

Python has become one of the most popular web scraping languages due in part to the various web libraries that have been created for it. One popular library, Beautiful Soup, is designed to pull data out of HTML and XML files by letting you search, navigate, and modify the parse tree.
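
As a rough illustration of searching and navigating the parse tree, here is a small Beautiful Soup sketch. It assumes the bs4 package is installed; the sample HTML, the table id "report", and the extract_rows function are invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical report page, inlined so the example is self-contained.
SAMPLE_HTML = """
<html><body>
  <table id="report">
    <tr><th>Client</th><th>Total</th></tr>
    <tr><td>Acme</td><td>120</td></tr>
    <tr><td>Globex</td><td>75</td></tr>
  </table>
</body></html>
"""


def extract_rows(html):
    """Return the data rows of the table with id="report" as lists of strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="report")
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row has only <th> cells, so it is skipped
            rows.append(cells)
    return rows


rows = extract_rows(SAMPLE_HTML)
print(rows)
```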

Browser-based Scraping

Recently, I had a scraping project that seemed pretty straightforward and I was fully prepared to use traditional scraping to handle it. But as I got further into it, I found obstacles that could not be overcome with traditional methods.

Three main issues prevented me from using my standard scraping methods:

  1. Certificate. A certificate had to be installed to access the portion of the website that held the data. When accessing the initial page, a prompt appeared asking me to select the proper certificate from those installed on my computer and click OK.
  2. Iframes. The site used iframes, which messed up my normal scraping. Yes, I could try to find all iframe URLs, then build a sitemap, but that seemed like it could get unwieldy.
  3. JavaScript. The data was accessed after filling in a form with parameters (e.g., customer ID, date range). Normally, I would bypass the form and simply pass the form variables (via URL or as hidden form variables) to the result page and see the results. But in this case, the form contained JavaScript, which didn’t allow me to access the form variables in a normal fashion.

Web Scraping with Selenium

This is the third part of a four-part tutorial series on web scraping using Scrapy and Selenium. The first two parts dealt with web scraping using Scrapy.

Web Scraping Made Simple using Selenium Web Driver and Python

‘Buy the rumor, sell the news’ is a popular saying in the stock market, where stocks rise in anticipation of some major news and then…

A Beginner's Guide to Web Scraping in Python

In this article, you're going to learn the basics of web scraping in Python, and we'll do a demo project to scrape quotes from a website.

AutoScraper Introduction: Fast and Light Automatic Web Scraper for Python

This project is made for automatic web scraping, to make scraping easy. It takes a URL or the HTML content of a web page, along with a list of sample data that we want to scrape from that page. This data can be text, a URL, or any HTML tag value on that page.

Using Python and Selenium to Scrape Infinite Scroll Web Pages

Web scraping can be an important tool for data collection. In this story, I will show the Python code I developed to auto-scroll web pages, and demonstrate how to use it to scrape URLs from Reddit as an example.