Selenium for web scraping

Selenium is a browser automation library. Most often used for testing web applications, Selenium can be used for any task that requires automating interaction with the browser, and that includes web scraping.

The following tutorial is a practical guide to best practices for web scraping with Selenium. I have listed my five top tips to help you scrape the data you need as efficiently as possible, using as little code as possible.

Aim:

To extract the global Coronavirus headlines from BBC News.

Prerequisites

To follow this tutorial, you will need to:

  1. Download a Selenium webdriver, such as chromedriver for Chrome.
  2. Install the Selenium Python package with pip install selenium (a quick way to confirm the install worked is shown below).
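
If you want to verify the installation before moving on, a minimal check is to import the package and print its version string. This assumes a Selenium release that exposes selenium.__version__:

```python
# Quick sanity check that the selenium package is importable.
# Assumes a release that exposes selenium.__version__.
import selenium

print(selenium.__version__)
```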

5 Selenium Best Practice Tips

Tip 1: Place the webdriver executable in PATH

To begin our web scraping task, we must first navigate to the following page: https://www.bbc.co.uk/news. This step can be achieved in as little as three lines of code: first we import webdriver from selenium, then create an instance of the Chrome webdriver, and finally call the get method on the webdriver object named driver, as shown in the sketch below.
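
A minimal sketch of those three lines, assuming the chromedriver executable can already be found on PATH (this is exactly what Tip 1 sets up):

```python
from selenium import webdriver

# Create a Chrome session; because chromedriver is on PATH,
# no path argument needs to be passed here.
driver = webdriver.Chrome()

# Navigate to the BBC News homepage.
driver.get("https://www.bbc.co.uk/news")
```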

To keep the code this short and readable, the chromedriver executable can be placed in a folder of your choice. That folder can then be added to PATH under your environment variables. The webdriver is then ready to go, simply using webdriver.Chrome() with no arguments passed to Chrome in the parentheses.
