What comes to mind when you hit a 404, a ‘Page Not Found’ message, or a dead hyperlink on a website? Most likely, annoyance. Visitors to your own site react the same way, which is why you should continuously work to eliminate broken links from your web product (or website). Instead of inspecting links manually, you can automate broken link testing using Selenium WebDriver.

When a visitor follows a broken link, the target page does not work as intended, which results in a poor user experience. Dead links can also hurt your product’s credibility, as they might give visitors the impression that little attention is paid to the experience.

If your web product has many pages (or links) that result in a 404 error (or page not found), the product rankings on search engines (e.g., Google) will also be badly affected. Removal of dead links is one of the integral parts of SEO (Search Engine Optimization) activity.

In this part of the Selenium WebDriver tutorial series, we take a deep dive into finding broken links using Selenium WebDriver. We demonstrate broken link testing using Selenium Python, Selenium Java, Selenium C#, and Selenium PHP.

Introduction to Broken Links in Web Testing

In simple terms, broken links (or dead links) in a website (or web app) are links that are not reachable and do not work as anticipated. The links could be temporarily down due to server issues or wrongly configured at the back end.

Apart from pages that result in a 404 error, other prominent examples of broken links are malformed URLs and links to content (e.g., documents, PDFs, images) that has been moved or deleted.
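Malformed URLs can be caught before any request is sent. The snippet below is a minimal sketch of such a check using Python's standard `urllib.parse` module. The helper name `is_well_formed` and the rule it applies (an `http` or `https` scheme plus a non-empty host) are assumptions made for illustration, not a complete URL validator.

```python
from urllib.parse import urlparse


def is_well_formed(url: str) -> bool:
    """Return True if the URL has an http(s) scheme and a host part.

    Hypothetical helper: it only checks the URL's shape, not whether
    the target actually exists.
    """
    try:
        parts = urlparse(url.strip())
    except ValueError:
        # urlparse rejects some inputs outright, e.g. an unclosed IPv6 host
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

A link such as `htp://example.com` (misspelled scheme) or `example.com/page` (no scheme) fails this check, so it can be reported as broken without contacting any server.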

Prominent Reasons for Broken Links

Here are some of the common reasons behind the occurrence of broken links (also called dead links or link rot):

  • Incorrect or misspelled URL entered by the user.
  • Structural changes in the website (e.g., permalinks) where URL redirects or internal redirects are not properly configured.
  • Links to content like videos, documents, etc. that have been either moved or deleted. If the content is moved, the ‘internal links’ should be redirected to the designated links.
  • Temporary downtime due to site maintenance, which makes the website inaccessible for a while.
  • Broken HTML tags, JavaScript errors, incorrect HTML/CSS customizations, broken embedded elements, etc., within the page, any of which can lead to broken links.
  • Geolocation restrictions that prevent access to the website from certain IP addresses (if they are blacklisted) or from specific countries. Geolocation testing with Selenium helps ensure that the experience is tailor-made for the location (or country) from which the site is accessed.

Why should you check Broken Links?

Broken links are a big turn-off for the visitors who land on your website. Here are some of the major reasons why you should check for broken links on your website:

  • Broken Links can hurt the user experience.
  • Removal of broken (or dead) links is essential for SEO (Search Engine Optimization), as it can affect the site’s rankings on search engines (e.g., Google).

Broken link testing with Selenium WebDriver identifies the dead links on a web page so that they can be fixed or removed from the site.
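As a rough sketch of the checking step, the snippet below takes a list of URLs and reports the ones that look broken. It assumes the URLs have already been collected from the page; with Selenium's Python bindings this is typically done with `driver.find_elements(By.TAG_NAME, "a")` followed by `get_attribute("href")` on each element. The function names `get_status` and `find_broken_links`, the use of HEAD requests through the standard `urllib` module, and the rule that a status of 400 or above (or no response at all) means "broken" are assumptions for illustration.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, Optional


def get_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        # HEAD asks for headers only, so the response body is not downloaded
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises 4xx/5xx responses as HTTPError; the code is kept
        return err.code
    except (OSError, ValueError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, or an unusable URL
        return None


def find_broken_links(
    urls: Iterable[str],
    fetch: Callable[[str], Optional[int]] = get_status,
) -> Dict[str, Optional[int]]:
    """Map each broken URL to its status code (None means unreachable)."""
    broken = {}
    for url in urls:
        status = fetch(url)
        if status is None or status >= 400:
            broken[url] = status
    return broken
```

The `fetch` parameter lets the HTTP call be swapped out, for example with a stub in unit tests. Some servers reject HEAD requests, so a production check may need to fall back to GET for those links.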

Broken Links and HTTP Status Codes

When a user visits a website, a request is sent by the browser to the site’s server. The server responds to the browser’s request with a three-digit code called the ‘HTTP Status Code.’

These HTTP Status Codes act as a conversation between the browser (from which the URL request is sent) and the server: each code tells the browser how the server handled the request.

Though different HTTP Status Codes are used for different purposes, most of the codes are useful for diagnosing issues in the site, minimizing site downtime, reducing the number of dead links, and more. The first digit of every three-digit status code is a number from 1 to 5, so the codes are grouped into ranges written as 1xx, 2xx…, 5xx. As each of these ranges represents a different class of server response, we limit the discussion to the HTTP Status Codes returned for broken links.
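The grouping by first digit can be sketched in a few lines of Python. The class names below follow the standard HTTP categories. The helper names (`status_class`, `is_broken`, `describe`) are hypothetical, and treating every 4xx and 5xx response as a broken link is an assumption made for illustration.

```python
from http import HTTPStatus

# First digit of the status code -> class of server response
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Return the response class for a three-digit HTTP status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


def is_broken(code: int) -> bool:
    """Assumption: client errors (4xx) and server errors (5xx) count as broken."""
    return status_class(code) in ("client error", "server error")


def describe(code: int) -> str:
    """Return a readable summary such as '404 Not Found (client error)'."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is in range but is not a registered status
        phrase = "Unknown"
    return f"{code} {phrase} ({status_class(code)})"
```

For a link checker, `is_broken` is the decision that matters, while `describe` produces a line suitable for a test report.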

