Introduction To Crawling JavaScript

Historically, search engine bots such as Googlebot didn’t crawl and index content created dynamically using JavaScript, and were only able to see what was in the static HTML source code.

However, there’s been a huge growth in JavaScript use, in frameworks such as AngularJS, React and Vue.js, and in single page applications (SPAs) and progressive web apps (PWAs).

This has meant Google in particular have evolved significantly, deprecating their old AJAX crawling scheme guidelines of escaped-fragment #! URLs and HTML snapshots in October 2015, and are now generally able to render and understand web pages like a modern-day browser.

While Google are generally able to crawl and index most JavaScript content, they recommend server-side rendering, pre-rendering or dynamic rendering rather than relying on client-side JavaScript, as it’s ‘difficult to process JavaScript and not all search engine crawlers are able to process it successfully or immediately’.

When crawling and evaluating websites today, it’s essential to be able to read the DOM after JavaScript has run and constructed the web page, and to understand how that rendered HTML differs from the original response HTML.
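To make the difference between the original response HTML and the rendered HTML concrete, here is a minimal Python sketch that reports the links and words which only exist after JavaScript has run. It is an illustration rather than how the SEO Spider works internally: the two HTML strings are hypothetical samples, the function and class names are our own, and the sketch assumes you have already captured the rendered HTML with a headless browser of your choice.

```python
from html.parser import HTMLParser


class LinkAndTextExtractor(HTMLParser):
    """Collect anchor hrefs and visible words from an HTML string."""

    def __init__(self):
        super().__init__()
        self.links = set()
        self.words = set()
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore script and style bodies, keep everything else as words.
        if not self._skip_depth:
            self.words.update(data.split())


def extract(html):
    """Return (links, words) found in the given HTML string."""
    parser = LinkAndTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.links, parser.words


def js_only_content(raw_html, rendered_html):
    """Return links and words present in the rendered HTML but not the raw HTML."""
    raw_links, raw_words = extract(raw_html)
    rendered_links, rendered_words = extract(rendered_html)
    return {
        "links": sorted(rendered_links - raw_links),
        "words": sorted(rendered_words - raw_words),
    }


# Hypothetical sample: an empty app shell, and the same page after rendering.
RAW_HTML = '<html><body><div id="app"></div><script>render()</script></body></html>'
RENDERED_HTML = (
    '<html><body><div id="app"><h1>Products</h1>'
    '<a href="/shoes">Shoes</a></div><script>render()</script></body></html>'
)

diff = js_only_content(RAW_HTML, RENDERED_HTML)
print(diff)
```

In this sample the raw HTML contains no links or copy at all, so everything reported comes from JavaScript. A crawler that only reads the response HTML would miss the `/shoes` link entirely.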

Traditional website crawlers were not able to crawl JavaScript websites until we launched the first ever JavaScript rendering functionality in our Screaming Frog SEO Spider software. This meant pages were fully rendered in a browser first, and the rendered HTML post-JavaScript was then crawled.

We integrated the Chromium project library for our rendering engine to emulate Google as closely as possible.

In 2019, Google updated their web rendering service (WRS), which was previously based on Chrome 41, to be ‘evergreen’ and use the latest stable version of Chrome, supporting over 1,000 more features.

The SEO Spider uses a slightly earlier version of Chrome, version 69 at the time of writing, but we recommend viewing the exact version within the app by clicking ‘Help > Debug’ and scrolling down to the ‘Chrome Version’ line as we update this frequently.

Hence, while rendering will obviously be similar, it won’t be exactly the same, as there might be some small differences in supported features (there are arguments that the exact version of Chrome itself won’t be exactly the same, either). However, generally, the WRS supports the same web platform features and capabilities as the Chrome version it uses, and you can compare the differences between Chrome versions at CanIUse.com.

This guide contains the following 3 sections. Click and jump to a relevant section, or continue reading.

  1. Why You Shouldn’t Crawl Blindly With JavaScript Enabled
  2. How To Identify JavaScript
  3. How To Crawl JavaScript Websites

If you already understand the basic principles of JavaScript and just want to crawl a JavaScript website, skip straight to our guide on configuring the Screaming Frog SEO Spider tool to crawl JavaScript sites. Or, read on.

How To Crawl JavaScript Video

If you prefer video, then check out our tutorial on crawling JavaScript.
