When pitching for a new website project, or when a company wants to relaunch its website, one of the most important parts of the new concept is the information architecture. A sitemap of the existing site is therefore very helpful for getting a complete overview. But instead of extracting all links and hierarchies manually, we can use Node.js to crawl the page for us.

In this first part of a two-part series we're going to build a Node.js crawler that saves all internal links found on a website into a JSON file.
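For illustration, the output could be as simple as a flat array of the internal URLs that were discovered (the exact structure used later in this series may differ, e.g. a nested tree for D3.js):

```json
[
  "https://example.com/",
  "https://example.com/about",
  "https://example.com/blog/first-post"
]
```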

We want to start the crawler with a command in the terminal, passing the domain as an argument. This way the crawler can be used for any website. Next, it should only crawl pages it has not seen before; otherwise it would run forever.
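Here is a minimal sketch of that idea, assuming Node 18+ (for the global `fetch` API); the file name, helper names, and the regex-based link extraction are illustrative and not taken from the original post, which may use different libraries:

```js
// sitemap-crawler.js – minimal sketch of the crawler described above.
const fs = require('fs');

const domain = process.argv[2]; // e.g. `node sitemap-crawler.js https://example.com`
if (!domain) {
  console.error('Usage: node sitemap-crawler.js <domain>');
  process.exit(1);
}

const visited = new Set(); // pages we have already crawled
const queue = [domain];    // pages still to crawl

// Naive link extraction via regex; a real crawler might use an HTML parser instead.
function extractLinks(html, baseUrl) {
  const links = [];
  const hrefPattern = /href="([^"#]+)"/g;
  let match;
  while ((match = hrefPattern.exec(html)) !== null) {
    try {
      const url = new URL(match[1], baseUrl);
      // Keep only internal links (same host as the start domain).
      if (url.host === new URL(domain).host) links.push(url.origin + url.pathname);
    } catch (_) { /* ignore malformed URLs */ }
  }
  return links;
}

async function crawl() {
  while (queue.length > 0) {
    const url = queue.shift();
    if (visited.has(url)) continue; // skip pages we have seen before
    visited.add(url);
    try {
      const html = await (await fetch(url)).text();
      for (const link of extractLinks(html, url)) {
        if (!visited.has(link)) queue.push(link);
      }
    } catch (err) {
      console.error(`Failed to fetch ${url}:`, err.message);
    }
  }
  // Save every internal link we found into a JSON file.
  fs.writeFileSync('sitemap.json', JSON.stringify([...visited], null, 2));
  console.log(`Saved ${visited.size} internal links to sitemap.json`);
}

crawl();
```

The `visited` set is what keeps the crawler from looping forever: a URL is only fetched once, and only links that have not been seen yet are added to the queue.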

Read part 2

#node.js #d3.js #javascript

How to Build a Sitemap with a Node.js Crawler and D3.js - Part 1