A Guide to Web Scraping With JavaScript and Node.js

With the massive increase in the volume of data on the Internet, this technique is becoming increasingly useful for retrieving information from websites and applying it to various use cases. Typically, web data extraction involves making a request to the given web page, accessing its HTML code, and parsing that code to harvest some information. Since JavaScript is excellent at manipulating the DOM (Document Object Model) inside a web browser, creating data extraction scripts in Node.js can be extremely versatile. Hence, this tutorial focuses on JavaScript web scraping.

In this article, we’re going to illustrate how to perform web scraping with JavaScript and Node.js.

We’ll start by demonstrating how to use the Axios and Cheerio packages to extract data from a simple website.

Then, we’ll show how to use a headless browser, Puppeteer, to retrieve data from a dynamic website that loads content via JavaScript.

What you’ll need

  • Web browser
  • A web page to extract data from
  • Code editor such as Visual Studio Code
  • Node.js
  • Axios
  • Cheerio
  • Puppeteer

Ready?

Let’s get our hands dirty…

Getting Started

Installing Node.js

Node.js is a popular JavaScript runtime environment that comes with lots of features for automating the laborious task of gathering data from websites.

To install it on your system, follow the download instructions available on the official Node.js website. npm (the Node Package Manager) will also be installed automatically alongside Node.js.

npm is the default package management tool for Node.js. Since we’ll be using packages to simplify web scraping, npm will make the process of consuming them fast and painless.

After installing Node.js, go to your project’s root directory and run the following command to create a package.json file, which will contain all the details relevant to the project:

npm init
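
Accepting the defaults (or running npm init -y to skip the prompts) produces a package.json similar to the following; the name and description below are placeholders that will reflect your own answers:

```json
{
  "name": "web-scraper",
  "version": "1.0.0",
  "description": "A simple web scraping project",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "license": "ISC"
}
```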

Installing Axios

Axios is a robust promise-based HTTP client that can be deployed both in Node.js and the web browser. With this npm package, you can make HTTP requests from Node.js using promises, and download data from the Internet quickly and easily.

Furthermore, Axios automatically transforms data into JSON format, intercepts requests and responses, and can handle multiple concurrent requests.

To install it, navigate to your project’s directory folder in the terminal, and run the following command:

npm install axios

By default, npm will install Axios in a folder named node_modules, which will be automatically created in your project’s directory.

Installing Cheerio

Cheerio is an efficient and lean module that provides jQuery-like syntax for manipulating the content of web pages. It greatly simplifies the process of selecting, editing, and viewing DOM elements on a web page.

While Cheerio allows you to parse and manipulate the DOM easily, it does not work the same way as a web browser. This implies that it doesn’t make requests, execute JavaScript, load external resources, or apply CSS styling.

To install it, navigate to your project’s directory folder in the terminal, and run the following command:

npm install cheerio 

By default, just like Axios, npm will install Cheerio in a folder named node_modules, which will be automatically created in your project’s directory.

Installing Puppeteer

Puppeteer is a Node.js library that allows you to control a headless Chrome browser programmatically and extract data quickly and smoothly.

Since some websites rely on JavaScript to load their content, using an HTTP-based tool like Axios may not yield the intended results. With Puppeteer, you can simulate the browser environment, execute JavaScript just like a browser does, and scrape dynamic content from websites.

To install it, just like the other packages, navigate to your project’s directory folder in the terminal, and run the following command:

npm install puppeteer

Scraping a simple website

Now let’s see how we can use Axios and Cheerio to extract data from a simple website.

For this tutorial, our target will be the web page at https://www.forextradingbig.com/instaforex-broker-review/. We’ll be seeking to extract the number of comments listed in the top section of the page.

To find the specific HTML elements that hold the data we are looking for, let’s use the inspector tool on our web browser:

As you can see in the image above, the number-of-comments data is enclosed in an <a> tag, which is a child of the <span> tag with a class of comment-bubble. We’ll use this information when using Cheerio to select these elements on the page.

Here are the steps for creating the scraping logic:

1. Let’s start by creating a file called index.js that will contain the programming logic for retrieving data from the web page.

2. Then, let’s use the require function, which is built-in within Node.js, to include the modules we’ll use in the project.

const axios = require('axios');
const cheerio = require('cheerio');

3. Let’s use Axios to make a GET HTTP request to the target web page.

Here is the code:

axios.get('https://www.forextradingbig.com/instaforex-broker-review/')
  .then(response => {
    const html = response.data;
  })
Notice that when a request is sent to the web page, it returns a response. This Axios response object is made up of various components, including data that refers to the payload returned from the server.

So, when the GET request succeeds, we read the data property of the response, which contains the page’s HTML.

4. Next, let’s load the response data into a Cheerio instance. This way, we can create a Cheerio object to help us in parsing through the HTML from the target web page and finding the DOM elements for the data we want—just like when using jQuery.

To uphold the famous jQuery convention, we’ll name the Cheerio object $:

const $ = cheerio.load(html);

5. Let’s use Cheerio’s selector syntax to find the elements containing the data we want:

const scrapedata = $('a', '.comment-bubble').text()
console.log(scrapedata);

Notice that we also used the text() method to output the data in a text format.

6. Finally, let’s log any errors experienced during the scraping process.

.catch( error => {
    console.log(error);
}); 

Here is the entire code for the scraping logic:

const axios = require("axios");
const cheerio = require("cheerio");
//performing a GET request
axios
  .get("https://www.forextradingbig.com/instaforex-broker-review/")
  .then((response) => {
    //handling the success
    const html = response.data;

    //loading response data into a Cheerio instance
    const $ = cheerio.load(html);

    //selecting the elements with the data
    const scrapedata = $("a", ".comment-bubble").text();

    //outputting the scraped data
    console.log(scrapedata);
  })
  //handling error
  .catch((error) => {
    console.log(error);
  });

If we run the above code with the node index.js command, it returns the information we wanted to scrape from the target web page.

Here is a screenshot of the results:

It worked!
