Joseph  Norton

Joseph Norton


How to Perform Web-Scraping using Node.js

In this post, we’ll learn how to use Node.js and friends to perform a quick and effective web-scraping for single-page applications. This can help us gather and use valuable data which isn’t always available via APIs.

Let’s dive in.

What is web scraping?

Web scraping is a technique used to extract data from websites using a script. Web scraping is the way to automate the laborious work of copying data from various websites.

Web Scraping is generally performed in the cases when the desirable websites don’t expose the API for fetching the data. Some common web scraping scenarios are:

  1. Scraping emails from various websites for sales leads.
  2. Scraping news headlines from news websites.
  3. Scraping product’s data from E-Commerce websites.

Why do we need web scraping when e-commerce websites expose the API (Product Advertising APIs) for fetching/collecting product’s data?

E-Commerce websites only expose some of their product’s data to be fetched through APIs therefore, web scraping is the more effective way to collect the maximum product’s data.

Product comparison sites generally do web scraping. Even Google Search Engine does crawling and scraping to index the search results.

What will we need?

Getting started with web scraping is easy and it is divided into two simple parts-

  1. Scraping emails from various websites for sales leads.
  2. Scraping news headlines from news websites.
  3. Scraping product’s data from E-Commerce websites.

We will also use two open-source npm modules:

  • axios— Promise based HTTP client for the browser and node.js.
  • cheerio — jQuery for Node.js. Cheerio makes it easy to select, edit, and view DOM elements.


Our setup is pretty simple. We create a new folder and run this command inside that folder to create a package.json file. Let’s cook the recipe to make our food delicious.

npm init -y

Before we start cooking, let’s collect the ingredients for our recipe. Add Axios and Cheerio from npm as our dependencies.

npm install axios cheerio

Now, require them in our index.js file

const axios = require('axios');
const cheerio = require('cheerio');

Make the Request

We are done with collecting the ingredients for our food, let’s start with the cooking. We are scraping data from the HackerNews website for which we need to make an HTTP request to get the website’s content. That’s where axios come into action.

Our response will look like this —

<html op="news">
<meta name="referrer" content="origin">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" type="text/css" href="news.css?oq5SsJ3ZDmp6sivPZMMb">
<link rel="shortcut icon" href="favicon.ico">
<link rel="alternate" type="application/rss+xml" title="RSS" href="rss">
<title>Hacker News</title>
<table id="hnmain" border="0" cellpadding="0" cellspacing="0" width="85%" bgcolor="#f6f6ef">
<script type='text/javascript' src='hn.js?oq5SsJ3ZDmp6sivPZMMb'></script>

We are getting similar HTML content which we get while making a request from Chrome or any browser. Now we need some help of Chrome Developer Tools to search through the HTML of a web page and select the required data. You can learn more about the Chrome DevTools from here.

We want to scrape the News heading and its associated links. You can view the HTML of the webpage by right-clicking anywhere on the webpage and selecting “Inspect”.

Parsing HTML with Cheerio.js

Cheerio is the jQuery for Node.js, we use selectors to select tags of an HTML document. The selector syntax was borrowed from jQuery. Using Chrome DevTools, we need to find selector for news headlines and its link. Let’s add some spices to our food.

First, we need to load in the HTML. This step in jQuery is implicit since jQuery operates on the one, baked-in DOM. With Cheerio, we need to pass in the HTML document. After loading the HTML, we iterate all the occurrences of the table row to scrape each and every news on the page.

The Output will look like —

    title: 'Malaysia seeks $7.5B in reparations from Goldman Sachs (',
    link: ''
    title: 'The World Through the Eyes of the US (',
    link: ''

Now we have an array of JavaScript Object containing the title and links of the news from the HackerNews website. In this way, we can scrape the data from various large number of websites. So, our food is prepared and looks delicious too.

Now, websites are more dynamic in nature i.e content is rendered through javascript. For an instance, let’s make a request to any SPA website like I have taken this vue admin template website and disable javascript using Chrome DevTools, this is the response I got —

<html lang="tr">

  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width,initial-scale=1">
  <link href="" rel="stylesheet">
  <link href="" rel="stylesheet">
  <link rel="shortcut icon" type="image/png" href="static/logo.png">
  <title>Vue Admin Template</title>
  <link href="static/css/app.48932a774ea0a6c077e59d0ff4b2bfab.css" rel="stylesheet">

<body data-gr-c-s-loaded="true" cz-shortcut-listen="true">
  <div id="app"></div>
  <script type="text/javascript" src="static/js/manifest.2ae2e69a05c33dfc65f8.js">
  <script type="text/javascript" src="static/js/vendor.ef9a9a02a332d6b5e705.js">
  <script type="text/javascript" src="static/js/app.3ec8db3ca107f46c3715.js">



So, the actual content that we need to scrape will be rendered within the div#app element through javascript so methods that we used to scrape data in the previous post fails as we need something that can run the javascript files similar to our browsers.

We will use web automation tools and libraries like Selenium, Cypress, Nightmare, Puppeteer, x-ray, Headless browsers like phantomjs etc.

Why these tools and libraries? These tools and libraries are generally used by Software testers in tech industries for software testing. All of these opens a browser instance in which our website can run just like other browsers and hence executes javascript files which render the dynamic content of the website.

What will we need?

As I explained in my previous post that web scraping is divided into two simple parts —

  1. Scraping emails from various websites for sales leads.
  2. Scraping news headlines from news websites.
  3. Scraping product’s data from E-Commerce websites.

We will be using Node.js and browser automation library:

  • axios— Promise based HTTP client for the browser and node.js.
  • cheerio — jQuery for Node.js. Cheerio makes it easy to select, edit, and view DOM elements.


Our setup is pretty simple. We create a new folder and run this command inside that folder to create a package.json file. And add Nightmare and Cheerio from npm as our dependencies.

npm init -y
npm install nightmare cheerio --unsafe-perm=true

Now, require them in our index.js file

const Nightmare = require('nightmare');
const cheerio = require('cheerio');

Nightmare is similar to axios or any other request making library but what makes it odd is that it uses Electron under the cover, which is similar to PhantomJS but roughly twice as fast and more modern. Nightmare uses Javascript for handing/manipulating DOM through evaluate function which is complex to implement. So, we will use Cheerio for handling DOM content by fetching innerHTML through evaluate function and pass the content (innerHTML) to Cheerio which is easy, fast, and flexible to implement. Moreover, Nightmare also supports proxies, promises and async-await.

Scrape the static content

We have already scrapped the static content in the previous post, but before proceeding before, let’s do the same with Nightmare so we can have a wider vision and clear understanding of nightmare.

We are scraping data from the HackerNews website for which we need to make an HTTP request to get the website’s content and parse the data using cheerio.

At the first line, we initialize the nightmare and set the show property true so we can monitor what the browser is doing on execution. We made a request to url through goto function and wait for the DOM to be rendered through wait function else it executes the next followup steps without rendering of the content completely. Once the body is loaded completely, we fetch the innerHTML using evaluate function and return the data. At last, we close the browser by calling end function.

You can use try/catch blocks but if you are using promises, .then()must be called after .end() to run the .end() task.

Scrape the dynamic content

Now, we are familiar with nightmare and how it works. So, let’s try to scrape content from any website which uses javascript to render data. Apart from it, nightmare also interact with websites like we can make clicks, fill the forms, so let’s do this. We will scrape the products listed on the website but only those which appears in the search for nodejs books

For interacting with the webpage, we need to use click and type function. Let’s write some code.

We are making a request to the Flipkart website and enter nodejs books in the search bar by selecting the appropriate HTML selector using type function. Selectors can be found by inspecting the HTML using Chrome DevTools. After that, we will click on the search button using click function. On clicking, It loads the requested content similar to any other browsers and we will fetch the innerHTML for scraping content as described in the second part using cheerio.

The Output will look like —

    title: 'Node.js Design Patterns',
    link: '/node-js-design-patterns/p/itmfbg2fhurfy97n?pid=9781785885587&lid=LSTBOK9781785885587OWNSQH&marketplace=FLIPKART&srno=s_1_1&otracker=search&fm=SEARCH&iid=1b09be55-0fad-44c8-a8ad-e38656f3f1b0.9781785885587.SEARCH&ppt=Homepage&ppn=Homepage&ssid=dsiscm68uo0000001547989841622&qH=ecbb3518fc5ee1c0'
    title: 'Advanced Node.js Development',
    link: '/advanced-node-js-development/p/itmf4eg8asapzfeq?pid=9781788393935&lid=LSTBOK9781788393935UKCC4O&marketplace=FLIPKART&srno=s_1_2&otracker=search&fm=SEARCH&iid=b7ec6a1b-4e79-4117-9618-085b894a11dd.9781788393935.SEARCH&ppt=Homepage&ppn=Homepage&ssid=dsiscm68uo0000001547989841622&qH=ecbb3518fc5ee1c0'
    title: 'Deploying Node.js',
    link: '/deploying-node-js/p/itmehyfmeqdbxwg5?pid=9781783981403&lid=LSTBOK9781783981403BYQFE4&marketplace=FLIPKART&srno=s_1_3&otracker=search&fm=SEARCH&iid=7824aa08-f590-4b8d-96ec-4df08cc1a4bb.9781783981403.SEARCH&ppt=Homepage&ppn=Homepage&ssid=dsiscm68uo0000001547989841622&qH=ecbb3518fc5ee1c0'


Now we have an array of JavaScript Object containing the title and links of the products(books) from the Flipkart website. In this way, we can scrape the data from any dynamic websites.


In this article, we first understood what is dynamic websites and how we can scrape data using nightmare and cheerio regardless of the type of website. The code is available in this Github Repo. Feel free to contribute and open issues.

Originally published by Ankit Jain at

#node-js #web-development

What is GEEK

Buddha Community

How to Perform Web-Scraping using Node.js

NBB: Ad-hoc CLJS Scripting on Node.js


Not babashka. Node.js babashka!?

Ad-hoc CLJS scripting on Node.js.


Experimental. Please report issues here.

Goals and features

Nbb's main goal is to make it easy to get started with ad hoc CLJS scripting on Node.js.

Additional goals and features are:

  • Fast startup without relying on a custom version of Node.js.
  • Small artifact (current size is around 1.2MB).
  • First class macros.
  • Support building small TUI apps using Reagent.
  • Complement babashka with libraries from the Node.js ecosystem.


Nbb requires Node.js v12 or newer.

How does this tool work?

CLJS code is evaluated through SCI, the same interpreter that powers babashka. Because SCI works with advanced compilation, the bundle size, especially when combined with other dependencies, is smaller than what you get with self-hosted CLJS. That makes startup faster. The trade-off is that execution is less performant and that only a subset of CLJS is available (e.g. no deftype, yet).


Install nbb from NPM:

$ npm install nbb -g

Omit -g for a local install.

Try out an expression:

$ nbb -e '(+ 1 2 3)'

And then install some other NPM libraries to use in the script. E.g.:

$ npm install csv-parse shelljs zx

Create a script which uses the NPM libraries:

(ns script
  (:require ["csv-parse/lib/sync$default" :as csv-parse]
            ["fs" :as fs]
            ["path" :as path]
            ["shelljs$default" :as sh]
            ["term-size$default" :as term-size]
            ["zx$default" :as zx]
            ["zx$fs" :as zxfs]
            [nbb.core :refer [*file*]]))

(prn (path/resolve "."))

(prn (term-size))

(println (count (str (fs/readFileSync *file*))))

(prn (sh/ls "."))

(prn (csv-parse "foo,bar"))

(prn (zxfs/existsSync *file*))

(zx/$ #js ["ls"])

Call the script:

$ nbb script.cljs
#js {:columns 216, :rows 47}
#js ["node_modules" "package-lock.json" "package.json" "script.cljs"]
#js [#js ["foo" "bar"]]
$ ls


Nbb has first class support for macros: you can define them right inside your .cljs file, like you are used to from JVM Clojure. Consider the plet macro to make working with promises more palatable:

(defmacro plet
  [bindings & body]
  (let [binding-pairs (reverse (partition 2 bindings))
        body (cons 'do body)]
    (reduce (fn [body [sym expr]]
              (let [expr (list '.resolve 'js/Promise expr)]
                (list '.then expr (list 'clojure.core/fn (vector sym)

Using this macro we can look async code more like sync code. Consider this puppeteer example:

(-> (.launch puppeteer)
      (.then (fn [browser]
               (-> (.newPage browser)
                   (.then (fn [page]
                            (-> (.goto page "")
                                (.then #(.screenshot page #js{:path "screenshot.png"}))
                                (.catch #(js/console.log %))
                                (.then #(.close browser)))))))))

Using plet this becomes:

(plet [browser (.launch puppeteer)
       page (.newPage browser)
       _ (.goto page "")
       _ (-> (.screenshot page #js{:path "screenshot.png"})
             (.catch #(js/console.log %)))]
      (.close browser))

See the puppeteer example for the full code.

Since v0.0.36, nbb includes promesa which is a library to deal with promises. The above plet macro is similar to promesa.core/let.

Startup time

$ time nbb -e '(+ 1 2 3)'
nbb -e '(+ 1 2 3)'   0.17s  user 0.02s system 109% cpu 0.168 total

The baseline startup time for a script is about 170ms seconds on my laptop. When invoked via npx this adds another 300ms or so, so for faster startup, either use a globally installed nbb or use $(npm bin)/nbb script.cljs to bypass npx.


NPM dependencies

Nbb does not depend on any NPM dependencies. All NPM libraries loaded by a script are resolved relative to that script. When using the Reagent module, React is resolved in the same way as any other NPM library.


To load .cljs files from local paths or dependencies, you can use the --classpath argument. The current dir is added to the classpath automatically. So if there is a file foo/bar.cljs relative to your current dir, then you can load it via (:require [ :as fb]). Note that nbb uses the same naming conventions for namespaces and directories as other Clojure tools: foo-bar in the namespace name becomes foo_bar in the directory name.

To load dependencies from the Clojure ecosystem, you can use the Clojure CLI or babashka to download them and produce a classpath:

$ classpath="$(clojure -A:nbb -Spath -Sdeps '{:aliases {:nbb {:replace-deps {com.github.seancorfield/honeysql {:git/tag "v2.0.0-rc5" :git/sha "01c3a55"}}}}}')"

and then feed it to the --classpath argument:

$ nbb --classpath "$classpath" -e "(require '[honey.sql :as sql]) (sql/format {:select :foo :from :bar :where [:= :baz 2]})"
["SELECT foo FROM bar WHERE baz = ?" 2]

Currently nbb only reads from directories, not jar files, so you are encouraged to use git libs. Support for .jar files will be added later.

Current file

The name of the file that is currently being executed is available via nbb.core/*file* or on the metadata of vars:

(ns foo
  (:require [nbb.core :refer [*file*]]))

(prn *file*) ;; "/private/tmp/foo.cljs"

(defn f [])
(prn (:file (meta #'f))) ;; "/private/tmp/foo.cljs"


Nbb includes reagent.core which will be lazily loaded when required. You can use this together with ink to create a TUI application:

$ npm install ink


(ns ink-demo
  (:require ["ink" :refer [render Text]]
            [reagent.core :as r]))

(defonce state (r/atom 0))

(doseq [n (range 1 11)]
  (js/setTimeout #(swap! state inc) (* n 500)))

(defn hello []
  [:> Text {:color "green"} "Hello, world! " @state])

(render (r/as-element [hello]))


Working with callbacks and promises can become tedious. Since nbb v0.0.36 the promesa.core namespace is included with the let and do! macros. An example:

(ns prom
  (:require [promesa.core :as p]))

(defn sleep [ms]
   (fn [resolve _]
     (js/setTimeout resolve ms))))

(defn do-stuff
   (println "Doing stuff which takes a while")
   (sleep 1000)

(p/let [a (do-stuff)
        b (inc a)
        c (do-stuff)
        d (+ b c)]
  (prn d))
$ nbb prom.cljs
Doing stuff which takes a while
Doing stuff which takes a while

Also see API docs.


Since nbb v0.0.75 applied-science/js-interop is available:

(ns example
  (:require [applied-science.js-interop :as j]))

(def o (j/lit {:a 1 :b 2 :c {:d 1}}))

(prn (j/select-keys o [:a :b])) ;; #js {:a 1, :b 2}
(prn (j/get-in o [:c :d])) ;; 1

Most of this library is supported in nbb, except the following:

  • destructuring using :syms
  • property access using .-x notation. In nbb, you must use keywords.

See the example of what is currently supported.


See the examples directory for small examples.

Also check out these projects built with nbb:


See API documentation.

Migrating to shadow-cljs

See this gist on how to convert an nbb script or project to shadow-cljs.



  • babashka >= 0.4.0
  • Clojure CLI >=
  • Node.js 16.5.0 (lower version may work, but this is the one I used to build)

To build:

  • Clone and cd into this repo
  • bb release

Run bb tasks for more project-related tasks.

Download Details:
Author: borkdude
Download Link: Download The Source Code
Official Website: 
License: EPL-1.0

#node #javascript

Aria Barnes

Aria Barnes


Why use Node.js for Web Development? Benefits and Examples of Apps

Front-end web development has been overwhelmed by JavaScript highlights for quite a long time. Google, Facebook, Wikipedia, and most of all online pages use JS for customer side activities. As of late, it additionally made a shift to cross-platform mobile development as a main technology in React Native, Nativescript, Apache Cordova, and other crossover devices. 

Throughout the most recent couple of years, Node.js moved to backend development as well. Designers need to utilize a similar tech stack for the whole web project without learning another language for server-side development. Node.js is a device that adjusts JS usefulness and syntax to the backend. 

What is Node.js? 

Node.js isn’t a language, or library, or system. It’s a runtime situation: commonly JavaScript needs a program to work, however Node.js makes appropriate settings for JS to run outside of the program. It’s based on a JavaScript V8 motor that can run in Chrome, different programs, or independently. 

The extent of V8 is to change JS program situated code into machine code — so JS turns into a broadly useful language and can be perceived by servers. This is one of the advantages of utilizing Node.js in web application development: it expands the usefulness of JavaScript, permitting designers to coordinate the language with APIs, different languages, and outside libraries.

What Are the Advantages of Node.js Web Application Development? 

Of late, organizations have been effectively changing from their backend tech stacks to Node.js. LinkedIn picked Node.js over Ruby on Rails since it took care of expanding responsibility better and decreased the quantity of servers by multiple times. PayPal and Netflix did something comparative, just they had a goal to change their design to microservices. We should investigate the motivations to pick Node.JS for web application development and when we are planning to hire node js developers. 

Amazing Tech Stack for Web Development 

The principal thing that makes Node.js a go-to environment for web development is its JavaScript legacy. It’s the most well known language right now with a great many free devices and a functioning local area. Node.js, because of its association with JS, immediately rose in ubiquity — presently it has in excess of 368 million downloads and a great many free tools in the bundle module. 

Alongside prevalence, Node.js additionally acquired the fundamental JS benefits: 

  • quick execution and information preparing; 
  • exceptionally reusable code; 
  • the code is not difficult to learn, compose, read, and keep up; 
  • tremendous asset library, a huge number of free aides, and a functioning local area. 

In addition, it’s a piece of a well known MEAN tech stack (the blend of MongoDB, Express.js, Angular, and Node.js — four tools that handle all vital parts of web application development). 

Designers Can Utilize JavaScript for the Whole Undertaking 

This is perhaps the most clear advantage of Node.js web application development. JavaScript is an unquestionable requirement for web development. Regardless of whether you construct a multi-page or single-page application, you need to know JS well. On the off chance that you are now OK with JavaScript, learning Node.js won’t be an issue. Grammar, fundamental usefulness, primary standards — every one of these things are comparable. 

In the event that you have JS designers in your group, it will be simpler for them to learn JS-based Node than a totally new dialect. What’s more, the front-end and back-end codebase will be basically the same, simple to peruse, and keep up — in light of the fact that they are both JS-based. 

A Quick Environment for Microservice Development 

There’s another motivation behind why Node.js got famous so rapidly. The environment suits well the idea of microservice development (spilling stone monument usefulness into handfuls or many more modest administrations). 

Microservices need to speak with one another rapidly — and Node.js is probably the quickest device in information handling. Among the fundamental Node.js benefits for programming development are its non-obstructing algorithms.

Node.js measures a few demands all at once without trusting that the first will be concluded. Many microservices can send messages to one another, and they will be gotten and addressed all the while. 

Versatile Web Application Development 

Node.js was worked in view of adaptability — its name really says it. The environment permits numerous hubs to run all the while and speak with one another. Here’s the reason Node.js adaptability is better than other web backend development arrangements. 

Node.js has a module that is liable for load adjusting for each running CPU center. This is one of numerous Node.js module benefits: you can run various hubs all at once, and the environment will naturally adjust the responsibility. 

Node.js permits even apportioning: you can part your application into various situations. You show various forms of the application to different clients, in light of their age, interests, area, language, and so on. This builds personalization and diminishes responsibility. Hub accomplishes this with kid measures — tasks that rapidly speak with one another and share a similar root. 

What’s more, Node’s non-hindering solicitation handling framework adds to fast, letting applications measure a great many solicitations. 

Control Stream Highlights

Numerous designers consider nonconcurrent to be one of the two impediments and benefits of Node.js web application development. In Node, at whatever point the capacity is executed, the code consequently sends a callback. As the quantity of capacities develops, so does the number of callbacks — and you end up in a circumstance known as the callback damnation. 

In any case, Node.js offers an exit plan. You can utilize systems that will plan capacities and sort through callbacks. Systems will associate comparable capacities consequently — so you can track down an essential component via search or in an envelope. At that point, there’s no compelling reason to look through callbacks.


Final Words

So, these are some of the top benefits of Nodejs in web application development. This is how Nodejs is contributing a lot to the field of web application development. 

I hope now you are totally aware of the whole process of how Nodejs is really important for your web project. If you are looking to hire a node js development company in India then I would suggest that you take a little consultancy too whenever you call. 

Good Luck!

Original Source

#node.js development company in india #node js development company #hire node js developers #hire node.js developers in india #node.js development services #node.js development

Node JS Development Company| Node JS Web Developers-SISGAIN

Top organizations and start-ups hire Node.js developers from SISGAIN for their strategic software development projects in Illinois, USA. On the off chance that you are searching for a first rate innovation to assemble a constant Node.js web application development or a module, Node.js applications are the most appropriate alternative to pick. As Leading Node.js development company, we leverage our profound information on its segments and convey solutions that bring noteworthy business results. For more information email us at

#node.js development services #hire node.js developers #node.js web application development #node.js development company #node js application

Hire Dedicated Node.js Developers - Hire Node.js Developers

If you look at the backend technology used by today’s most popular apps there is one thing you would find common among them and that is the use of NodeJS Framework. Yes, the NodeJS framework is that effective and successful.

If you wish to have a strong backend for efficient app performance then have NodeJS at the backend.

WebClues Infotech offers different levels of experienced and expert professionals for your app development needs. So hire a dedicated NodeJS developer from WebClues Infotech with your experience requirement and expertise.

So what are you waiting for? Get your app developed with strong performance parameters from WebClues Infotech

For inquiry click here:

Book Free Interview:

#hire dedicated node.js developers #hire node.js developers #hire top dedicated node.js developers #hire node.js developers in usa & india #hire node js development company #hire the best node.js developers & programmers

The  NineHertz

The NineHertz


Node JS Development Company | Hire Node.js Developers

The NineHertz promises to develop a pro-active and easy solution for your enterprise. It has reached the heights in Node js web development and is considered as one of the top-notch Node js development company across the globe.

The NineHertz aims to design a best Node js development solution to improve their branding status and business profit.

Looking to hire the leading Node js development company?

#node js development company #nodejs development company #node.js development company #node.js development companies #node js web development #node development company