Level Up Your Asynchronous JavaScript Skills by Implementing a Bluebird-Style Promise.map

Better asynchronicity.

I do a lot of web scraping with Node.js in my work. Usually, you do not want to fire all your API calls in one go, as doing so is likely to overwhelm other people’s servers, trigger their DDoS protection, or worse, take them offline.

Data scraping on the web is usually done in two steps:

  1. You visit an index page that lists all the sub-resources you can call to fetch details. Take a property portal as an example: the site might group its listings by location, or give every property its own page.
  2. You then visit the items on the list from step 1, one by one, to fetch their details.

It is certainly a bad idea to wrap all the step 2 API calls in one giant Promise.all. A better approach is to stagger the calls and/or apply a rate limit. JavaScript’s Promise lends itself well to staggering because waterfall behavior is easy to build from it: you chain each invocation onto the previous one with .then, so the calls occur one after another. For example:

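    // Run invoke(item, index) for each item in sequence and collect the results in order.
    Promise.waterfall = function (array, invoke) {
      let pending = Promise.resolve()
      const results = []

      // entries() yields [index, item] pairs, so invoke also receives the index
      for (const [i, item] of array.entries()) {
        pending = pending
          .then(() => invoke(item, i))
          .then(result => results.push(result))
      }

      return pending.then(() => results)
    }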
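
To see the sequential behavior in action, here is a minimal sketch that uses the same chaining logic as a standalone function instead of patching the global Promise. The names `waterfall`, `fakeFetch`, and `log`, as well as the timings, are hypothetical stand-ins for real requests and are not part of the original code.

```javascript
// Standalone variant of the waterfall (assumption: same chaining logic, not attached to Promise).
function waterfall (array, invoke) {
  let pending = Promise.resolve()
  const results = []

  for (const [i, item] of array.entries()) {
    pending = pending
      .then(() => invoke(item, i))
      .then(result => { results.push(result) })
  }

  return pending.then(() => results)
}

// Hypothetical stand-in for an API call. Later items finish faster,
// so any overlap between calls would show up as a reordered log.
const log = []
function fakeFetch (item, i) {
  log.push(`start ${item}`)
  return new Promise(resolve => {
    setTimeout(() => {
      log.push(`end ${item}`)
      resolve(item.toUpperCase())
    }, 30 - i * 10)
  })
}

const done = waterfall(['a', 'b', 'c'], fakeFetch).then(results => {
  console.log(results)
  console.log(log.join(', '))
  return results
})
```

Even though the later calls are quicker, the log shows each call starting only after the previous one has ended, and the results come back in input order.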

To rate limit, you can introduce a timed delay:

    // Wait `ms` milliseconds, then call invoke with the original arguments.
    function delay (invoke, ms) {
      return (...args) => new Promise(resolve => {
        setTimeout(resolve, ms)
      }).then(() => invoke(...args))
    }

    // Call invoke right away, but hold its result back for at least `ms` milliseconds.
    function delay2 (invoke, ms) {
      return (...args) => new Promise(resolve => {
        setTimeout(resolve, ms, invoke(...args))
      })
    }

    /* Usage
    Promise.waterfall(array, delay(invoke, 1000))
    Promise.waterfall(array, delay2(invoke, 1000))
    */
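
The two helpers differ in when the wrapped function actually runs, which the sketch below tries to make visible. It assumes `delay` calls the wrapped function as `invoke(...args)`. The names `task` and `compare`, and the 100 ms and 20 ms timings, are hypothetical and chosen only for illustration.

```javascript
// Waits first, then invokes.
function delay (invoke, ms) {
  return (...args) => new Promise(resolve => {
    setTimeout(resolve, ms)
  }).then(() => invoke(...args))
}

// Invokes immediately, then resolves with the result once the timer fires.
function delay2 (invoke, ms) {
  return (...args) => new Promise(resolve => {
    setTimeout(resolve, ms, invoke(...args))
  })
}

// Hypothetical task: records when it was started and takes about 20 ms.
const task = x => new Promise(resolve => {
  const startedAt = Date.now()
  setTimeout(() => resolve({ x, startedAt }), 20)
})

async function compare () {
  const t0 = Date.now()
  const a = await delay(task, 100)('a')
  const t1 = Date.now()
  const b = await delay2(task, 100)('b')
  const t2 = Date.now()

  return {
    delay: { startedAfter: a.startedAt - t0, total: t1 - t0 },
    delay2: { startedAfter: b.startedAt - t1, total: t2 - t1 }
  }
}

const comparison = compare()
comparison.then(result => console.log(result))
```

With `delay`, the task starts roughly 100 ms after the call and the total is the wait plus the task's own duration. With `delay2`, the task starts almost immediately and the total is roughly the longer of the two. One caveat with `delay2`: if the wrapped call rejects before the timer fires, nothing is listening to that rejection yet, which recent Node.js versions may report as an unhandled rejection.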
