A fast, simple and easy-to-use Node.js crawler Framework


WARNING: This project is still under development. Existing features may change or be removed.

Features

  • Access frequency restriction
  • Pause and resume crawling
  • Resume crawling from the last saved record
  • Built-in jQuery selector
  • Built-in resource download tool
  • IP proxy settings
  • Statistics
  • Custom HTTP method, request body
  • Error retry
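The access-frequency restriction listed above boils down to spacing requests out over time. A minimal sketch of that idea, written as a generic interval-based limiter (the `RateLimiter` class and its `intervalMs` parameter are illustrative assumptions, not part of @axetroy/crawler's actual API):

```typescript
// Minimal sketch of an access-frequency limiter: allows at most one
// request every `intervalMs` milliseconds. This is a hypothetical
// helper for illustration, not the library's internal implementation.
class RateLimiter {
  private last = 0;

  constructor(private intervalMs: number) {}

  // Resolves once enough time has passed since the previously
  // scheduled request; callers await this before each fetch.
  async acquire(): Promise<void> {
    const now = Date.now();
    const wait = Math.max(0, this.last + this.intervalMs - now);
    this.last = now + wait;
    if (wait > 0) {
      await new Promise((resolve) => setTimeout(resolve, wait));
    }
  }
}

async function main() {
  const limiter = new RateLimiter(100); // >= 100 ms between requests
  const start = Date.now();
  for (let i = 0; i < 3; i++) {
    await limiter.acquire();
  }
  console.log(`3 requests spaced over ${Date.now() - start} ms`);
}

main();
```

A crawler would call `acquire()` before each HTTP request, so bursts of URLs are automatically spread out instead of hammering the target site.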

Quick start

npm install @axetroy/crawler

import { Crawler, Options, Provider, Response } from "@axetroy/crawler";

class MyProvider implements Provider {
  name = "scrapinghub";
  urls = ["https://blog.scrapinghub.com"];
  async parse($: Response) {
    const $nextPage = $("a.next-posts-link").eq(0);

    if ($nextPage.length) {
      $.follow($nextPage.prop("href"));
    }

    return $(".post-header>h2")
      .map((_, el) => $(el).text())
      .get();
  }
}

const config: Options = {
  timeout: 1000 * 5,
  retry: 3
};

new Crawler(MyProvider, config)
  .on("data", (articles: string[]) => {
    for (const article of articles) {
      process.stdout.write(article + "\n");
    }
  })
  .on("error", (err, task) => {
    console.log(`request fail on ${task.url}: ${err.message}`);
  })
  .start();
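The `retry: 3` option in the config above implies that failed requests are retried before the `error` event fires. The general pattern can be sketched like this (the `withRetry` helper is illustrative only, not the library's internal code):

```typescript
// Generic sketch of the error-retry pattern a `retry` option implies:
// re-run an async operation up to `retries` extra times, and only
// surface the last error once every attempt has failed.
async function withRetry<T>(
  fn: () => Promise<T>,
  retries: number
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err; // remember the failure and try again
    }
  }
  throw lastErr;
}

async function demo() {
  let attempts = 0;
  // Fails twice, then succeeds on the third attempt.
  const result = await withRetry(async () => {
    attempts++;
    if (attempts < 3) throw new Error("transient failure");
    return "ok";
  }, 3);
  console.log(result, "after", attempts, "attempts"); // → "ok after 3 attempts"
}

demo();
```

With `retry: 3`, a request would be attempted up to four times in total; only if all of them fail does the crawler emit the `error` event shown in the quick-start example.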

API Reference


Example

How to run the demo?

> npx ts-node examples/basic.ts

More examples can be found in the examples directory of the repository.

Download Details:

Author: axetroy

Live Demo: https://axetroy.github.io/crawler/

GitHub: https://github.com/axetroy/crawler

