Dylan  Iqbal

Dylan Iqbal

1558795764

Build a web scraper with Node

Web scraping refers to the process of gathering information from a website through automated scripts. This eases the process of gathering large amounts of data from websites where no official API has been defined.

The process of web scraping can be broken down into two main steps:

  1. Fetching the HTML source code of the website through an HTTP request or by using a headless browser.
  2. Parsing the raw data to extract just the information you’re interested in.

We’ll examine both steps during the course of this tutorial. At the end of it all, you should be able to build a web scraper for any website with ease.


Prerequisites

To complete this tutorial, you need to have Node.js (version 8.x or later) and npm installed on your computer. This page contains instructions on how on how to install or upgrade your Node installation to the latest version.


Getting started

Create a new scraper directory for this tutorial and initialize it with a package.json file by running npm init -y from the project root.

Next, install the dependencies that we’ll be needing too build up the web scraper:

    npm install axios cheerio puppeteer --save

Here’s what each one does:

  • Axios: Promise-based HTTP client for Node.js and the browser
  • Cheerio: jQuery implementation for Node.js. Cheerio makes it easy to select, edit, and view DOM elements.
  • Puppeteer: A Node.js library for controlling Google Chrome or Chromium.

You may need to wait a bit for the installation to complete as the puppeteer package needs to download Chromium as well.


Scrap a static website with Axios and Cheerio

To demonstrate how you can scrape a website using Node.js, we’re going to set up a script to scrape the Premier League website for some player stats. Specifically, we’ll scrape the website for the top 20 goalscorers in Premier League history and organize the data as JSON.

Create a new pl-scraper.js file in the root of your project directory and populate it with the following code:

    // pl-scraper.js

const axios = require('axios');

const url = 'https://www.premierleague.com/stats/top/players/goals?se=-1&cl=-1&iso=-1&po=-1?se=-1';

axios(url)
  .then(response => {
    const html = response.data;
    console.log(html);
  })
  .catch(console.error);

If you run the code with node pl-scraper.js, a long string of HTML will be printed to the console. But how can you parse the HTML for the exact data you need? That’s where Cheerio comes in.

Cheerio allows us to use jQuery methods to parse an HTML string and extract whatever information we want from it. But before you write any code, let’s examine the exact data that we need through the browser dev tools.

Open this link in your browser, and open the dev tools on that page. Use the inspector tool to highlight the body of the table listing the top goalscorers in Premier League history.

As you can see the table body has a class of .statsTableContainer. We can select all the rows using cheerio like this: $(‘.statsTableContainer > tr’). Go ahead and update the pl-scraper.js file to look like this:

    // pl-scraper.js

const axios = require('axios');
const cheerio = require('cheerio');

const url = 'https://www.premierleague.com/stats/top/players/goals?se=-1&cl=-1&iso=-1&po=-1?se=-1';

axios(url)
  .then(response => {
    const html = response.data;
    const $ = cheerio.load(html);
    const statsTable = $('.statsTableContainer > tr');
    console.log(statsTable.length);
  })
  .catch(console.error);

Unlike jQuery which operates on the browser DOM, you need to pass in the HTML document into Cheerio before we can use it to parse the document with it. After loading the HTML, we select all 20 rows in .statsTableContainer and store a reference to the selection in statsTable. You can run the code with node pl-scraper.js and confirm that the length of statsTable is exactly 20.

The next step is to extract the rank, player name, nationality and number of goals from each row. We can achieve that using the following script:

    // pl-scraper.js

const axios = require('axios');
const cheerio = require('cheerio');

const url = 'https://www.premierleague.com/stats/top/players/goals?se=-1&cl=-1&iso=-1&po=-1?se=-1';

axios(url)
  .then(response => {
    const html = response.data;
    const $ = cheerio.load(html)
    const statsTable = $('.statsTableContainer > tr');
    const topPremierLeagueScorers = [];

    statsTable.each(function () {
      const rank = $(this).find('.rank > strong').text();
      const playerName = $(this).find('.playerName > strong').text();
      const nationality = $(this).find('.playerCountry').text();
      const goals = $(this).find('.mainStat').text();

      topPremierLeagueScorers.push({
        rank,
        name: playerName,
        nationality,
        goals,
      });
    });

    console.log(topPremierLeagueScorers);
  })
  .catch(console.error);

Here, we are looping over the selection of rows and using the find() method to extract the data that we need, organize it and store it in an array. Now, we have an array of JavaScript objects that can be consumed anywhere else.


Scrape a dynamic website using Puppeteer

Some websites rely exclusively on JavaScript to load their content, so using an HTTP request library like axios to request the HTML will not work because it will not wait for any JavaScript to execute like a browser would before returning a response.

This is where Puppeteer comes in. It is a library that allows you to control a headless browser from a Node.js script. A perfect use case for this library is scraping pages that require JavaScript execution.

Let’s examine how Puppeteer can help us scrape news headlines from r/news since the newer version of Reddit requires JavaScript to render content on the page.

It appears, the headlines are wrapped in an anchor tag that links to the discussion on that headline. Although the class names have been obfuscated, we can select each headline by targeting each h2 inside any anchor tag that links to the discussion page.

Create a new reddit-scraper.js file and add the following code into it:

    // reddit-scraper.js

const cheerio = require('cheerio');
const puppeteer = require('puppeteer');

const url = 'https://www.reddit.com/r/news/';

puppeteer
  .launch()
  .then(browser => browser.newPage())
  .then(page => {
    return page.goto(url).then(function() {
      return page.content();
    });
  })
  .then(html => {
    const $ = cheerio.load(html);
    const newsHeadlines = [];
    $('a[href*="/r/news/comments"] > h2').each(function() {
      newsHeadlines.push({
        title: $(this).text(),
      });
    });

    console.log(newsHeadlines);
  })
  .catch(console.error);

This code launches a puppeteer instance, navigates to the provided URL, and returns the HTML content after all the JavaScript on the page has bee executed. We then use Cheerio as before to parse and extract the desired data from the HTML string.


Wrap up

In this tutorial, we learned how to set up web scraping in Node.js. We looked at scraping methods for both static and dynamic websites, so you should have no issues scraping data off of any website you desire.

You can find the complete source code used for this tutorial in this GitHub repository.


Learn More

The Complete Node.js Developer Course (2nd Edition)

Learn and Understand NodeJS

How to build RESTful APIs with ASP.NET Core

5 Javascript (ES6+) features that you should be using in 2019

MEAN Stack Tutorial MongoDB, ExpressJS, AngularJS and NodeJS

Originally published by Ayooluwa Isaiah at https://pusher.com

#node-js #web-development

What is GEEK

Buddha Community

Build a web scraper with Node

Rajib Mahmud

1553627744

Its Good but when I want the Club name from that static site what should i do ?

There are the nested classes .hide-s and statNameSecondary

I used the following code and got empty string 😢

Node JS Development Company| Node JS Web Developers-SISGAIN

Top organizations and start-ups hire Node.js developers from SISGAIN for their strategic software development projects in Illinois, USA. On the off chance that you are searching for a first rate innovation to assemble a constant Node.js web application development or a module, Node.js applications are the most appropriate alternative to pick. As Leading Node.js development company, we leverage our profound information on its segments and convey solutions that bring noteworthy business results. For more information email us at hello@sisgain.com

#node.js development services #hire node.js developers #node.js web application development #node.js development company #node js application

Evolution in Web Design: A Case Study of 25 Years - Prismetric

The term web design simply encompasses a design process related to the front-end design of website that includes writing mark-up. Creative web design has a considerable impact on your perceived business credibility and quality. It taps onto the broader scopes of web development services.

Web designing is identified as a critical factor for the success of websites and eCommerce. The internet has completely changed the way businesses and brands operate. Web design and web development go hand-in-hand and the need for a professional web design and development company, offering a blend of creative designs and user-centric elements at an affordable rate, is growing at a significant rate.

In this blog, we have focused on the different areas of designing a website that covers all the trends, tools, and techniques coming up with time.

Web design
In 2020 itself, the number of smartphone users across the globe stands at 6.95 billion, with experts suggesting a high rise of 17.75 billion by 2024. On the other hand, the percentage of Gen Z web and internet users worldwide is up to 98%. This is not just a huge market but a ginormous one to boost your business and grow your presence online.

Web Design History
At a huge particle physics laboratory, CERN in Switzerland, the son of computer scientist Barner Lee published the first-ever website on August 6, 1991. He is not only the first web designer but also the creator of HTML (HyperText Markup Language). The worldwide web persisted and after two years, the world’s first search engine was born. This was just the beginning.

Evolution of Web Design over the years
With the release of the Internet web browser and Windows 95 in 1995, most trading companies at that time saw innumerable possibilities of instant worldwide information and public sharing of websites to increase their sales. This led to the prospect of eCommerce and worldwide group communications.

The next few years saw a soaring launch of the now-so-famous websites such as Yahoo, Amazon, eBay, Google, and substantially more. In 2004, by the time Facebook was launched, there were more than 50 million websites online.

Then came the era of Google, the ruler of all search engines introducing us to search engine optimization (SEO) and businesses sought their ways to improve their ranks. The world turned more towards mobile web experiences and responsive mobile-friendly web designs became requisite.

Let’s take a deep look at the evolution of illustrious brands to have a profound understanding of web design.

Here is a retrospection of a few widely acclaimed brands over the years.

Netflix
From a simple idea of renting DVDs online to a multi-billion-dollar business, saying that Netflix has come a long way is an understatement. A company that has sent shockwaves across Hollywood in the form of content delivery. Abundantly, Netflix (NFLX) is responsible for the rise in streaming services across 190 countries and meaningful changes in the entertainment industry.

1997-2000

The idea of Netflix was born when Reed Hastings and Marc Randolph decided to rent DVDs by mail. With 925 titles and a pay-per-rental model, Netflix.com debuts the first DVD rental and sales site with all novel features. It offered unlimited rentals without due dates or monthly rental limitations with a personalized movie recommendation system.

Netflix 1997-2000

2001-2005

Announcing its initial public offering (IPO) under the NASDAQ ticker NFLX, Netflix reached over 1 million subscribers in the United States by introducing a profile feature in their influential website design along with a free trial allowing members to create lists and rate their favorite movies. The user experience was quite engaging with the categorization of content, recommendations based on history, search engine, and a queue of movies to watch.

Netflix 2001-2005 -2003

2006-2010

They then unleashed streaming and partnering with electronic brands such as blu-ray, Xbox, and set-top boxes so that users can watch series and films straight away. Later in 2010, they also launched their sophisticated website on mobile devices with its iconic red and black themed background.

Netflix 2006-2010 -2007

2011-2015

In 2013, an eye-tracking test revealed that the users didn’t focus on the details of the movie or show in the existing interface and were perplexed with the flow of information. Hence, the professional web designers simply shifted the text from the right side to the top of the screen. With Daredevil, an audio description feature was also launched for the visually impaired ones.

Netflix 2011-2015

2016-2020

These years, Netflix came with a plethora of new features for their modern website design such as AutoPay, snippets of trailers, recommendations categorized by genre, percentage based on user experience, upcoming shows, top 10 lists, etc. These web application features yielded better results in visual hierarchy and flow of information across the website.

Netflix 2016-2020

2021

With a sleek logo in their iconic red N, timeless black background with a ‘Watch anywhere, Cancel anytime’ the color, the combination, the statement, and the leading ott platform for top video streaming service Netflix has overgrown into a revolutionary lifestyle of Netflix and Chill.

Netflix 2021

Contunue to read: Evolution in Web Design: A Case Study of 25 Years

#web #web-design #web-design-development #web-design-case-study #web-design-history #web-development

The  NineHertz

The NineHertz

1612352512

node js web development company

The NineHertz is one of the renowned Node js development companies which assures to deliver jaw-dropping solutions for your venture. Our experienced Node js developers work on the advanced and next-gen web applications with trending Node js technology.

Hurry to own a highly authorized and cost-effective real-time web application. Our Node js developers maintain the coding standards which makes the outstanding outcome. Use advanced Node js for web development in no time.

#node.js development companies #node js web development company #node js web development services #node development company

Aria Barnes

Aria Barnes

1622719015

Why use Node.js for Web Development? Benefits and Examples of Apps

Front-end web development has been overwhelmed by JavaScript highlights for quite a long time. Google, Facebook, Wikipedia, and most of all online pages use JS for customer side activities. As of late, it additionally made a shift to cross-platform mobile development as a main technology in React Native, Nativescript, Apache Cordova, and other crossover devices. 

Throughout the most recent couple of years, Node.js moved to backend development as well. Designers need to utilize a similar tech stack for the whole web project without learning another language for server-side development. Node.js is a device that adjusts JS usefulness and syntax to the backend. 

What is Node.js? 

Node.js isn’t a language, or library, or system. It’s a runtime situation: commonly JavaScript needs a program to work, however Node.js makes appropriate settings for JS to run outside of the program. It’s based on a JavaScript V8 motor that can run in Chrome, different programs, or independently. 

The extent of V8 is to change JS program situated code into machine code — so JS turns into a broadly useful language and can be perceived by servers. This is one of the advantages of utilizing Node.js in web application development: it expands the usefulness of JavaScript, permitting designers to coordinate the language with APIs, different languages, and outside libraries.

What Are the Advantages of Node.js Web Application Development? 

Of late, organizations have been effectively changing from their backend tech stacks to Node.js. LinkedIn picked Node.js over Ruby on Rails since it took care of expanding responsibility better and decreased the quantity of servers by multiple times. PayPal and Netflix did something comparative, just they had a goal to change their design to microservices. We should investigate the motivations to pick Node.JS for web application development and when we are planning to hire node js developers. 

Amazing Tech Stack for Web Development 

The principal thing that makes Node.js a go-to environment for web development is its JavaScript legacy. It’s the most well known language right now with a great many free devices and a functioning local area. Node.js, because of its association with JS, immediately rose in ubiquity — presently it has in excess of 368 million downloads and a great many free tools in the bundle module. 

Alongside prevalence, Node.js additionally acquired the fundamental JS benefits: 

  • quick execution and information preparing; 
  • exceptionally reusable code; 
  • the code is not difficult to learn, compose, read, and keep up; 
  • tremendous asset library, a huge number of free aides, and a functioning local area. 

In addition, it’s a piece of a well known MEAN tech stack (the blend of MongoDB, Express.js, Angular, and Node.js — four tools that handle all vital parts of web application development). 

Designers Can Utilize JavaScript for the Whole Undertaking 

This is perhaps the most clear advantage of Node.js web application development. JavaScript is an unquestionable requirement for web development. Regardless of whether you construct a multi-page or single-page application, you need to know JS well. On the off chance that you are now OK with JavaScript, learning Node.js won’t be an issue. Grammar, fundamental usefulness, primary standards — every one of these things are comparable. 

In the event that you have JS designers in your group, it will be simpler for them to learn JS-based Node than a totally new dialect. What’s more, the front-end and back-end codebase will be basically the same, simple to peruse, and keep up — in light of the fact that they are both JS-based. 

A Quick Environment for Microservice Development 

There’s another motivation behind why Node.js got famous so rapidly. The environment suits well the idea of microservice development (spilling stone monument usefulness into handfuls or many more modest administrations). 

Microservices need to speak with one another rapidly — and Node.js is probably the quickest device in information handling. Among the fundamental Node.js benefits for programming development are its non-obstructing algorithms.

Node.js measures a few demands all at once without trusting that the first will be concluded. Many microservices can send messages to one another, and they will be gotten and addressed all the while. 

Versatile Web Application Development 

Node.js was worked in view of adaptability — its name really says it. The environment permits numerous hubs to run all the while and speak with one another. Here’s the reason Node.js adaptability is better than other web backend development arrangements. 

Node.js has a module that is liable for load adjusting for each running CPU center. This is one of numerous Node.js module benefits: you can run various hubs all at once, and the environment will naturally adjust the responsibility. 

Node.js permits even apportioning: you can part your application into various situations. You show various forms of the application to different clients, in light of their age, interests, area, language, and so on. This builds personalization and diminishes responsibility. Hub accomplishes this with kid measures — tasks that rapidly speak with one another and share a similar root. 

What’s more, Node’s non-hindering solicitation handling framework adds to fast, letting applications measure a great many solicitations. 

Control Stream Highlights

Numerous designers consider nonconcurrent to be one of the two impediments and benefits of Node.js web application development. In Node, at whatever point the capacity is executed, the code consequently sends a callback. As the quantity of capacities develops, so does the number of callbacks — and you end up in a circumstance known as the callback damnation. 

In any case, Node.js offers an exit plan. You can utilize systems that will plan capacities and sort through callbacks. Systems will associate comparable capacities consequently — so you can track down an essential component via search or in an envelope. At that point, there’s no compelling reason to look through callbacks.

 

Final Words

So, these are some of the top benefits of Nodejs in web application development. This is how Nodejs is contributing a lot to the field of web application development. 

I hope now you are totally aware of the whole process of how Nodejs is really important for your web project. If you are looking to hire a node js development company in India then I would suggest that you take a little consultancy too whenever you call. 

Good Luck!

Original Source

#node.js development company in india #node js development company #hire node js developers #hire node.js developers in india #node.js development services #node.js development

The  NineHertz

The NineHertz

1611828639

Node JS Development Company | Hire Node.js Developers

The NineHertz promises to develop a pro-active and easy solution for your enterprise. It has reached the heights in Node js web development and is considered as one of the top-notch Node js development company across the globe.

The NineHertz aims to design a best Node js development solution to improve their branding status and business profit.

Looking to hire the leading Node js development company?

#node js development company #nodejs development company #node.js development company #node.js development companies #node js web development #node development company