Elvis Miranda

Elvis Miranda

1558605247

How To Web Scraping With Nodejs Cheerio

What is web scraping?

Web scraping is a technique used to extract data from websites using a script. Web scraping is the way to automate the laborious work of copying data from various websites.

Web Scraping is generally performed in the cases when the desirable websites don’t expose the API for fetching the data. Some common web scraping scenarios are:

  1. Scraping emails from various websites for sales leads.
  2. Scraping news headlines from news websites.
  3. Scraping product’s data from E-Commerce websites.

Why do we need web scraping when e-commerce websites expose the API (Product Advertising APIs) for fetching/collecting product’s data?

E-Commerce websites only expose some of their product’s data to be fetched through APIs therefore, web scraping is the more effective way to collect the maximum product’s data.

Product comparison sites generally do web scraping. Even Google Search Engine does crawling and scraping to index the search results.

What will we need?

Getting started with web scraping is easy and it is divided into two simple parts-

  1. Scraping emails from various websites for sales leads.
  2. Scraping news headlines from news websites.
  3. Scraping product’s data from E-Commerce websites.

We will be using Node.js for web-scraping. If you’re not familiar with Node, check out this article The only NodeJs introduction you’ll ever need.

We will also use two open-source npm modules:

  • axios— Promise based HTTP client for the browser and node.js.
  • cheerio — jQuery for Node.js. Cheerio makes it easy to select, edit, and view DOM elements.

You can learn more about comparing popular HTTP request libraries here.

Tip: Don’t duplicate common code. Use tools like Bit to organize, share and discover components across apps- to build faster. Take a look.

Setup

Our setup is pretty simple. We create a new folder and run this command inside that folder to create a package.json file. Let’s cook the recipe to make our food delicious.

npm init -y

Before we start cooking, let’s collect the ingredients for our recipe. Add Axios and Cheerio from npm as our dependencies.

npm install axios cheerio

Now, require them in our index.js file

const axios = require('axios');
const cheerio = require('cheerio');

Make the Request

We are done with collecting the ingredients for our food, let’s start with the cooking. We are scraping data from the HackerNews website for which we need to make an HTTP request to get the website’s content. That’s where axios come into action.

Our response will look like this —

<html op="news">
<head>
<meta name="referrer" content="origin">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" type="text/css" href="news.css?oq5SsJ3ZDmp6sivPZMMb">
<link rel="shortcut icon" href="favicon.ico">
<link rel="alternate" type="application/rss+xml" title="RSS" href="rss">
<title>Hacker News</title>
</head>
<body>
<center>
<table id="hnmain" border="0" cellpadding="0" cellspacing="0" width="85%" bgcolor="#f6f6ef">
.
.
.
</body>
<script type='text/javascript' src='hn.js?oq5SsJ3ZDmp6sivPZMMb'></script>
</html>

We are getting similar HTML content which we get while making a request from Chrome or any browser. Now we need some help of Chrome Developer Tools to search through the HTML of a web page and select the required data. You can learn more about the Chrome DevTools from here.

We want to scrape the News heading and its associated links. You can view the HTML of the webpage by right-clicking anywhere on the webpage and selecting “Inspect”.

Parsing HTML with Cheerio.js

Cheerio is the jQuery for Node.js, we use selectors to select tags of an HTML document. The selector syntax was borrowed from jQuery. Using Chrome DevTools, we need to find selector for news headlines and its link. Let’s add some spices to our food.

First, we need to load in the HTML. This step in jQuery is implicit since jQuery operates on the one, baked-in DOM. With Cheerio, we need to pass in the HTML document. After loading the HTML, we iterate all the occurrences of the table row to scrape each and every news on the page.

The Output will look like —

[
  {
    title: 'Malaysia seeks $7.5B in reparations from Goldman Sachs (reuters.com)',
    link: 'https://www.reuters.com/article/us-malaysia-politics-1mdb-goldman/malaysia-seeks-7-5-billion-in-reparations-from-goldman-sachs-ft-idUSKCN1OK0GU'
  },
  {
    title: 'The World Through the Eyes of the US (pudding.cool)',
    link: 'https://pudding.cool/2018/12/countries/'
  },
  .
  .
  .
]

Now we have an array of JavaScript Object containing the title and links of the news from the HackerNews website. In this way, we can scrape the data from various large number of websites. So, our food is prepared and looks delicious too.

Conclusion

In this article, we first understood what is web scraping and how we can use it for automating various operations of collecting data from various websites.

Many websites are using Single Page Application (SPA) architecture to generate content dynamically on their websites using JavaScript. We can get the response from the initial HTTP request and can’t execute the javascript for rendering dynamic content using axios and other similar npm packages like request. Hence, we can only scrape data from static websites.

I will soon write an article on “how to scrape data from dynamic websites”. Meanwhile, Cook your own recipes!!

Feel free to comment and ask me anything. You can follow me on Twitter and Medium. Thanks for reading! 👍

Learn more

Node.js for beginners, 10 developed projects, 100% practical

Node.js - From Zero to Web App

Typescript Async/Await in Node JS with testing

Projects in Node.js - Learn by Example

All about NodeJS

What is web scraping?

Web scraping is a technique used to extract data from websites using a script. Web scraping is the way to automate the laborious work of copying data from various websites.

Web Scraping is generally performed in the cases when the desirable websites don’t expose the API for fetching the data. Some common web scraping scenarios are:

  1. Scraping emails from various websites for sales leads.
  2. Scraping news headlines from news websites.
  3. Scraping product’s data from E-Commerce websites.

Why do we need web scraping when e-commerce websites expose the API (Product Advertising APIs) for fetching/collecting product’s data?

E-Commerce websites only expose some of their product’s data to be fetched through APIs therefore, web scraping is the more effective way to collect the maximum product’s data.

Product comparison sites generally do web scraping. Even Google Search Engine does crawling and scraping to index the search results.

What will we need?

Getting started with web scraping is easy and it is divided into two simple parts-

  1. Scraping emails from various websites for sales leads.
  2. Scraping news headlines from news websites.
  3. Scraping product’s data from E-Commerce websites.

We will be using Node.js for web-scraping. If you’re not familiar with Node, check out this article The only NodeJs introduction you’ll ever need.

We will also use two open-source npm modules:

  • axios— Promise based HTTP client for the browser and node.js.
  • cheerio — jQuery for Node.js. Cheerio makes it easy to select, edit, and view DOM elements.

You can learn more about comparing popular HTTP request libraries here.

Tip: Don’t duplicate common code. Use tools like Bit to organize, share and discover components across apps- to build faster. Take a look.

Setup

Our setup is pretty simple. We create a new folder and run this command inside that folder to create a package.json file. Let’s cook the recipe to make our food delicious.

npm init -y

Before we start cooking, let’s collect the ingredients for our recipe. Add Axios and Cheerio from npm as our dependencies.

npm install axios cheerio

Now, require them in our index.js file

const axios = require('axios');
const cheerio = require('cheerio');

Make the Request

We are done with collecting the ingredients for our food, let’s start with the cooking. We are scraping data from the HackerNews website for which we need to make an HTTP request to get the website’s content. That’s where axios come into action.

Our response will look like this —

<html op="news">
<head>
<meta name="referrer" content="origin">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" type="text/css" href="news.css?oq5SsJ3ZDmp6sivPZMMb">
<link rel="shortcut icon" href="favicon.ico">
<link rel="alternate" type="application/rss+xml" title="RSS" href="rss">
<title>Hacker News</title>
</head>
<body>
<center>
<table id="hnmain" border="0" cellpadding="0" cellspacing="0" width="85%" bgcolor="#f6f6ef">
.
.
.
</body>
<script type='text/javascript' src='hn.js?oq5SsJ3ZDmp6sivPZMMb'></script>
</html>

We are getting similar HTML content which we get while making a request from Chrome or any browser. Now we need some help of Chrome Developer Tools to search through the HTML of a web page and select the required data. You can learn more about the Chrome DevTools from here.

We want to scrape the News heading and its associated links. You can view the HTML of the webpage by right-clicking anywhere on the webpage and selecting “Inspect”.

Parsing HTML with Cheerio.js

Cheerio is the jQuery for Node.js, we use selectors to select tags of an HTML document. The selector syntax was borrowed from jQuery. Using Chrome DevTools, we need to find selector for news headlines and its link. Let’s add some spices to our food.

First, we need to load in the HTML. This step in jQuery is implicit since jQuery operates on the one, baked-in DOM. With Cheerio, we need to pass in the HTML document. After loading the HTML, we iterate all the occurrences of the table row to scrape each and every news on the page.

The Output will look like —

[
  {
    title: 'Malaysia seeks $7.5B in reparations from Goldman Sachs (reuters.com)',
    link: 'https://www.reuters.com/article/us-malaysia-politics-1mdb-goldman/malaysia-seeks-7-5-billion-in-reparations-from-goldman-sachs-ft-idUSKCN1OK0GU'
  },
  {
    title: 'The World Through the Eyes of the US (pudding.cool)',
    link: 'https://pudding.cool/2018/12/countries/'
  },
  .
  .
  .
]

Now we have an array of JavaScript Object containing the title and links of the news from the HackerNews website. In this way, we can scrape the data from various large number of websites. So, our food is prepared and looks delicious too.

Conclusion

In this article, we first understood what is web scraping and how we can use it for automating various operations of collecting data from various websites.

Many websites are using Single Page Application (SPA) architecture to generate content dynamically on their websites using JavaScript. We can get the response from the initial HTTP request and can’t execute the javascript for rendering dynamic content using axios and other similar npm packages like request. Hence, we can only scrape data from static websites.

I will soon write an article on “how to scrape data from dynamic websites”. Meanwhile, Cook your own recipes!!

Feel free to comment and ask me anything. You can follow me on Twitter and Medium. Thanks for reading! 👍

Learn more

Node.js for beginners, 10 developed projects, 100% practical

Node.js - From Zero to Web App

Typescript Async/Await in Node JS with testing

Projects in Node.js - Learn by Example

All about NodeJS

#node-js

What is GEEK

Buddha Community

How To Web Scraping With Nodejs  Cheerio
Autumn  Blick

Autumn Blick

1603805749

What's the Link Between Web Automation and Web Proxies?

Web automation and web scraping are quite popular among people out there. That’s mainly because people tend to use web scraping and other similar automation technologies to grab information they want from the internet. The internet can be considered as one of the biggest sources of information. If we can use that wisely, we will be able to scrape lots of important facts. However, it is important for us to use appropriate methodologies to get the most out of web scraping. That’s where proxies come into play.

How Can Proxies Help You With Web Scraping?

When you are scraping the internet, you will have to go through lots of information available out there. Going through all the information is never an easy thing to do. You will have to deal with numerous struggles while you are going through the information available. Even if you can use tools to automate the task and overcome struggles, you will still have to invest a lot of time in it.

When you are using proxies, you will be able to crawl through multiple websites faster. This is a reliable method to go ahead with web crawling as well and there is no need to worry too much about the results that you are getting out of it.

Another great thing about proxies is that they will provide you with the chance to mimic that you are from different geographical locations around the world. While keeping that in mind, you will be able to proceed with using the proxy, where you can submit requests that are from different geographical regions. If you are keen to find geographically related information from the internet, you should be using this method. For example, numerous retailers and business owners tend to use this method in order to get a better understanding of local competition and the local customer base that they have.

If you want to try out the benefits that come along with web automation, you can use a free web proxy. You will be able to start experiencing all the amazing benefits that come along with it. Along with that, you will even receive the motivation to take your automation campaigns to the next level.

#automation #web #proxy #web-automation #web-scraping #using-proxies #website-scraping #website-scraping-tools

Hire NodeJs Developer

Looking to build dynamic, extensively featured, and full-fledged web applications?

Hire NodeJs Developer to create a real-time, faster, and scalable application to accelerate your business. At HourlyDeveloper.io, we have a team of expert Node.JS developers, who have experience in working with Bootstrap, HTML5, & CSS, and also hold the knowledge of the most advanced frameworks and platforms.

Contact our experts: https://bit.ly/3hUdppS

#hire nodejs developer #nodejs developer #nodejs development company #nodejs development services #nodejs development #nodejs

Sival Alethea

Sival Alethea

1624402800

Beautiful Soup Tutorial - Web Scraping in Python

The Beautiful Soup module is used for web scraping in Python. Learn how to use the Beautiful Soup and Requests modules in this tutorial. After watching, you will be able to start scraping the web on your own.
📺 The video in this post was made by freeCodeCamp.org
The origin of the article: https://www.youtube.com/watch?v=87Gx3U0BDlo&list=PLWKjhJtqVAbnqBxcdjVGgT3uVR10bzTEB&index=12
🔥 If you’re a beginner. I believe the article below will be useful to you ☞ What You Should Know Before Investing in Cryptocurrency - For Beginner
⭐ ⭐ ⭐The project is of interest to the community. Join to Get free ‘GEEK coin’ (GEEKCASH coin)!
☞ **-----CLICK HERE-----**⭐ ⭐ ⭐
Thanks for visiting and watching! Please don’t forget to leave a like, comment and share!

#web scraping #python #beautiful soup #beautiful soup tutorial #web scraping in python #beautiful soup tutorial - web scraping in python

Ashish parmar

Ashish parmar

1627043546

Evolution in Web Design: A Case Study of 25 Years - Prismetric

The term web design simply encompasses a design process related to the front-end design of website that includes writing mark-up. Creative web design has a considerable impact on your perceived business credibility and quality. It taps onto the broader scopes of web development services.

Web designing is identified as a critical factor for the success of websites and eCommerce. The internet has completely changed the way businesses and brands operate. Web design and web development go hand-in-hand and the need for a professional web design and development company, offering a blend of creative designs and user-centric elements at an affordable rate, is growing at a significant rate.

In this blog, we have focused on the different areas of designing a website that covers all the trends, tools, and techniques coming up with time.

Web design
In 2020 itself, the number of smartphone users across the globe stands at 6.95 billion, with experts suggesting a high rise of 17.75 billion by 2024. On the other hand, the percentage of Gen Z web and internet users worldwide is up to 98%. This is not just a huge market but a ginormous one to boost your business and grow your presence online.

Web Design History
At a huge particle physics laboratory, CERN in Switzerland, the son of computer scientist Barner Lee published the first-ever website on August 6, 1991. He is not only the first web designer but also the creator of HTML (HyperText Markup Language). The worldwide web persisted and after two years, the world’s first search engine was born. This was just the beginning.

Evolution of Web Design over the years
With the release of the Internet web browser and Windows 95 in 1995, most trading companies at that time saw innumerable possibilities of instant worldwide information and public sharing of websites to increase their sales. This led to the prospect of eCommerce and worldwide group communications.

The next few years saw a soaring launch of the now-so-famous websites such as Yahoo, Amazon, eBay, Google, and substantially more. In 2004, by the time Facebook was launched, there were more than 50 million websites online.

Then came the era of Google, the ruler of all search engines introducing us to search engine optimization (SEO) and businesses sought their ways to improve their ranks. The world turned more towards mobile web experiences and responsive mobile-friendly web designs became requisite.

Let’s take a deep look at the evolution of illustrious brands to have a profound understanding of web design.

Here is a retrospection of a few widely acclaimed brands over the years.

Netflix
From a simple idea of renting DVDs online to a multi-billion-dollar business, saying that Netflix has come a long way is an understatement. A company that has sent shockwaves across Hollywood in the form of content delivery. Abundantly, Netflix (NFLX) is responsible for the rise in streaming services across 190 countries and meaningful changes in the entertainment industry.

1997-2000

The idea of Netflix was born when Reed Hastings and Marc Randolph decided to rent DVDs by mail. With 925 titles and a pay-per-rental model, Netflix.com debuts the first DVD rental and sales site with all novel features. It offered unlimited rentals without due dates or monthly rental limitations with a personalized movie recommendation system.

Netflix 1997-2000

2001-2005

Announcing its initial public offering (IPO) under the NASDAQ ticker NFLX, Netflix reached over 1 million subscribers in the United States by introducing a profile feature in their influential website design along with a free trial allowing members to create lists and rate their favorite movies. The user experience was quite engaging with the categorization of content, recommendations based on history, search engine, and a queue of movies to watch.

Netflix 2001-2005 -2003

2006-2010

They then unleashed streaming and partnering with electronic brands such as blu-ray, Xbox, and set-top boxes so that users can watch series and films straight away. Later in 2010, they also launched their sophisticated website on mobile devices with its iconic red and black themed background.

Netflix 2006-2010 -2007

2011-2015

In 2013, an eye-tracking test revealed that the users didn’t focus on the details of the movie or show in the existing interface and were perplexed with the flow of information. Hence, the professional web designers simply shifted the text from the right side to the top of the screen. With Daredevil, an audio description feature was also launched for the visually impaired ones.

Netflix 2011-2015

2016-2020

These years, Netflix came with a plethora of new features for their modern website design such as AutoPay, snippets of trailers, recommendations categorized by genre, percentage based on user experience, upcoming shows, top 10 lists, etc. These web application features yielded better results in visual hierarchy and flow of information across the website.

Netflix 2016-2020

2021

With a sleek logo in their iconic red N, timeless black background with a ‘Watch anywhere, Cancel anytime’ the color, the combination, the statement, and the leading ott platform for top video streaming service Netflix has overgrown into a revolutionary lifestyle of Netflix and Chill.

Netflix 2021

Contunue to read: Evolution in Web Design: A Case Study of 25 Years

#web #web-design #web-design-development #web-design-case-study #web-design-history #web-development

AutoScraper Introduction: Fast and Light Automatic Web Scraper for Python

In the last few years, web scraping has been one of my day to day and frequently needed tasks. I was wondering if I can make it smart and automatic to save lots of time. So I made AutoScraper!

The project code is available on Github.

This project is made for automatic web scraping to make scraping easy. It gets a url or the html content of a web page and a list of sample data which we want to scrape from that page. This data can be text, url or any html tag value of that page. It learns the scraping rules and returns the similar elements. Then you can use this learned object with new urls to get similar content or the exact same element of those new pages!

Installation

Install latest version from git repository using pip:

$ pip install git+https://github.com/alirezamika/autoscraper.git

How to use

Getting similar results

Say we want to fetch all related post titles in a stackoverflow page:

from autoscraper import AutoScraper

url = 'https://stackoverflow.com/questions/2081586/web-scraping-with-python'

## We can add one or multiple candidates here.
## You can also put urls here to retrieve urls.
wanted_list = ["How to call an external command?"]

scraper = AutoScraper()
result = scraper.build(url, wanted_list)
print(result)

#python #web-scraping #web-crawling #data-scraping #website-scraping #open-source #repositories-on-github #web-development