1558795764
Web scraping refers to the process of gathering information from a website through automated scripts. This eases the process of gathering large amounts of data from websites where no official API has been defined.
The process of web scraping can be broken down into two main steps:
We’ll examine both steps during the course of this tutorial. At the end of it all, you should be able to build a web scraper for any website with ease.
To complete this tutorial, you need to have Node.js (version 8.x or later) and npm installed on your computer. This page contains instructions on how on how to install or upgrade your Node installation to the latest version.
Create a new scraper
directory for this tutorial and initialize it with a package.json
file by running npm init -y
from the project root.
Next, install the dependencies that we’ll be needing too build up the web scraper:
npm install axios cheerio puppeteer --save
Here’s what each one does:
You may need to wait a bit for the installation to complete as the puppeteer
package needs to download Chromium as well.
To demonstrate how you can scrape a website using Node.js, we’re going to set up a script to scrape the Premier League website for some player stats. Specifically, we’ll scrape the website for the top 20 goalscorers in Premier League history and organize the data as JSON.
Create a new pl-scraper.js
file in the root of your project directory and populate it with the following code:
// pl-scraper.jsconst axios = require('axios'); const url = 'https://www.premierleague.com/stats/top/players/goals?se=-1&cl=-1&iso=-1&po=-1?se=-1'; axios(url) .then(response => { const html = response.data; console.log(html); }) .catch(console.error);
If you run the code with node pl-scraper.js
, a long string of HTML will be printed to the console. But how can you parse the HTML for the exact data you need? That’s where Cheerio comes in.
Cheerio allows us to use jQuery methods to parse an HTML string and extract whatever information we want from it. But before you write any code, let’s examine the exact data that we need through the browser dev tools.
Open this link in your browser, and open the dev tools on that page. Use the inspector tool to highlight the body of the table listing the top goalscorers in Premier League history.
As you can see the table body has a class of .statsTableContainer
. We can select all the rows using cheerio
like this: $(‘.statsTableContainer > tr’)
. Go ahead and update the pl-scraper.js
file to look like this:
// pl-scraper.jsconst axios = require('axios'); const cheerio = require('cheerio'); const url = 'https://www.premierleague.com/stats/top/players/goals?se=-1&cl=-1&iso=-1&po=-1?se=-1'; axios(url) .then(response => { const html = response.data; const $ = cheerio.load(html); const statsTable = $('.statsTableContainer > tr'); console.log(statsTable.length); }) .catch(console.error);
Unlike jQuery which operates on the browser DOM, you need to pass in the HTML document into Cheerio before we can use it to parse the document with it. After loading the HTML, we select all 20 rows in .statsTableContainer
and store a reference to the selection in statsTable
. You can run the code with node pl-scraper.js
and confirm that the length of statsTable
is exactly 20.
The next step is to extract the rank, player name, nationality and number of goals from each row. We can achieve that using the following script:
// pl-scraper.jsconst axios = require('axios'); const cheerio = require('cheerio'); const url = 'https://www.premierleague.com/stats/top/players/goals?se=-1&cl=-1&iso=-1&po=-1?se=-1'; axios(url) .then(response => { const html = response.data; const $ = cheerio.load(html) const statsTable = $('.statsTableContainer > tr'); const topPremierLeagueScorers = []; statsTable.each(function () { const rank = $(this).find('.rank > strong').text(); const playerName = $(this).find('.playerName > strong').text(); const nationality = $(this).find('.playerCountry').text(); const goals = $(this).find('.mainStat').text(); topPremierLeagueScorers.push({ rank, name: playerName, nationality, goals, }); }); console.log(topPremierLeagueScorers); }) .catch(console.error);
Here, we are looping over the selection of rows and using the find()
method to extract the data that we need, organize it and store it in an array. Now, we have an array of JavaScript objects that can be consumed anywhere else.
Some websites rely exclusively on JavaScript to load their content, so using an HTTP request library like axios
to request the HTML will not work because it will not wait for any JavaScript to execute like a browser would before returning a response.
This is where Puppeteer comes in. It is a library that allows you to control a headless browser from a Node.js script. A perfect use case for this library is scraping pages that require JavaScript execution.
Let’s examine how Puppeteer can help us scrape news headlines from r/news since the newer version of Reddit requires JavaScript to render content on the page.
It appears, the headlines are wrapped in an anchor tag that links to the discussion on that headline. Although the class names have been obfuscated, we can select each headline by targeting each h2
inside any anchor tag that links to the discussion page.
Create a new reddit-scraper.js
file and add the following code into it:
// reddit-scraper.jsconst cheerio = require('cheerio'); const puppeteer = require('puppeteer'); const url = 'https://www.reddit.com/r/news/'; puppeteer .launch() .then(browser => browser.newPage()) .then(page => { return page.goto(url).then(function() { return page.content(); }); }) .then(html => { const $ = cheerio.load(html); const newsHeadlines = []; $('a[href*="/r/news/comments"] > h2').each(function() { newsHeadlines.push({ title: $(this).text(), }); }); console.log(newsHeadlines); }) .catch(console.error);
This code launches a puppeteer
instance, navigates to the provided URL, and returns the HTML content after all the JavaScript on the page has bee executed. We then use Cheerio as before to parse and extract the desired data from the HTML string.
In this tutorial, we learned how to set up web scraping in Node.js. We looked at scraping methods for both static and dynamic websites, so you should have no issues scraping data off of any website you desire.
You can find the complete source code used for this tutorial in this GitHub repository.
☞ The Complete Node.js Developer Course (2nd Edition)
☞ How to build RESTful APIs with ASP.NET Core
☞ 5 Javascript (ES6+) features that you should be using in 2019
☞ MEAN Stack Tutorial MongoDB, ExpressJS, AngularJS and NodeJS
Originally published by Ayooluwa Isaiah at https://pusher.com
#node-js #web-development
1553627744
Its Good but when I want the Club name from that static site what should i do ?
There are the nested classes .hide-s and statNameSecondary
I used the following code and got empty string 😢
1616839211
Top organizations and start-ups hire Node.js developers from SISGAIN for their strategic software development projects in Illinois, USA. On the off chance that you are searching for a first rate innovation to assemble a constant Node.js web application development or a module, Node.js applications are the most appropriate alternative to pick. As Leading Node.js development company, we leverage our profound information on its segments and convey solutions that bring noteworthy business results. For more information email us at hello@sisgain.com
#node.js development services #hire node.js developers #node.js web application development #node.js development company #node js application
1627043546
The term web design simply encompasses a design process related to the front-end design of website that includes writing mark-up. Creative web design has a considerable impact on your perceived business credibility and quality. It taps onto the broader scopes of web development services.
Web designing is identified as a critical factor for the success of websites and eCommerce. The internet has completely changed the way businesses and brands operate. Web design and web development go hand-in-hand and the need for a professional web design and development company, offering a blend of creative designs and user-centric elements at an affordable rate, is growing at a significant rate.
In this blog, we have focused on the different areas of designing a website that covers all the trends, tools, and techniques coming up with time.
Web design
In 2020 itself, the number of smartphone users across the globe stands at 6.95 billion, with experts suggesting a high rise of 17.75 billion by 2024. On the other hand, the percentage of Gen Z web and internet users worldwide is up to 98%. This is not just a huge market but a ginormous one to boost your business and grow your presence online.
Web Design History
At a huge particle physics laboratory, CERN in Switzerland, the son of computer scientist Barner Lee published the first-ever website on August 6, 1991. He is not only the first web designer but also the creator of HTML (HyperText Markup Language). The worldwide web persisted and after two years, the world’s first search engine was born. This was just the beginning.
Evolution of Web Design over the years
With the release of the Internet web browser and Windows 95 in 1995, most trading companies at that time saw innumerable possibilities of instant worldwide information and public sharing of websites to increase their sales. This led to the prospect of eCommerce and worldwide group communications.
The next few years saw a soaring launch of the now-so-famous websites such as Yahoo, Amazon, eBay, Google, and substantially more. In 2004, by the time Facebook was launched, there were more than 50 million websites online.
Then came the era of Google, the ruler of all search engines introducing us to search engine optimization (SEO) and businesses sought their ways to improve their ranks. The world turned more towards mobile web experiences and responsive mobile-friendly web designs became requisite.
Let’s take a deep look at the evolution of illustrious brands to have a profound understanding of web design.
Here is a retrospection of a few widely acclaimed brands over the years.
Netflix
From a simple idea of renting DVDs online to a multi-billion-dollar business, saying that Netflix has come a long way is an understatement. A company that has sent shockwaves across Hollywood in the form of content delivery. Abundantly, Netflix (NFLX) is responsible for the rise in streaming services across 190 countries and meaningful changes in the entertainment industry.
1997-2000
The idea of Netflix was born when Reed Hastings and Marc Randolph decided to rent DVDs by mail. With 925 titles and a pay-per-rental model, Netflix.com debuts the first DVD rental and sales site with all novel features. It offered unlimited rentals without due dates or monthly rental limitations with a personalized movie recommendation system.
Netflix 1997-2000
2001-2005
Announcing its initial public offering (IPO) under the NASDAQ ticker NFLX, Netflix reached over 1 million subscribers in the United States by introducing a profile feature in their influential website design along with a free trial allowing members to create lists and rate their favorite movies. The user experience was quite engaging with the categorization of content, recommendations based on history, search engine, and a queue of movies to watch.
Netflix 2001-2005 -2003
2006-2010
They then unleashed streaming and partnering with electronic brands such as blu-ray, Xbox, and set-top boxes so that users can watch series and films straight away. Later in 2010, they also launched their sophisticated website on mobile devices with its iconic red and black themed background.
Netflix 2006-2010 -2007
2011-2015
In 2013, an eye-tracking test revealed that the users didn’t focus on the details of the movie or show in the existing interface and were perplexed with the flow of information. Hence, the professional web designers simply shifted the text from the right side to the top of the screen. With Daredevil, an audio description feature was also launched for the visually impaired ones.
Netflix 2011-2015
2016-2020
These years, Netflix came with a plethora of new features for their modern website design such as AutoPay, snippets of trailers, recommendations categorized by genre, percentage based on user experience, upcoming shows, top 10 lists, etc. These web application features yielded better results in visual hierarchy and flow of information across the website.
Netflix 2016-2020
2021
With a sleek logo in their iconic red N, timeless black background with a ‘Watch anywhere, Cancel anytime’ the color, the combination, the statement, and the leading ott platform for top video streaming service Netflix has overgrown into a revolutionary lifestyle of Netflix and Chill.
Netflix 2021
Contunue to read: Evolution in Web Design: A Case Study of 25 Years
#web #web-design #web-design-development #web-design-case-study #web-design-history #web-development
1612352512
The NineHertz is one of the renowned Node js development companies which assures to deliver jaw-dropping solutions for your venture. Our experienced Node js developers work on the advanced and next-gen web applications with trending Node js technology.
Hurry to own a highly authorized and cost-effective real-time web application. Our Node js developers maintain the coding standards which makes the outstanding outcome. Use advanced Node js for web development in no time.
#node.js development companies #node js web development company #node js web development services #node development company
1622719015
Front-end web development has been overwhelmed by JavaScript highlights for quite a long time. Google, Facebook, Wikipedia, and most of all online pages use JS for customer side activities. As of late, it additionally made a shift to cross-platform mobile development as a main technology in React Native, Nativescript, Apache Cordova, and other crossover devices.
Throughout the most recent couple of years, Node.js moved to backend development as well. Designers need to utilize a similar tech stack for the whole web project without learning another language for server-side development. Node.js is a device that adjusts JS usefulness and syntax to the backend.
Node.js isn’t a language, or library, or system. It’s a runtime situation: commonly JavaScript needs a program to work, however Node.js makes appropriate settings for JS to run outside of the program. It’s based on a JavaScript V8 motor that can run in Chrome, different programs, or independently.
The extent of V8 is to change JS program situated code into machine code — so JS turns into a broadly useful language and can be perceived by servers. This is one of the advantages of utilizing Node.js in web application development: it expands the usefulness of JavaScript, permitting designers to coordinate the language with APIs, different languages, and outside libraries.
Of late, organizations have been effectively changing from their backend tech stacks to Node.js. LinkedIn picked Node.js over Ruby on Rails since it took care of expanding responsibility better and decreased the quantity of servers by multiple times. PayPal and Netflix did something comparative, just they had a goal to change their design to microservices. We should investigate the motivations to pick Node.JS for web application development and when we are planning to hire node js developers.
The principal thing that makes Node.js a go-to environment for web development is its JavaScript legacy. It’s the most well known language right now with a great many free devices and a functioning local area. Node.js, because of its association with JS, immediately rose in ubiquity — presently it has in excess of 368 million downloads and a great many free tools in the bundle module.
Alongside prevalence, Node.js additionally acquired the fundamental JS benefits:
In addition, it’s a piece of a well known MEAN tech stack (the blend of MongoDB, Express.js, Angular, and Node.js — four tools that handle all vital parts of web application development).
This is perhaps the most clear advantage of Node.js web application development. JavaScript is an unquestionable requirement for web development. Regardless of whether you construct a multi-page or single-page application, you need to know JS well. On the off chance that you are now OK with JavaScript, learning Node.js won’t be an issue. Grammar, fundamental usefulness, primary standards — every one of these things are comparable.
In the event that you have JS designers in your group, it will be simpler for them to learn JS-based Node than a totally new dialect. What’s more, the front-end and back-end codebase will be basically the same, simple to peruse, and keep up — in light of the fact that they are both JS-based.
There’s another motivation behind why Node.js got famous so rapidly. The environment suits well the idea of microservice development (spilling stone monument usefulness into handfuls or many more modest administrations).
Microservices need to speak with one another rapidly — and Node.js is probably the quickest device in information handling. Among the fundamental Node.js benefits for programming development are its non-obstructing algorithms.
Node.js measures a few demands all at once without trusting that the first will be concluded. Many microservices can send messages to one another, and they will be gotten and addressed all the while.
Node.js was worked in view of adaptability — its name really says it. The environment permits numerous hubs to run all the while and speak with one another. Here’s the reason Node.js adaptability is better than other web backend development arrangements.
Node.js has a module that is liable for load adjusting for each running CPU center. This is one of numerous Node.js module benefits: you can run various hubs all at once, and the environment will naturally adjust the responsibility.
Node.js permits even apportioning: you can part your application into various situations. You show various forms of the application to different clients, in light of their age, interests, area, language, and so on. This builds personalization and diminishes responsibility. Hub accomplishes this with kid measures — tasks that rapidly speak with one another and share a similar root.
What’s more, Node’s non-hindering solicitation handling framework adds to fast, letting applications measure a great many solicitations.
Numerous designers consider nonconcurrent to be one of the two impediments and benefits of Node.js web application development. In Node, at whatever point the capacity is executed, the code consequently sends a callback. As the quantity of capacities develops, so does the number of callbacks — and you end up in a circumstance known as the callback damnation.
In any case, Node.js offers an exit plan. You can utilize systems that will plan capacities and sort through callbacks. Systems will associate comparable capacities consequently — so you can track down an essential component via search or in an envelope. At that point, there’s no compelling reason to look through callbacks.
So, these are some of the top benefits of Nodejs in web application development. This is how Nodejs is contributing a lot to the field of web application development.
I hope now you are totally aware of the whole process of how Nodejs is really important for your web project. If you are looking to hire a node js development company in India then I would suggest that you take a little consultancy too whenever you call.
Good Luck!
#node.js development company in india #node js development company #hire node js developers #hire node.js developers in india #node.js development services #node.js development
1611828639
The NineHertz promises to develop a pro-active and easy solution for your enterprise. It has reached the heights in Node js web development and is considered as one of the top-notch Node js development company across the globe.
The NineHertz aims to design a best Node js development solution to improve their branding status and business profit.
Looking to hire the leading Node js development company?
#node js development company #nodejs development company #node.js development company #node.js development companies #node js web development #node development company