Ricky Martin


1558777375

Do you want to build a website scraper using JavaScript?

If you want to collect data from the web, you’ll come across a lot of resources teaching you how to do this using more established back-end tools like Python or PHP. But there’s a lot less guidance out there for the new kid on the block, Node.js.

Thanks to Node.js, JavaScript is a great language to use for a web scraper: not only is Node fast, but you’ll likely end up using a lot of the same methods you’re used to from querying the DOM with front-end JavaScript. Node.js has tools for querying both static and dynamic web pages, and it is well-integrated with lots of useful APIs, node modules and more.

In this article, I’ll walk through a powerful way to use JavaScript to build a web scraper. We’ll also explore one of the key concepts useful for writing robust data-fetching code: asynchronous code.

Asynchronous Code

Fetching data is often one of the first times beginners encounter asynchronous code. By default, JavaScript runs synchronously, meaning that statements are executed one line at a time. Whenever a function is called, the program waits until the function returns before moving on to the next line of code.

But fetching data generally involves asynchronous code. Such code is set aside from the regular flow of synchronous execution, so the synchronous code can keep running while the asynchronous code waits for something to happen: a response from a website, for example.
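To make the difference concrete, here is a minimal sketch using only Node.js built-ins. It uses setTimeout as a stand-in for any asynchronous operation (such as waiting for a website to respond); the `order` array is purely illustrative.

```javascript
// Record the order in which each piece of code actually runs.
const order = [];

order.push('sync: first');

// setTimeout hands its callback to the event loop. Even with a 0 ms delay,
// the callback only runs after the current synchronous code has finished.
setTimeout(() => {
  order.push('async: timer callback');
  console.log(order); // the timer entry now appears last
}, 0);

order.push('sync: last');

// This line runs before the timer callback, so only the two sync entries exist yet.
console.log(order);
```

Although the timer is scheduled between the two synchronous lines, its callback is recorded last.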

Combining these two types of execution (synchronous and asynchronous) involves some syntax which can be confusing for beginners. We’ll be using the async and await keywords, introduced in ES2017. They’re syntactic sugar on top of ES6’s Promise syntax, which in turn is built on top of the older system of callbacks.

Passed-in Callbacks

In the days of callbacks, we had to nest each asynchronous call inside the callback of the previous one, leading to what’s sometimes known as the ‘pyramid of doom’ or ‘callback hell’. The example below is on the simple side!

/* Passed-in Callbacks */
doSomething(function(result) {
  doSomethingElse(result, function(newResult) {
    doThirdThing(newResult, function(finalResult) {
      console.log(finalResult);
    }, failureCallback);
  }, failureCallback);
}, failureCallback);
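The functions above are placeholders. As a minimal sketch of how callbacks relate to the Promise syntax in the next section, the example below defines a hypothetical callback-style function, `doubleLater`, following Node’s error-first convention, and wraps it in a Promise so the same two steps can be written as a flat chain. The names `doubleLater` and `doubleLaterAsync` are illustrative assumptions, not part of any library.

```javascript
// Hypothetical callback-style function using Node's error-first
// convention: callback(error, result).
function doubleLater(value, callback) {
  setTimeout(() => {
    if (typeof value !== 'number') {
      callback(new Error('value must be a number'));
    } else {
      callback(null, value * 2);
    }
  }, 10);
}

// Wrap the callback API in a Promise: resolve on success, reject on error.
function doubleLaterAsync(value) {
  return new Promise((resolve, reject) => {
    doubleLater(value, (err, result) => {
      if (err) reject(err);
      else resolve(result);
    });
  });
}

// Nested-callback version: each step lives inside the previous callback.
doubleLater(1, (err, a) => {
  if (err) return console.error(err);
  doubleLater(a, (err2, b) => {
    if (err2) return console.error(err2);
    console.log('callbacks:', b); // callbacks: 4
  });
});

// The same two steps as a flat Promise chain with a single error handler.
doubleLaterAsync(1)
  .then(a => doubleLaterAsync(a))
  .then(b => console.log('promises:', b)) // promises: 4
  .catch(err => console.error(err));
```

For functions that follow the error-first convention, Node’s built-in `util.promisify` performs the same wrapping for you.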

Promise, Then and Catch

ES6 introduced a new syntax that makes asynchronous code much simpler to write and easier to debug. It is characterised by the Promise object and its then and catch methods:

/* "Vanilla" Promise Syntax */
doSomething()
  .then(result => doSomethingElse(result))
  .then(newResult => doThirdThing(newResult))
  .then(finalResult => {
    console.log(finalResult);
  })
  .catch(failureCallback);

Async and Await

Finally, ES2017 brought async and await, two keywords which allow asynchronous code to look much closer to synchronous JavaScript, as in the example below. This most recent development is generally considered the most readable way to write asynchronous code in JavaScript.

/* Async/Await Syntax */
(async () => {
  try {
    const result = await doSomething();
    const newResult = await doSomethingElse(result);
    const finalResult = await doThirdThing(newResult);
    console.log(finalResult); 
  } catch(err) {
    console.log(err);
  }
})();
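Because doSomething, doSomethingElse and doThirdThing are placeholders, the snippets above can’t be run as-is. Here is a runnable sketch that assumes hypothetical stand-in implementations (each returns a Promise after a short delay) and runs the same three steps with both syntaxes. It also shows that a rejected step in an awaited sequence jumps straight to the catch block.

```javascript
// Hypothetical stand-ins for the placeholder functions above.
const wait = (ms) => new Promise(resolve => setTimeout(resolve, ms));

async function doSomething() {
  await wait(5);
  return 1;
}

async function doSomethingElse(result) {
  await wait(5);
  return result + 1;
}

async function doThirdThing(newResult) {
  await wait(5);
  if (newResult < 0) throw new Error('negative input');
  return newResult * 10;
}

// Promise-chain version.
function withThen() {
  return doSomething()
    .then(result => doSomethingElse(result))
    .then(newResult => doThirdThing(newResult));
}

// async/await version of the same steps.
async function withAwait() {
  const result = await doSomething();
  const newResult = await doSomethingElse(result);
  return doThirdThing(newResult);
}

(async () => {
  try {
    console.log(await withThen());  // 20
    console.log(await withAwait()); // 20
    await doThirdThing(-1);         // rejects, so control jumps to catch
  } catch (err) {
    console.log('caught:', err.message); // caught: negative input
  }
})();
```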

Static Websites

In the past, retrieving data from another domain involved the XMLHttpRequest (XHR) object. Nowadays, we can use JavaScript’s Fetch API and its fetch() method, which takes one mandatory argument (the path to the resource you want to fetch, usually a URL) and returns a Promise.

To use fetch in older versions of Node.js, you’ll want to import an implementation of fetch (Node.js 18 and later also provide fetch as a built-in global). Isomorphic Fetch is a popular choice. Install it by typing npm install isomorphic-fetch es6-promise into the terminal, and then require it at the top of your file like so: const fetch = require('isomorphic-fetch');

JSON

If you’re fetching JSON data, then you should use the json() method on your response before processing it:

(async () => {
  const response = await fetch('https://wordpress.org/wp-json');
  const json = await response.json();
  console.log(JSON.stringify(json));
})()
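response.json() resolves to an ordinary JavaScript object, so picking out data is plain property access. The sketch below avoids the network by parsing a hypothetical sample string with JSON.parse (which does the same parsing work as response.json() on the body text); the field names and the `summarise` helper are assumptions for illustration.

```javascript
// Hypothetical sample standing in for the body that response.json() parses;
// the field names are assumptions for illustration.
const sampleText = JSON.stringify({
  name: 'Example Site',
  description: 'Just another example',
  url: 'https://example.com',
  namespaces: ['oembed/1.0', 'wp/v2'],
});

const json = JSON.parse(sampleText);

// Keep only the fields we care about, with fallbacks for missing ones.
function summarise(data) {
  return {
    name: data.name ?? 'unknown',
    url: data.url ?? null,
    namespaceCount: Array.isArray(data.namespaces) ? data.namespaces.length : 0,
  };
}

console.log(summarise(json));
```

The fallbacks matter when scraping: APIs change, and a missing field is better handled explicitly than with a crash halfway through a run.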

JSON makes it relatively straightforward to grab the data you want from the response and process it. But what if JSON data isn’t available?

HTML

For most websites, you’ll need to extract the data you want from the HTML. For static websites, there are two main ways to go about this.

Option A: Regular Expressions

If your needs are simple, or you’re comfortable writing regular expressions, you can use the text() method and then extract the data you need with the string match method. For example, here is some code to extract the contents of the first h1 tag on a page:

(async () => {
  const response = await fetch('https://example.com');
  const text = await response.text();
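  // Capture the text between the first <h1 ...> and the following </h1>.
  const match = text.match(/<h1[^>]*>(.*?)<\/h1>/s);
  console.log(match ? match[1] : null);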
})()
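The same approach extends to collecting every match with String.prototype.matchAll. This is a minimal sketch on a hypothetical HTML string (so it runs without a network request); `extractHeadings` and `extractLinks` are illustrative helpers. Regular expressions break easily on real-world markup (nested tags, comments, unusual quoting), which is why a DOM parser is often the safer choice.

```javascript
// Hypothetical HTML standing in for the string returned by response.text().
const html = `
  <h1>Example Domain</h1>
  <p>See <a href="https://www.iana.org/domains/example">more</a>
  or <a class="nav" href='/about'>about</a>.</p>
`;

// Text of every <h1>: [^>]* allows attributes, the lazy .*? stops at the
// first </h1>, and the "s" flag lets . match newlines inside the heading.
function extractHeadings(source) {
  return [...source.matchAll(/<h1[^>]*>(.*?)<\/h1>/gis)].map(m => m[1].trim());
}

// href values quoted with either " or '; \1 requires the matching quote.
function extractLinks(source) {
  return [...source.matchAll(/<a\s[^>]*href=(["'])(.*?)\1/gi)].map(m => m[2]);
}

console.log(extractHeadings(html)); // [ 'Example Domain' ]
console.log(extractLinks(html));
```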

Option B: A DOM Parser

If you’re dealing with a more complicated document, it can be helpful to use JavaScript’s built-in methods for querying the DOM: methods like getElementById, querySelector and so on.

If we were writing front-end code, we could use the DOMParser interface. As we’re using Node.js, we can grab a node module instead. A popular option is jsdom, which you can install by typing npm i jsdom into the terminal and require like this:

const jsdom = require("jsdom");
const { JSDOM } = jsdom;

With jsdom, we can query our imported HTML as its own DOM object using querySelector and related methods:

(async () => {
  const response = await fetch('https://example.com');
  const text = await response.text();
  // The JSDOM constructor is synchronous, so no await is needed here.
  const dom = new JSDOM(text);
  console.log(dom.window.document.querySelector("h1").textContent);
})()

Dynamic Websites

What if you want to grab data from a dynamic website, where content is generated in real time by JavaScript running in the browser, such as on a social media site? A plain fetch request won’t work, because it only returns the HTML the server sends initially, not the content that the page’s scripts generate afterwards.

If that’s what you’re looking for, the best node module for the job is puppeteer — not least because the main alternative, PhantomJS, is no longer being developed.

Puppeteer lets you control Chrome or Chromium over the DevTools Protocol, with features such as automated page navigation and screenshots. By default, it runs the browser in headless mode (without a visible window), but turning this off can be helpful for debugging.

Getting Started

To install it, navigate to your project directory in the terminal and type npm i puppeteer. Here’s some boilerplate code to get you started:

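const puppeteer = require('puppeteer');

(async () => {
  // headless: false opens a visible window, which helps with debugging.
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  // Optional: only enable this together with a 'request' handler that
  // continues or aborts each request, otherwise page loads will stall.
  // await page.setRequestInterception(true);
  await page.goto('http://www.example.com/');
  // ...query the page here...
  await browser.close();
})();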

First, we launch puppeteer (disabling headless mode, so we can see what we’re doing). Then we open a new tab. The method page.setRequestInterception(true) is optional, allowing us to use the abort, continue and respond methods later on; once interception is enabled, though, every request must be explicitly continued, aborted or responded to, or it will stall. Lastly, we go to our chosen page.

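As in the “DOM Parser” example above, we can now query elements using familiar selectors: page.$ and page.$eval accept the same selectors as querySelector, and code passed to page.evaluate runs inside the page, where document.querySelector and the related methods are available.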

Logging In

If we need to log in, we can do so easily using the page.type and page.click methods, which identify DOM elements using the same selector syntax as querySelector:

await page.type('#username', 'UsernameGoesHere');
await page.type('#password', 'PasswordGoesHere');
// Start waiting for navigation before clicking, so a fast redirect isn't missed.
await Promise.all([
  page.waitForNavigation(),
  page.click('button'),
]);

Handling Infinite Scroll

It is increasingly common for dynamic sites to display content via an infinite scrolling mechanism. To cope with that, you can make puppeteer scroll down until certain criteria are met.

Here’s a simple example that will scroll down 5 times, waiting for 1 second between each scroll to account for loading content.

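for (let j = 0; j < 5; j++) {
  await page.evaluate('window.scrollTo(0, document.body.scrollHeight)');
  // page.waitFor() is deprecated in newer Puppeteer versions; a plain timeout does the same job.
  await new Promise(resolve => setTimeout(resolve, 1000));
}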

Because load times will differ, the above code will not necessarily load the same number of results every time. If that’s a problem, you may want to scroll until a certain number of elements is found, or until some other condition is met.
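Here is one way to sketch the “scroll until enough elements are found” idea. The helper `scrollUntil` is hypothetical: it takes two caller-supplied async functions, `scrollOnce` and `countItems`, so the stopping logic can be tested without a browser. The `maxScrolls` cap is an assumption that keeps the loop from running forever on a page that never loads more items.

```javascript
// Hypothetical helper: keep scrolling until countItems() reports at least
// `target` items, or give up after `maxScrolls` attempts.
async function scrollUntil({ scrollOnce, countItems, target, maxScrolls = 20, delayMs = 1000 }) {
  let count = await countItems();
  let scrolls = 0;
  while (count < target && scrolls < maxScrolls) {
    await scrollOnce();
    scrolls += 1;
    // Give newly requested content time to load before counting again.
    await new Promise(resolve => setTimeout(resolve, delayMs));
    count = await countItems();
  }
  return { count, scrolls, reachedTarget: count >= target };
}

// Self-contained demo with a fake "page" that loads 3 items per scroll.
let loaded = 3;
scrollUntil({
  scrollOnce: async () => { loaded += 3; },
  countItems: async () => loaded,
  target: 10,
  delayMs: 0,
}).then(result => console.log(result)); // { count: 12, scrolls: 3, reachedTarget: true }
```

With puppeteer, `scrollOnce` could wrap the `page.evaluate('window.scrollTo(0, document.body.scrollHeight)')` call from the loop above, and `countItems` could be something like `() => page.$$eval('.post', els => els.length)`, where `.post` is a placeholder selector for whatever elements the site uses.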

Making Optimisations

Lastly, there are several ways that you can optimise your code so that it runs as quickly and smoothly as possible. As an example, here’s a way to get puppeteer to avoid loading fonts or images.

await page.setRequestInterception(true);
page.on('request', (req) => {
  // Skip fonts and images; let every other request through.
  if (req.resourceType() === 'font' || req.resourceType() === 'image') {
    req.abort();
  } else {
    req.continue();
  }
});

You could also disable CSS in a similar way, although sometimes the CSS is integral to the dynamic data you want — so I’d err on the side of caution with this one!
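To make the list of blocked resource types easy to change (for example, adding 'stylesheet', which is the resource type Puppeteer reports for CSS), the handler can be built by a small factory. This is a sketch: `createRequestFilter` and `fakeRequest` are hypothetical names, and the fake request objects only mimic the three request methods used in the original handler (resourceType, abort and continue), so the logic can be checked without launching a browser.

```javascript
// Build a 'request' handler that aborts any request whose resource type is
// in `blockedTypes` and lets everything else through.
function createRequestFilter(blockedTypes = ['font', 'image']) {
  const blocked = new Set(blockedTypes);
  return (req) => {
    if (blocked.has(req.resourceType())) {
      req.abort();
    } else {
      req.continue();
    }
  };
}

// Minimal fake request objects for a browser-free demo.
function fakeRequest(type, log) {
  return {
    resourceType: () => type,
    abort: () => log.push(`abort ${type}`),
    continue: () => log.push(`continue ${type}`),
  };
}

const log = [];
const filter = createRequestFilter(['font', 'image', 'stylesheet']);
['document', 'image', 'stylesheet', 'script'].forEach(t => filter(fakeRequest(t, log)));
console.log(log);
```

In a real script, you would call `await page.setRequestInterception(true)` and then register the handler with `page.on('request', createRequestFilter(['font', 'image']))`.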

And that’s pretty much all you need to know to make a functioning web scraper in JavaScript! Once you’ve stored the data in memory, you can write it to a local file (using the fs module), upload it to a database, or use an API (such as the Google Sheets API) to send the data directly to a document.
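As a minimal sketch of the fs option, the example below writes scraped records to a JSON file and reads them back using Node’s promise-based fs API. The record shape, the file name and the `saveResults`/`loadResults` helpers are hypothetical examples.

```javascript
const fs = require('fs/promises');
const os = require('os');
const path = require('path');

// Write records to disk as pretty-printed JSON and return the file path.
async function saveResults(records, filePath) {
  await fs.writeFile(filePath, JSON.stringify(records, null, 2), 'utf8');
  return filePath;
}

// Read the file back and parse it into an array of records.
async function loadResults(filePath) {
  return JSON.parse(await fs.readFile(filePath, 'utf8'));
}

// Hypothetical scraped data.
const results = [
  { title: 'Example Domain', url: 'https://example.com' },
];

const outFile = path.join(os.tmpdir(), 'scrape-results.json');
saveResults(results, outFile)
  .then(loadResults)
  .then(data => console.log(`Saved ${data.length} record(s) to ${outFile}`))
  .catch(err => console.error(err));
```

JSON keeps nested data intact; if the data will be opened in a spreadsheet, writing CSV instead may be more convenient.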

If you’re new to web scraping (or you know about web scraping but you’re new to Node.js), I hope this article has made you aware of some of the powerful tools that make Node.js a very capable platform for scraping.

Thank you


#javascript #node-js #web-development
