Installation

To add Nokogiri to your application, run the following command:

gem install nokogiri

After installation, require both nokogiri and open-uri at the top of your script. open-uri ships with Ruby's standard library, so it needs no separate install.

require 'nokogiri'
require 'open-uri'

Scraping from a Website

To scrape a website, you need the URL of the page you want to scrape. Pass the URL to the URI.open method to fetch the HTML, then pass the result to the Nokogiri::HTML method. This returns a parsed document whose nodes you can search and traverse with Nokogiri.

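# Page that lists the ingredients
url = 'https://www.101cookbooks.com/ingredient.html'
# Fetch the page; URI.open returns an IO-like object
html = URI.open(url)
# Parse the HTML into a Nokogiri document
doc = Nokogiri::HTML(html)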

Process Data

Extracting data from the page takes a bit more work. You need to figure out where the data you want sits in the DOM. One way to do this is to open your browser's developer tools, inspect the element, and hover over it in the HTML view. The pop-up shows the CSS selector for that element, which you can then pass to Nokogiri.


content = doc.css("div.maincontent.fullarchives.ingredients.col-lg-8.col-xl-8")

In this case, we want to read all the ingredients on this website. The ingredients are grouped alphabetically, but every group sits inside a single div container. Because doc.css returns a set of matching nodes, that container is the first node in the result, so we take the first element and walk through its children.


If we inspect each group of ingredients, we see that it is contained in a div with the classes “archives” and “flex-wrap”, which we can check for. After that, we need to look at the children of each group.


Ruby Web Scraping Using Nokogiri