Kabanda Nat

1626171432

The majority of "Autoscrapers" are still rule-based web scraping applications

As with most forms of tech these days, web scrapers have recently seen a surge of claims that they’re somehow based on AI or machine learning. These claims suggest that an AI will detect exactly what you want extracted from a page, yet most scrapers are still rule-based (there are some exceptions, such as Diffbot’s Automatic Extraction APIs). Why does this matter? Historically, rule-based extraction has been the norm. In rule-based extraction, you specify a set of rules for what you want pulled from a page. A rule is often an HTML element, a CSS selector, or a regex pattern. Maybe you want the third bulleted item beneath every paragraph in a text, or all headers, or all links on a page; rule-based extraction can handle that.
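
To make the idea of "rules" concrete, here is a minimal sketch in Python using only the standard library. It is an illustration, not how any particular scraper is built: the class name RuleBasedExtractor, the sample HTML, and the price pattern are all hypothetical. It applies three fixed rules: collect every link target, collect the text of every header, and match dollar amounts with a regex.

```python
import re
from html.parser import HTMLParser


class RuleBasedExtractor(HTMLParser):
    """Hypothetical extractor: pulls links and header text using fixed rules."""

    HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.links = []        # href values of every <a> element
        self.headers = []      # text content of every <h1>..<h6> element
        self._in_header = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # Rule 1: every <a> element with an href is a link we want.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        # Rule 2: start buffering text when a header element opens.
        elif tag in self.HEADER_TAGS:
            self._in_header = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_header:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADER_TAGS and self._in_header:
            self.headers.append("".join(self._buffer).strip())
            self._in_header = False


# Rule 3 (assumed for illustration): a dollar amount, optionally with cents.
PRICE_PATTERN = re.compile(r"\$\d+(?:\.\d{2})?")


def extract(html):
    """Apply all three rules to an HTML string and return the matches."""
    parser = RuleBasedExtractor()
    parser.feed(html)
    parser.close()
    return {
        "links": parser.links,
        "headers": parser.headers,
        "prices": PRICE_PATTERN.findall(html),
    }


# Made-up sample page, used only to exercise the rules.
SAMPLE_HTML = """
<h1>Widgets</h1>
<p>Our best widget costs $19.99. See the <a href="/specs">specs</a>.</p>
<h2>Accessories</h2>
<p>Cases start at $5 (<a href="/cases">browse</a>).</p>
"""

result = extract(SAMPLE_HTML)
print(result)
```

Nothing here infers what the user wants: the extractor returns exactly what the rules describe, and if a site changes its markup the rules have to be rewritten by hand.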

#web scraping tools #autoscrapers #rule-based
