As in my previous blog post, I used a Python web-crawling library, Scrapy, to crawl static websites. Scrapy lets you plug in custom downloader middleware, which can handle things the default downloader cannot, such as pages that rely on JavaScript.
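As a concrete illustration, here is a minimal sketch of a downloader middleware (the class name and agent list are my own, not from the post): it rotates the User-Agent header on each outgoing request before the downloader runs. You would enable it in `settings.py` under `DOWNLOADER_MIDDLEWARES`.

```python
class RotateUserAgentMiddleware:
    """Hypothetical Scrapy downloader middleware: assigns a
    User-Agent to each request in round-robin order."""

    USER_AGENTS = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Mozilla/5.0 (X11; Linux x86_64)",
    ]

    def __init__(self):
        self._next = 0  # index of the agent to use next

    def process_request(self, request, spider):
        # Scrapy calls this hook for every request before download.
        agent = self.USER_AGENTS[self._next % len(self.USER_AGENTS)]
        self._next += 1
        request.headers[b"User-Agent"] = agent.encode()
        return None  # None tells Scrapy to continue downloading normally
```

Returning `None` from `process_request` lets the request continue through the rest of the middleware chain; returning a `Response` instead would short-circuit the download entirely.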

However, Scrapy already implements much of the underlying machinery for you: it runs its own dispatcher, and it provides item pipelines for processing the parsed data after each download. One drawback of relying on such a library is that the odd bugs which surface in its parallel jobs can be hard to track down.
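To show what that post-download processing stage looks like, here is a minimal item-pipeline sketch (the class and field names are hypothetical, not from the post): Scrapy hands each scraped item to `process_item`, which can enrich, validate, or drop it before storage.

```python
class WordCountPipeline:
    """Hypothetical Scrapy item pipeline: annotates each scraped
    item with the word count of its 'text' field."""

    def process_item(self, item, spider):
        # Scrapy calls this for every item the spider yields.
        text = item.get("text", "")
        item["word_count"] = len(text.split())
        return item  # pass the enriched item to the next pipeline stage
```

Pipelines registered in `ITEM_PIPELINES` run in priority order, so several small single-purpose stages like this one can be chained together.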

In this tutorial, I want to show the structure of a simple and efficient web crawler.
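The core structure of such a crawler can be sketched without any framework at all: a frontier queue of URLs to visit, a seen-set so no URL is fetched twice, and a loop that fetches a page and enqueues its links. The sketch below is my own minimal version (the `fetch` and `parse_links` callables are injected, e.g. `requests.get(url).text` and an HTML link extractor in a real crawler):

```python
from collections import deque
from urllib.parse import urljoin

def crawl(seed, fetch, parse_links, max_pages=100):
    """Breadth-first crawl sketch: a frontier queue plus a seen-set
    guarantee each URL is downloaded at most once."""
    frontier = deque([seed])   # URLs waiting to be fetched
    seen = {seed}              # URLs ever enqueued
    pages = {}                 # url -> fetched content
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        pages[url] = fetch(url)
        for link in parse_links(pages[url]):
            absolute = urljoin(url, link)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return pages
```

Because `fetch` is a parameter, the same loop works single-threaded or behind a thread pool, and it can be unit-tested against an in-memory link graph without touching the network.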

https://www.loginradius.com/engineering/blog/write-a-highly-efficient-python-web-crawler/

#python #coding #programming #engineering
