We’ll be expanding on our scheduled web scraper by integrating it into a Django web app.
Part 1, Building an RSS feed scraper with Python, illustrated how to fetch and parse pages with Requests and Beautiful Soup.
In part 2 of this series, Automated web scraping with Python and Celery, I demonstrated how to schedule web scraping tasks with Celery, a task queue.
Previously, I created a simple RSS feed reader that scrapes information from HackerNews using Requests and BeautifulSoup (it’s available on my GitHub). After creating the basic scraping script, I illustrated a way to integrate Celery into the application to act as a task management system. Using Celery, I was able to schedule scraping tasks to run at various intervals, which let the script execute without me having to be present.
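As a quick recap, the core of that scraper can be sketched as follows. This is a minimal illustration rather than the exact code from the earlier articles; the `parse_articles` helper and the `titleline` class selector are assumptions about the Hacker News markup.

```python
import requests
from bs4 import BeautifulSoup


def parse_articles(html):
    """Extract title/link pairs from Hacker News-style markup.

    Assumes each story title lives in a <span class="titleline">
    wrapping an anchor tag -- an illustrative selector, not
    necessarily the one used in the original script.
    """
    soup = BeautifulSoup(html, "html.parser")
    articles = []
    for span in soup.find_all("span", class_="titleline"):
        anchor = span.find("a")
        if anchor is not None:
            articles.append({"title": anchor.get_text(), "link": anchor.get("href")})
    return articles


def scrape_hackernews(url="https://news.ycombinator.com/"):
    """Fetch the front page and return the parsed articles."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return parse_articles(response.text)
```

In the Celery version, a function like `scrape_hackernews` would be decorated as a task and placed on a periodic schedule.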
Our next step is to bundle the scheduled scraping tasks into a web application using Django. This will give us access to a database and the ability to display our data on a website, and it serves as a step toward a full “scraping” app. The goal of this project is to create something scalable, similar to an aggregator.
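Bundling the Celery schedule into Django typically means moving the beat schedule into the project settings. A hedged sketch of that settings fragment, assuming a Celery app already wired to the Django project, a local Redis broker, and a hypothetical `scraping.tasks.scrape_hackernews` task (all names here are illustrative, not from the original code):

```python
# settings.py (fragment) -- illustrative names throughout.
# Assumes Celery is configured for the project (e.g. via the
# conventional celery.py module) and Redis is running locally.
from celery.schedules import crontab

CELERY_BROKER_URL = "redis://localhost:6379/0"

CELERY_BEAT_SCHEDULE = {
    # Run the hypothetical scraping task every 15 minutes.
    "scrape-hackernews": {
        "task": "scraping.tasks.scrape_hackernews",
        "schedule": crontab(minute="*/15"),
    },
}
```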
This article **will not** serve as a top-to-bottom Django guide. Instead, it will be geared toward a “Hello World” approach, followed by displaying scraped content on our web app.
I will be using the following:
**Note:** All library dependencies are listed in the Pipfile/Pipfile.lock.