We’ll be expanding on our scheduled web scraper by integrating it into a Django web app.

Part 1 of this series, Building an RSS feed scraper with Python, illustrated how we can use Requests and Beautiful Soup to collect and parse feed data.

In Part 2 of this series, Automated web scraping with Python and Celery, I demonstrated how to schedule web scraping tasks with Celery, a task queue.

Background:

Previously, I created a simple RSS feed reader that scrapes information from HackerNews using Requests and BeautifulSoup (it’s available on my GitHub). After creating the basic scraping script, I illustrated a way to integrate Celery into the application to act as a task management system. Using Celery, I was able to schedule scraping tasks to occur at various intervals — this allowed me to run the script without having to be present.
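For context, the core of that earlier setup looked roughly like the sketch below: a Requests/BeautifulSoup scrape wrapped in a Celery task and scheduled with Celery beat. The broker URL, CSS selector, task names, and interval are illustrative assumptions, not the exact code from the repository.

```python
# Rough sketch of the Part 1/Part 2 setup (illustrative names; assumes a
# local RabbitMQ broker and HackerNews's current ".titleline" markup).
import requests
from bs4 import BeautifulSoup
from celery import Celery
from celery.schedules import crontab

app = Celery("scraper", broker="amqp://localhost")

@app.task
def scrape_hackernews():
    """Fetch the HackerNews front page and return (title, link) pairs."""
    response = requests.get("https://news.ycombinator.com/")
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "lxml")
    return [(a.get_text(), a.get("href")) for a in soup.select(".titleline > a")]

# Celery beat runs the scrape on a schedule -- every 15 minutes here.
app.conf.beat_schedule = {
    "scrape-hackernews": {
        "task": scrape_hackernews.name,
        "schedule": crontab(minute="*/15"),
    },
}
```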

Our next step is to bundle the scheduled scraping tasks into a web application using Django. This will give us access to a database, let us display our data on a website, and serve as a step toward creating a “scraping” app. The goal of this project is to create something scalable, similar to an aggregator.
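To make the database part concrete, a Django model for scraped entries might look like the minimal sketch below. The model and field names are assumptions for illustration, not a final schema.

```python
# models.py -- a minimal sketch of a table for scraped entries
# (model and field names are assumptions, not a final schema).
from django.db import models

class News(models.Model):
    title = models.CharField(max_length=200)
    link = models.URLField()
    created_at = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return self.title
```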

This article **will not** serve as a top-to-bottom Django guide. Instead, it will be geared toward a “Hello World” approach, followed by displaying our scraped content on the web app.
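As a taste of that “Hello World” step, the smallest possible Django page is a single function-based view wired into a URL pattern, roughly as sketched below. The file layout and names here are illustrative.

```python
# views.py -- the simplest possible page
from django.http import HttpResponse

def index(request):
    return HttpResponse("Hello, World!")

# urls.py -- route the site root to the view above
from django.urls import path
# from .views import index  # when views.py and urls.py are separate files

urlpatterns = [
    path("", index),
]
```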

I will be using the following:

  • Python 3.7+
  • Requests — For web requests
  • BeautifulSoup 4 — HTML parsing tool
  • A text editor (I use Visual Studio Code)
  • Celery — Distributed task queue
  • RabbitMQ — Message broker
  • lxml — XML and HTML parser used by Beautiful Soup
  • Django — A Python web framework
  • Pipenv — A virtual environment package

**Note:** All library dependencies are listed in the Pipfile/Pipfile.lock.

