An In-Progress Web Scraping Project Built with Python, R, and SQL

New to Streaming Scraper


The scraped data describe movies and TV shows. The goal of the project is to show the new-to-streaming titles that arrive on Netflix each month, along with additional details such as critic and audience ratings.

Current stage: Deciding how to present the data with R Markdown.

Testing at: https://charlesdungy.github.io/new-to-streaming-scraper/

Future stage: Completing the documentation and code comments.

Description

Data are retrieved from two different data sources: What’s on Netflix (WON) and Rotten Tomatoes (RT). RT data are cleaned and transformed with Python, while WON data are cleaned and transformed with R.
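As an illustration of the kind of cleaning the Python (RT) step involves, here is a minimal sketch. It assumes hypothetical raw records with a `title` string and score strings such as `"95%"`, or `"--"` when a score is missing. The field names, the `RTRecord` class, and the helper functions are assumptions for illustration, not the project's actual schema or code.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class RTRecord:
    """A cleaned Rotten Tomatoes row (hypothetical schema)."""
    title: str
    critic_score: Optional[int]
    audience_score: Optional[int]


def parse_score(raw: Optional[str]) -> Optional[int]:
    """Turn a score string like '95%' into 95; return None if no number is present."""
    match = re.search(r"\d+", raw or "")
    return int(match.group()) if match else None


def normalize_title(raw: str) -> str:
    """Collapse runs of whitespace and trim the ends so titles compare consistently."""
    return " ".join(raw.split())


def clean_rt(raw_rows: list[dict]) -> list[RTRecord]:
    """Convert raw scraped dicts into typed, cleaned records."""
    return [
        RTRecord(
            title=normalize_title(row["title"]),
            critic_score=parse_score(row.get("critic_score")),
            audience_score=parse_score(row.get("audience_score")),
        )
        for row in raw_rows
    ]
```

For example, `clean_rt([{"title": "  Example   Film ", "critic_score": "95%", "audience_score": "--"}])` yields a single record with the title `"Example Film"`, a critic score of `95`, and an audience score of `None`. Normalizing titles early matters if titles are later used to match rows across the two sources.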

All data are piped into a MySQL database, then retrieved for presentation in R.
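The load-then-retrieve flow can be sketched as follows. The sketch uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in for MySQL, so it runs without a server. The table and column names (`won_titles`, `rt_ratings`, `added_on`) and the `new_titles_for_month` helper are hypothetical. With MySQL, a DB-API driver such as `mysql-connector-python` follows the same connect/execute/fetch pattern but uses `%s` placeholders instead of `?`.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the project's MySQL database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE won_titles (
        title    TEXT PRIMARY KEY,
        added_on TEXT NOT NULL  -- ISO date the title arrived on Netflix
    );
    CREATE TABLE rt_ratings (
        title          TEXT PRIMARY KEY,
        critic_score   INTEGER,
        audience_score INTEGER
    );
    """
)

# Hypothetical cleaned rows from each source.
won_rows = [("Example Film", "2021-03-05"), ("Another Show", "2021-04-01")]
rt_rows = [("Example Film", 95, 88), ("Another Show", 70, None)]

# Parameterized inserts keep values out of the SQL string.
conn.executemany("INSERT INTO won_titles VALUES (?, ?)", won_rows)
conn.executemany("INSERT INTO rt_ratings VALUES (?, ?, ?)", rt_rows)
conn.commit()


def new_titles_for_month(conn: sqlite3.Connection, month: str) -> list[tuple]:
    """Return (title, critic, audience) for titles added in `month` ('YYYY-MM')."""
    query = """
        SELECT w.title, r.critic_score, r.audience_score
        FROM won_titles AS w
        LEFT JOIN rt_ratings AS r ON r.title = w.title
        WHERE substr(w.added_on, 1, 7) = ?
        ORDER BY w.title
    """
    return conn.execute(query, (month,)).fetchall()


print(new_titles_for_month(conn, "2021-03"))  # [('Example Film', 95, 88)]
```

The `LEFT JOIN` keeps a new title in the monthly list even when no RT match exists, leaving its scores as `NULL`. Joining on the title alone is a simplifying assumption; real data may need a sturdier matching key (for example, title plus release year).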

Here is a high-level look at the pipeline:

[Figure: high-level pipeline diagram]

Data Source 1 is WON data. Data Source 2 is RT data.

Main Packages/Tools

Python

R

SQL

MySQL


Download Details:

Author: charlesdungy
Official Website: https://github.com/charlesdungy/new-to-streaming-scraper
License: MIT
