A Vue.js and Flask Web Application designed to provide a quick way to search for quotes from NBC’s “The Office”.
Credit to officequotes.net for providing all quote data.
Credit to imdb.com for episode descriptions.
Quotes are scraped directly from the website as of this moment.
This repository will hold the current pre-processed raw quote data, but the application has the ability to fetch and parse HTML pages directly as needed.
python server/cli.py fetch
--season SEASON Fetches all episodes from a specific season.
--episode EPISODE Fetches a specific episode. Requires SEASON to be specified.
--all Fetches data for every episode from every season.
--skip SEASON:EPISODE When specified, it will skip a given episode.
The data has to be parsed, but due to high irregularity (at least too much for me to handle), the files will have to be inspected and manually processed.
-s --season SEASON Pre-processes all episodes from a specific season.
-e --episode EPISODE Pre-processes a specific episode. Requires SEASON to be specified.
-a --all Pre-processes all episodes from every season.
-o --overwrite DANGER: Will overwrite files. May result in manually processed files to be lost forever.
From then on, once all files have been pre-processed, you will have to begin the long, annoying process of editing them into my custom format.
These raw pre-processed files are located in './server/data/processed/
Each section (barring the first) is pre-pended by a .hyphen.
CharacterName: Text that character says.
OtherCharacter: More text that other character says..
-
ThirdCharacter: Text that character says in a second scene/section.
-!1
Fourth Character With Spaces In Name: Text that fourth character says in a deleted scene.
Fifth-Character: Which deleted scene? Deleted scene number one.
Deleted scenes are marked by a initial exclamation mark, and then a number of digits marking which deleted scene they are a part of.
Please note that extra text like ‘Deleted Scenes 3’ might appear before a hyphen - this is expected and is helpful when deciding which scene goes with which Deleted Scene ID. If you don’t know, do what I did - go look at the web page it’s based on. Otherwise, I read the quotes and figure out based on context.
This concept is rather loose, slow, and dumb, it simply allows me to mark what deleted scenes go together while working with a incredibly inconsistent, human curated data format.
To ease text processing, I did come up with RegEx expressions for search and replacement:
^([\w\s]+\-*[\w\s]*):\s+
$1|
From then on, the process becomes much simpler, 95% of the work needed to process quotes is already done.
Now that quotes are in a consistent (although custom) format, they need to be processed into individual episodes. In reality, they are just the JSON format of the previous stage.
python server/cli.py process
-s --season SEASON Processes all episodes from a specific season.
-e --epsiode EPISODE Processes a specific episode. Requires SEASON to be specified.
-a --all Processes all episodes from all seasons.
Now that they’re all in individual files, the final commands can be ran to compile them into one file, a static ‘database’ or something. Technically, they could be kept scattered, but I decided to make it simpler with just 1 big file.
This also is where Algolia comes in.
python server/cli.py build [algolia|final]
Each command is ran with no special arguments (as of now), generating a algolia.json
or data.json
in the ./server/data/
folder.
This data.json
file is loaded by the Flask server and the algolia.json
can be uploaded to your primary index.
For every command mentioned, you can read all arguments with --help
:
$ python cli.py preprocess --help
Usage: cli.py preprocess [OPTIONS]
Pre-processes raw HTML files into mangled custom quote data.
Custom quote data requires manual inspection and formatting, making it a
dangerous operation that may overwrite precious quote data.
Options:
-s, --season INTEGER Season to be fetched. Without --episode, will
download all episodes in a season.
-e, --episode INTEGER Specific episode to be fetched. Requires
--season to be specified.
--all Fetch all episodes, regardless of previous
specifications.
-o, --overwrite Overwrite if a file already exists.
-ss, --silent-skip Skip missing/existing files silently
-ssm, --silent-skip-missing Skip missing files silently
-sse, --silent-skip-existing Skip overwrite skips silently
--help Show this message and exit.
This project was built on Python 3.7 and Node v12.18.3 / npm 6.14.6.
To install all Node/NPM dependencies, run
npm install
To install Python’s dependencies, run
pip install -r ./requirements.txt
I recommend that you use a virtualenv in order to keep dependencies separate from other projects, as I do. Personally, I use PyCharm Professional to maintain virtualenvs, just because it’s easy to start, use, update and maintain them.
npm run serve
.
./client/
.flask run
.
./server/
.--host=0.0.0.0
to the end to allow connections from LAN.Note: Readying this application for Production and wider-development is still in progress.
Don’t try to run this application just yet.
Small to-do list to complete.
Author: Xevion
Source Code: https://github.com/Xevion/the-office
#vuejs #vue #javascript