Highly configurable Twitter data collection framework

A framework for Twitter data collection.

core contains the code for data collection.
ui contains the code for the UI, which can be automatically deployed to gh-pages. If you deploy your own core collection, you can still use the original UI client by changing its access URL to point to your own deployment.

For a full description, see the preliminary version of the Twitter Watch paper.

Architecture

The current version merges the core and api containers. To run the original architecture instead, use the docker-compose-2.yml file and comment out the second line of the launcher.sh file (nohup python api/main.py > logs_flask.txt &).

UI

Implementation

The implementation runs groups of tasks sequentially until it reaches the scheduled tasks, which run in parallel.
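The flow described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not the project's actual code: the function name run_pipeline, its parameters, and the placeholder tasks are all assumptions, and the "scheduled" phase is simplified to a single parallel run rather than a recurring schedule.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(sequential_groups, scheduled_tasks, max_workers=4):
    """Run task groups one after another, then run the final tasks in parallel.

    Hypothetical helper: every task is a zero-argument callable.
    """
    sequential_results = []

    # Phase 1: each group finishes before the next one starts,
    # and tasks inside a group also run in order.
    for group in sequential_groups:
        for task in group:
            sequential_results.append(task())

    # Phase 2: the remaining tasks are submitted together and run concurrently.
    # (A real deployment would re-run these on a timer; here they run once.)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(task) for task in scheduled_tasks]
        parallel_results = [future.result() for future in futures]

    return sequential_results, parallel_results


if __name__ == "__main__":
    # Placeholder tasks, named only for illustration.
    groups = [
        [lambda: "load config", lambda: "connect to database"],
        [lambda: "seed accounts"],
    ]
    scheduled = [lambda: "collect tweets", lambda: "collect followers"]
    print(run_pipeline(groups, scheduled))
```

Keeping the two phases in separate loops makes the ordering guarantee explicit: nothing in the parallel phase starts until every sequential group has completed.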

Practical results

This tool was used to collect data on the Portuguese Twittersphere. The following figure summarizes the number of accounts and tweets it collected.

Tips and Tricks

docker compose deploy

cp example.env .env (then edit .env)
docker-compose up (pass -d for detached mode)

mongo dumps

docker exec some-mongo sh -c 'exec mongodump -d twitter --archive' > PATHTOLOCALFILE/dump.archive

docker exec -it twitter-watch_core_1 bash
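If you want to script the mongodump command from this section, for example to take periodic backups, a minimal Python wrapper could look like the following. The helper names build_dump_command and dump_to_file are hypothetical. The container name some-mongo and the database name twitter are copied from the command above and may differ in your deployment.

```python
import subprocess


def build_dump_command(container="some-mongo", database="twitter"):
    """Return the argv list equivalent to the docker exec mongodump command."""
    return [
        "docker", "exec", container,
        "sh", "-c", f"exec mongodump -d {database} --archive",
    ]


def dump_to_file(path, container="some-mongo", database="twitter"):
    """Stream the archive from the container's stdout into a local file."""
    with open(path, "wb") as out:
        # check=True raises CalledProcessError if docker or mongodump fails.
        subprocess.run(build_dump_command(container, database), stdout=out, check=True)
```

Calling dump_to_file("dump.archive") requires Docker and a running MongoDB container. build_dump_command can be inspected on its own without either.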

Pre-commit

See pre-commit.com for additional hooks, then add the ones you want to the pre-commit config file.

To run, execute pre-commit run --all-files.

Download Details:

Author: msramalho

Demo: https://msramalho.github.io/twitter-watch/

Source Code: https://github.com/msramalho/twitter-watch
