Outline:

  1. Introduction
  2. Setup/Installation
  3. Pre-script
  4. Script
  5. Into Production
  6. Use Cases
  7. Example
  8. Conclusion
  9. Connect

Twint is an advanced Twitter scraping tool written in Python that allows for scraping Tweets from Twitter profiles without using Twitter’s API.

In this article, I’ll describe how I created a huge dataset of tweets scraped from an entire country.

Note: This article will prepare you for a production-level script.

Installation

Git:

git clone https://github.com/twintproject/twint.git
cd twint
pip3 install . -r requirements.txt

Pip:

pip3 install twint

or

pip3 install --user --upgrade git+https://github.com/twintproject/twint.git@origin/master#egg=twint
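
To confirm that the install worked before moving on, a quick check like the one below may help. It is only a sketch: the helper name `is_installed` is my own, and it only tests whether Python can locate a module called `twint`, not whether scraping will actually work.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("twint"):
    print("twint is available")
else:
    print("twint not found; re-run one of the install commands above")
```

If the check fails right after a `--user` install, the usual suspect is that `pip3` and the `python3` you are running belong to different interpreters.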

Pre-Script:

  1. Search Google for a list of the cities in the country (this could also be a list of place names within a city or state). For example: "Cities list of Pakistan".
  2. After downloading the cities list, clean the data. If the list you found is too messy to clean, you can also obtain this data from Wikipedia.
  3. After cleaning it, assign the list to the ‘all_cities’ variable in the script, as below.
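
Steps 2 and 3 might look roughly like this. It is a minimal sketch, assuming the downloaded names are already loaded into a Python list. The sample names and the helper `clean_city_names` are hypothetical and not part of Twint. Only the variable name `all_cities` comes from the steps above.

```python
import re


def clean_city_names(raw_names):
    """Return tidied, de-duplicated city names, keeping first-seen order."""
    cleaned = []
    seen = set()
    for name in raw_names:
        # Drop footnote markers such as "[1]" that copied tables often carry
        name = re.sub(r"\[[^\]]*\]", "", name)
        # Collapse runs of whitespace and trim both ends
        name = re.sub(r"\s+", " ", name).strip()
        if not name:
            continue  # skip blank rows
        key = name.casefold()  # compare case-insensitively
        if key not in seen:
            seen.add(key)
            cleaned.append(name)
    return cleaned


# Hypothetical raw input, as it might look after copying from a web page
raw_cities = ["Karachi", " Lahore ", "Faisalabad[1]", "lahore", "", "Rawalpindi"]

all_cities = clean_city_names(raw_cities)
print(all_cities)  # ['Karachi', 'Lahore', 'Faisalabad', 'Rawalpindi']
```

De-duplicating before scraping matters because every entry in `all_cities` becomes a separate search, so a repeated city means repeated work and duplicate tweets in the dataset.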


How to Scrape Tweets and create Dataset using Twint without Twitter API