Gathering social media data for machine learning is often difficult for AI researchers. Luckily, Twitter is one free and accessible source of social networking service (SNS) data.
Numerous educational organizations, research teams, and independent researchers have scraped tweets from Twitter and made the data available for public use.
From sentiment analysis models to content moderation models and other NLP use cases, Twitter data can be used to train various machine learning algorithms.
Below is a list of some of the best open Twitter datasets for machine learning.
This dataset contains tweets about the large tech company Apple. It was compiled from tweets containing the hashtag #AAPL, the reference @apple, and others. The tweets were then labeled as positive, negative, or neutral in sentiment.
This dataset for machine learning consists of 10,000 tweets that include the hashtag #AvengersEndgame.
This dataset contains 150,000 tweets mentioning Charlottesville or containing the #Charlottesville hashtag.
4. Credibility Corpus in French and English
The Credibility Corpus in French and English was created to analyze information credibility and detect misinformation and rumors. The dataset consists of both French and English tweets about rumors.
5. Customer Support on Twitter
This dataset is a large corpus of tweets and replies to and from customer service support lines on Twitter.
The Every Donald Trump Tweet dataset is a compilation of every tweet Donald Trump has posted. The data was later moved to the TrumpTwitterArchive but can still be accessed.
From FollowtheHashtag, this dataset is a collection of 200,000 geolocated tweets from Tokyo.
Also from FollowtheHashtag, this dataset is a collection of 200,000 geolocated tweets from the United States of America.
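Several of the datasets above were built by matching tweets against hashtags or mentions (for example #AAPL and @apple) and then attaching sentiment labels. As a rough illustration of that kind of filtering, here is a minimal Python sketch. The sample rows, the `text` and `sentiment` field names, and the helper functions are all hypothetical; the actual datasets use their own file formats and column names, so treat this as a starting point rather than a loader for any specific dataset.

```python
import re
from collections import Counter

# Hypothetical sample rows for illustration only; real datasets
# will have their own column names and many more records.
SAMPLE_TWEETS = [
    {"text": "Loving the new phone #AAPL", "sentiment": "positive"},
    {"text": "@apple my battery died again", "sentiment": "negative"},
    {"text": "Stock update: $MSFT flat today", "sentiment": "neutral"},
    {"text": "Earnings call at 5pm #aapl", "sentiment": "neutral"},
]


def matches_any(text, terms):
    """Return True if text contains any of the given hashtags or mentions.

    Matching is case-insensitive and works on whole tokens, so "#AAPL"
    does not match "#AAPLnews".
    """
    tokens = {tok.lower() for tok in re.findall(r"[#@]\w+", text)}
    return any(term.lower() in tokens for term in terms)


def filter_tweets(tweets, terms):
    """Keep only the tweets whose text matches at least one term."""
    return [t for t in tweets if matches_any(t["text"], terms)]


def sentiment_counts(tweets):
    """Count how many tweets carry each sentiment label."""
    return Counter(t["sentiment"] for t in tweets)


apple_tweets = filter_tweets(SAMPLE_TWEETS, ["#AAPL", "@apple"])
counts = sentiment_counts(apple_tweets)
print(len(apple_tweets), dict(counts))
```

Checking the label distribution like this before training is a quick way to spot class imbalance, which is common in sentiment-labeled tweet collections.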