Sofia Gardiner

Sofia Gardiner

1596099764

Tweets Classification and Clustering in Python

Introduction.

Twitter is a social networking and micro blogging service on which users post and interact with each other through messages known as “tweets”. It’s ranked as the 6th most popular social networking site and app by Dream Grow as of April, 2020 with an average of 330 million active monthly users.

Unlike other platforms like Facebook whose main role is to play ‘catch-up’ with friends, itis where people let loose and engage with different personalities from all walks of life on all sorts of matters. This atmosphere is what makes it the ideal platform for marketers, politicians and other titles whose success depend on a deep understanding of people’s views.

Through sentiment analysis, interested parties can understand what users are talking about and from the insights, make the appropriate decisions. This post focuses on classifying tweets into 4 major categories:_ Economic, Social, Cultural and Health _then performing KMeans cluster analysis on the groups.

Data set

The data used is scraped from twitter using Tweepy, a python library for accessing the Twitter API. It has 197802 tweets from different users from Kenya. Code to scrap the data is available in this repository.

The data set is called tweets_bowl.

A random sample:

Sample data containing usernames and tweets.

#python #twitter #data-science #kmeans #developer

Tweets Classification and Clustering in Python