Are Tweet sentiments reflective of the results?

Using VADER and BERT, I analyse the sentiments of Tweets pertaining to Singapore’s ruling party in the run-up to the 2020 General Elections.

A little over a month ago, on 10 July, Singapore went to the polls to elect the members of its 14th Parliament. What do you do when you’re a really excited first-time voter with a lot of spare time on her hands? You conduct a quick study analysing the sentiments of Tweets to see whether they reflect the actual election results. Okay, I might be the only one who thinks this way (nerd alert), but anyway, let’s dive straight in!

Downloading Tweets

Using the Tweepy library, and with the help of the code Griffin used in this article, I downloaded tweets matching the search term ‘PAP #GE2020’.

PAP stands for the People’s Action Party, Singapore’s ruling party. The hashtag #GE2020 was used by most people who tweeted about the 2020 Singapore General Election.
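
For readers who want to reproduce this step, here is a minimal sketch of such a download. It is not Griffin’s code: it assumes Tweepy 4.x, where the standard search endpoint is `API.search_tweets` (Tweepy 3.x named it `API.search`), and the helper names `fetch_statuses` and `status_to_row` and the credential arguments are hypothetical. Twitter’s standard search only reaches back about seven days, so a collection like this has to be run close to the dates of interest.

```python
def fetch_statuses(query, limit, consumer_key, consumer_secret,
                   access_token, access_token_secret):
    """Download up to `limit` statuses matching `query`.

    Assumes Tweepy 4.x and valid Twitter API credentials (hypothetical placeholders).
    """
    import tweepy  # imported lazily so status_to_row works without Tweepy installed

    auth = tweepy.OAuth1UserHandler(consumer_key, consumer_secret,
                                    access_token, access_token_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)
    # tweet_mode="extended" returns the untruncated text in `full_text`
    cursor = tweepy.Cursor(api.search_tweets, q=query, tweet_mode="extended")
    return list(cursor.items(limit))


def status_to_row(status):
    """Flatten a Tweepy status object into a plain dict."""
    retweeted = getattr(status, "retweeted_status", None)
    return {
        "id": status.id,
        "created_at": status.created_at,
        "is_retweet": retweeted is not None,
        # A retweet's own full_text starts with "RT @user:" and may be truncated,
        # so take the text from the original tweet instead.
        "text": retweeted.full_text if retweeted is not None else status.full_text,
        # For a retweet, point at the tweet being reshared; otherwise at itself.
        "original_id": retweeted.id if retweeted is not None else status.id,
    }
```

Usage would look like `rows = [status_to_row(s) for s in fetch_statuses("PAP #GE2020", 50000, *credentials)]`. Taking the text from `retweeted_status` makes every retweet’s text identical to its original, which makes duplicates easy to spot later.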

I deliberated quite a bit over the appropriate search term. Simply using #GE2020 wouldn’t be quite right, as the tweets collected would also reflect public sentiment towards the opposition parties. My search term would miss tweets that were in fact about the ruling party but lacked either the word PAP or the hashtag #GE2020; even so, I felt it was the closest I could get to isolating tweets reflecting sentiment towards the ruling party.
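
Twitter’s standard search treats space-separated terms as an implicit AND, so ‘PAP #GE2020’ only returns tweets containing both terms. The hypothetical `matches_query` helper below is a rough local approximation of that rule, handy for checking which tweets the search term keeps and which it drops; Twitter’s own tokenisation differs in the details.

```python
import re


def matches_query(text, terms=("PAP", "#GE2020")):
    """Approximate an all-terms (AND) search, case-insensitively.

    A bare term must match a whole word and a hashtag term must match the
    hashtag itself. This is an illustration, not Twitter's matching logic.
    """
    tokens = {token.lower() for token in re.findall(r"#?\w+", text)}
    return all(term.lower() in tokens for term in terms)
```

For example, `matches_query("Voted for the PAP team today #GE2020")` returns `True`, while `matches_query("The ruling party's rally was packed tonight #GE2020")` returns `False`: a tweet about the ruling party that never names it is exactly what this search term misses.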

I chose to include retweets as well, as I figured that Twitter users tend to retweet tweets that resonate with them. My dataset included tweets and retweets posted between 6 and 8 July, when online political discourse was likely to be at its most active, since polling day (10 July) was approaching. In case you’re wondering why 9 July was excluded: it was Cooling-off Day, when campaigning is prohibited so that voters can take a step back and reflect on the issues before heading to the polls the following day.
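
Twitter reports `created_at` in UTC, while the 6 to 8 July window is defined in Singapore time (UTC+8), so a tweet posted just after midnight on 6 July in Singapore carries a 5 July UTC timestamp. A minimal sketch of the date filter, using a hypothetical `in_campaign_window` helper, converts to Singapore time before comparing dates:

```python
from datetime import date, datetime, timedelta, timezone

# Singapore Standard Time: a fixed UTC+8 offset with no daylight saving.
SGT = timezone(timedelta(hours=8))


def in_campaign_window(created_at, start=date(2020, 7, 6), end=date(2020, 7, 8)):
    """Return True if a timestamp falls on 6-8 July 2020 (inclusive), Singapore time."""
    if created_at.tzinfo is None:
        # Treat naive timestamps as UTC, the zone Twitter reports created_at in.
        created_at = created_at.replace(tzinfo=timezone.utc)
    local_day = created_at.astimezone(SGT).date()
    return start <= local_day <= end
```

Filtering on the local date rather than the UTC date keeps the eight hours straddling each midnight on the correct side of the window.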

I originally hoped to collect 50,000 tweets and retweets but ended up with a lot of duplicated data, probably because not many tweets matched my search term over such a short span of three days (I had also forgotten that Singapore’s citizen population of around 3.5 million isn’t that large to begin with). My final dataset consisted of 2,504 tweets and retweets, comprising 406 unique tweets.
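
Since a retweet repeats the text of the tweet it reshares, one way to get separate counts of total rows and unique tweets is to key every row to its original tweet and count the distinct keys. The sketch below illustrates the idea rather than recording how the figures above were computed; it assumes each row has a hypothetical `original_id` field holding the tweet’s own id, or the retweeted tweet’s id for a retweet.

```python
from collections import Counter


def count_unique(rows):
    """Return (total rows, unique tweets, the three most-copied tweets).

    Each row is assumed to be a dict with an "original_id" key, so a retweet
    and the tweet it reshares share the same key.
    """
    copies = Counter(row["original_id"] for row in rows)
    return len(rows), len(copies), copies.most_common(3)
```

The third return value shows which tweets were retweeted most, a quick way to see how heavily a few popular tweets dominate a dataset like this one.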

Now, you may be curious about which words are most commonly used in these tweets. Let’s create word clouds to find out!

