Open data sources are one of the best gifts for data scientists and analysts, as they make it possible to draw valuable insights for free without having to worry about data licenses. Twitter is one of the most popular social media applications in the world: it is free, and it allows users to tweet on any topic that comes to mind. This article focuses on how we can use Twitter data through R programming to extract valuable insights, and how to communicate these findings to the relevant stakeholders using Tableau.

Problem Statement

“How might we help communication practitioners get actionable insights from Twitter, so that they can create more effective communication that caters to the needs and concerns of the general public?”

The chosen target user for this problem statement is the military communication practitioner, who is keen to understand the general public's concerns about the military (i.e. National Service) in Singapore.

Characteristics of the Twitter data source

For any data source, we can get a general understanding by describing it with the four ‘V’s of big data, namely Volume, Velocity, Variety, and Veracity.

  1. **Volume**: Scale of data
  2. **Velocity**: Analysis of streaming data
  3. **Variety**: Different forms of data
  4. **Veracity**: Uncertainty of data

Connection to Twitter API

We can use the twitteR package in R to access the Twitter API. Note that we need to sign up for a Twitter developer account first, as each user is issued a unique consumer key, consumer secret, access token, and access secret. Once the connection to the Twitter API is set up, we request the tweets we want by stating the search term (i.e. 'National Service'), the maximum number of tweets (i.e. n = 1000), the latitude and longitude of Singapore with the search radius in miles (i.e. geocode = '1.3521,103.8198,279mi'), and the language (i.e. lang = 'en').
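The geocode argument described above is a single string in the form latitude,longitude,radius, with the radius suffixed by its unit. As a minimal sketch, the string could be assembled from its parts as below; `build_geocode` is a hypothetical helper written for illustration and is not part of twitteR.

```r
## Hypothetical helper (not part of twitteR): builds the
## "latitude,longitude,radius<unit>" string used by the geocode argument
build_geocode <- function(lat, lon, radius, unit = c("mi", "km")) {
  unit <- match.arg(unit)  # only "mi" or "km" are accepted
  stopifnot(lat >= -90, lat <= 90, lon >= -180, lon <= 180, radius > 0)
  paste0(lat, ",", lon, ",", radius, unit)
}

## Coordinates and radius taken from the search described in this article
singapore_geocode <- build_geocode(1.3521, 103.8198, 279)
print(singapore_geocode)
```

The resulting string can be passed as the `geocode` argument of `searchTwitter()` in place of the literal used in the request code.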

## Import the libraries needed for the Twitter API, data manipulation and text cleaning
library("twitteR")
library("ROAuth")
library("dplyr")
library("tidytext")

## Set up the Twitter connection
consumer_key    <- '' ## removed due to confidentiality
consumer_secret <- '' ## removed due to confidentiality
access_token    <- '' ## removed due to confidentiality
access_secret   <- '' ## removed due to confidentiality
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

## Extract English tweets containing 'National Service' within the search radius around Singapore, with retweets removed
tweets <- strip_retweets(
  searchTwitter('National Service', n = 1000, geocode = '1.3521,103.8198,279mi', lang = 'en')
)

## Print the number of tweets returned
length(tweets)
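Since the four credentials are confidential, one option is to keep them out of the script altogether. The following is a minimal sketch that reads them from environment variables instead. The variable names (`TWITTER_CONSUMER_KEY` and so on) and the helper functions are assumptions made for illustration, not something twitteR requires.

```r
## Hypothetical environment variable names; set them in ~/.Renviron or in the shell
credential_names <- c(
  consumer_key    = "TWITTER_CONSUMER_KEY",
  consumer_secret = "TWITTER_CONSUMER_SECRET",
  access_token    = "TWITTER_ACCESS_TOKEN",
  access_secret   = "TWITTER_ACCESS_SECRET"
)

## Read one variable and fail early if it is missing or empty
read_credential <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Missing environment variable: ", name)
  }
  value
}

## Return a named list with one entry per credential
load_twitter_credentials <- function(names = credential_names) {
  lapply(names, read_credential)
}

## Usage, assuming the variables are set and twitteR is loaded:
## creds <- load_twitter_credentials()
## setup_twitter_oauth(creds$consumer_key, creds$consumer_secret,
##                     creds$access_token, creds$access_secret)
```

Failing early on a missing variable gives a clearer error than an authentication failure from the API later on.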

Note that the free version of the Twitter public API only lets us request tweets from the last 7 days. As a result, our request returned just 17 tweets once retweets were removed.
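To see how the 7-day window affects a given request, it can help to check the date range the returned tweets actually cover. The sketch below assumes the tweets have been converted to a data frame with twitteR's `twListToDF()`, which includes a `created` timestamp column. `summarise_tweet_window` is a hypothetical helper, and the data frame here is a synthetic stand-in with made-up rows, not real tweets.

```r
## Hypothetical helper: number of tweets and the time span they cover
summarise_tweet_window <- function(tweets_df) {
  if (nrow(tweets_df) == 0) {
    return(list(n = 0L, first = NA, last = NA, span_days = NA_real_))
  }
  created <- tweets_df$created
  list(
    n         = nrow(tweets_df),
    first     = min(created),
    last      = max(created),
    span_days = as.numeric(difftime(max(created), min(created), units = "days"))
  )
}

## With a live connection: tweets_df <- twListToDF(tweets)
## Synthetic stand-in so the sketch runs without API access
example_df <- data.frame(
  text    = c("example tweet one", "example tweet two"),
  created = as.POSIXct(c("2020-01-01 08:00:00", "2020-01-04 08:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

window_summary <- summarise_tweet_window(example_df)
print(window_summary$span_days)
```

On real results, a span close to 7 days would suggest that the window, rather than the `n = 1000` cap, is what limits the number of tweets.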

