Analysis of Tweets about the Joker in Python

In this post, we will analyze tweets about Joker (the 2019 film). To get started, you need to apply for a Twitter developer account.

After your developer account has been approved, you need to create a Twitter application.

The steps for applying for a Twitter developer account and creating a Twitter application are outlined here.

We will be using the free Python library tweepy to access the Twitter API. Documentation for tweepy can be found here.

1. INSTALLATION

First, make sure you have tweepy installed. Open up a command line and type:

pip install tweepy

2. IMPORT LIBRARIES

Next, open up your favorite editor and import the tweepy and pandas libraries:

import tweepy
import pandas as pd

3. AUTHENTICATION

Next, we need the consumer key and secret and the access token and secret for our application.

Notice that the site suggests that you keep your key and token private! Here we define a fake key and token, but you should use the real key and token generated when you created the Twitter application:

consumer_key = '5GBi0dCerYpy2jJtkkU3UwqYtgJpRd' 
consumer_secret = 'Q88B4BDDAX0dCerYy2jJtkkU3UpwqY'
access_token = 'X0dCerYpwi0dCerYpwy2jJtkkU3U'
access_token_secret = 'kly2pwi0dCerYpjJtdCerYkkU3Um'
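
One way to keep real credentials out of your source code is to read them from environment variables. The sketch below is only an illustration and is not part of the original code: the variable names (TWITTER_CONSUMER_KEY and so on) and the helper ‘load_credentials’ are assumptions made for this example.

```python
import os

# Hypothetical environment variable names; any names work as long as
# they match what you export in your shell.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(environ=os.environ):
    """Return the four credentials as a dict, or raise if any is missing."""
    missing = [name for name in ENV_NAMES.values() if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: environ[name] for key, name in ENV_NAMES.items()}
```

With the variables exported, the values in the returned dict can be used in place of the hard-coded strings, for example ‘consumer_key = load_credentials()["consumer_key"]’.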

The next step is creating an OAuthHandler instance. We pass it the consumer key and consumer secret, then set the access token and access token secret which we defined above:

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

Next, we pass the OAuthHandler instance to the ‘tweepy.API’ constructor:

api = tweepy.API(auth)

4. TWITTER API REQUESTS

Next, we initialize lists for the fields we are interested in analyzing. For now, we can look at the tweet text, the user, and the time of the tweet. We then write a for loop over a tweepy ‘Cursor’ object. We pass the ‘api.search’ method to ‘Cursor’ (in tweepy 4.0 and later this method is named ‘api.search_tweets’), set the query string (q="Joker") for what we would like to search for, and set ‘count’, the number of tweets requested per page. Note that the standard search API returns at most 100 tweets per request. Finally, we call the ‘items()’ method with a limit of 1000, which yields the tweets one at a time and stops after 1000 so that we don’t exceed the Twitter rate limit.

To simplify the results, we skip retweets and keep only tweets in English. To get a sense of what this request returns, we can also print the values being appended to each list:

twitter_users = []
tweet_time = []
tweet_string = []
for tweet in tweepy.Cursor(api.search, q="Joker", count=1000).items(1000):
    if (not tweet.retweeted) and ('RT @' not in tweet.text):
        if tweet.lang == "en":
            twitter_users.append(tweet.user.name)
            tweet_time.append(tweet.created_at)
            tweet_string.append(tweet.text)
            print([tweet.user.name, tweet.created_at, tweet.text])
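
The two nested conditions can be folded into a small helper, which also makes the filter easy to try without calling the API. This is only a sketch: the helper name ‘is_original_english’ and the sample objects are made up for illustration, and the samples merely mimic the attributes (text, lang, retweeted) that the loop reads.

```python
from types import SimpleNamespace


def is_original_english(tweet):
    """True for English tweets that do not look like retweets."""
    looks_like_retweet = tweet.retweeted or "RT @" in tweet.text
    return (not looks_like_retweet) and tweet.lang == "en"


# Stand-in objects with the same attributes the loop reads from a tweet.
samples = [
    SimpleNamespace(text="Joker was intense", lang="en", retweeted=False),
    SimpleNamespace(text="RT @someone: Joker was intense", lang="en", retweeted=False),
    SimpleNamespace(text="Joker es una gran película", lang="es", retweeted=False),
]

kept = [t.text for t in samples if is_original_english(t)]
print(kept)  # ['Joker was intense']
```

Inside the loop, the two ‘if’ statements could then be replaced by a single ‘if is_original_english(tweet):’.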

We can also play around with the query string. Let’s change it from “Joker” to “Joaquin”, the first name of the lead actor in Joker:

# Start from empty lists so the "Joaquin" results are not mixed with the "Joker" ones
twitter_users = []
tweet_time = []
tweet_string = []
for tweet in tweepy.Cursor(api.search, q="Joaquin", count=1000).items(1000):
    if (not tweet.retweeted) and ('RT @' not in tweet.text):
        if tweet.lang == "en":
            twitter_users.append(tweet.user.name)
            tweet_time.append(tweet.created_at)
            tweet_string.append(tweet.text)
            print([tweet.user.name, tweet.created_at, tweet.text])

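The next thing we can do is store the query results in a dataframe. To do this, let’s define a function that takes a keyword as an argument, requests up to 1000 tweets related to it, and returns the filtered results as a dataframe. The function also saves the dataframe to a ‘.csv’ file named after the keyword: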

def get_related_tweets(key_word):
    twitter_users = []
    tweet_time = []
    tweet_string = []
    for tweet in tweepy.Cursor(api.search, q=key_word, count=1000).items(1000):
        if (not tweet.retweeted) and ('RT @' not in tweet.text):
            if tweet.lang == "en":
                twitter_users.append(tweet.user.name)
                tweet_time.append(tweet.created_at)
                tweet_string.append(tweet.text)
                #print([tweet.user.name, tweet.created_at, tweet.text])
    df = pd.DataFrame({'name': twitter_users, 'time': tweet_time, 'tweet': tweet_string})
    df.to_csv(f"{key_word}.csv")
    return df
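
If you want to check the dataframe-building step without making API requests, one option is to separate it from the fetching step. The following is a sketch under that assumption: ‘tweets_to_frame’ is a hypothetical helper, and the fake tweet only imitates the attributes used above (user.name, created_at, text).

```python
from datetime import datetime
from types import SimpleNamespace

import pandas as pd


def tweets_to_frame(tweets):
    """Build the name/time/tweet dataframe from an iterable of tweet-like objects."""
    rows = [
        {"name": t.user.name, "time": t.created_at, "tweet": t.text}
        for t in tweets
    ]
    return pd.DataFrame(rows, columns=["name", "time", "tweet"])


# A made-up tweet standing in for what the Cursor loop yields.
fake_tweets = [
    SimpleNamespace(
        user=SimpleNamespace(name="Example User"),
        created_at=datetime(2019, 10, 6, 12, 0, 0),
        text="Just saw Joker",
    ),
]
df_example = tweets_to_frame(fake_tweets)
print(df_example.shape)  # (1, 3)
```

The same helper could be given the filtered tweets collected inside ‘get_related_tweets’, which keeps the column names in one place.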

When we call the function with “Joker”, assign the returned dataframe to a variable, and print its first five rows, we get:

df_joker = get_related_tweets("Joker")
print(df_joker.head(5))

And if we do the same for “Joaquin”:

df_joaquin = get_related_tweets("Joaquin")
print(df_joaquin.head(5))

We can also search for tweets with “Joker” and “bad movie” (this time we leave out the line that writes the ‘.csv’ file):

def get_related_tweets(key_word):
    twitter_users = []
    tweet_time = []
    tweet_string = []
    for tweet in tweepy.Cursor(api.search, q=key_word, count=1000).items(1000):
        if (not tweet.retweeted) and ('RT @' not in tweet.text):
            if tweet.lang == "en":
                twitter_users.append(tweet.user.name)
                tweet_time.append(tweet.created_at)
                tweet_string.append(tweet.text)
                #print([tweet.user.name, tweet.created_at, tweet.text])
    df = pd.DataFrame({'name': twitter_users, 'time': tweet_time, 'tweet': tweet_string})
    return df

df_bad = get_related_tweets("Joker bad movie")
print(df_bad.head(5))
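
The query string above matches tweets that contain all three words anywhere in the text. Twitter's standard search operators also allow an exact phrase in double quotes and a ‘-filter:retweets’ operator that drops retweets on the server side. The sketch below shows one way such a query string could be assembled. The helper ‘build_query’ is hypothetical, and using it would change which tweets are returned compared with the call above.

```python
def build_query(*terms, exclude_retweets=True):
    """Join search terms into one query string.

    Terms containing spaces are wrapped in double quotes so they are
    matched as exact phrases rather than as separate words.
    """
    parts = [f'"{term}"' if " " in term else term for term in terms]
    if exclude_retweets:
        parts.append("-filter:retweets")
    return " ".join(parts)


print(build_query("Joker", "bad movie"))
# Joker "bad movie" -filter:retweets
```

The resulting string could be passed as the ‘key_word’ argument of ‘get_related_tweets’.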

We can take a closer look at a few tweets by looping over the dataframe index and selecting values from the tweet column.
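
The code for this step was not preserved in the text, so the following is only a sketch of what such a loop could look like. The dataframe ‘df_example’ and its rows are made up and stand in for a dataframe returned by ‘get_related_tweets’.

```python
import pandas as pd

# Made-up rows standing in for the dataframe returned by get_related_tweets.
df_example = pd.DataFrame({
    "name": ["user_a", "user_b", "user_c"],
    "time": pd.to_datetime(
        ["2019-10-06 12:00", "2019-10-06 12:05", "2019-10-06 12:10"]
    ),
    "tweet": ["first sample tweet", "second sample tweet", "third sample tweet"],
})

# Print the full text of the first two tweets by index label.
selected = []
for i in df_example.index[:2]:
    text = df_example.loc[i, "tweet"]
    selected.append(text)
    print(text)
```

Printing each value this way shows the full tweet text, whereas ‘head()’ may truncate long strings in its tabular output.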

And for tweets with “Joker” and “good movie”:

df_good = get_related_tweets("Joker good movie")
print(df_good.head(5))

In the next post, we will use a Python library called TextBlob to perform sentiment analysis on some of these tweets. From there, we can build a sentiment analyzer that classifies a tweet as having negative or positive sentiment. The code from this post is available on GitHub. Thank you for reading! Good luck and happy machine learning!
