Getting Twitter data

Let’s use the Tweepy package in python instead of handling the Twitter API directly. The two things we will do with the package are, authorize ourselves to use the API and then use the cursor to access the twitter search APIs.

Let’s go ahead and get our imports loaded.

import tweepy
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
sns.set()
%matplotlib inline

Twitter authorization

To use the Twitter API, you must first register to get an API key. To get Tweepy just install it via pip install Tweepy. The Tweepy documentation is best at explaining how to authenticate, but I’ll go over the basic steps.

Once you register your app you will receive API keys, next use Tweepy to get an OAuthHandler. I have the keys stored in a separate config dict.

config = {"twitterConsumerKey":"XXXX", "twitterConsumerSecretKey" :"XXXX"} 
auth = tweepy.OAuthHandler(config["twitterConsumerKey"], config["twitterConsumerSecretKey"]) 
redirect_url = auth.get_authorization_url() 
redirect_url

Now that we’ve given Tweepy our keys to generate an OAuthHandler, we will now use the handler to get a redirect URL. Go to the URL from the output in a browser where you can allow your app to authorize your account so you can get access to the API.

Once you’ve authorized your account with the app, you’ll be given a PIN. Use that number in Tweepy to let it know that you’ve authorized it with the API.

pin = "XXXX"
auth.get_access_token(pin)

Searching for tweets

After getting the authorization, we can use it to search for all the tweets containing the term “British Airways”; we have restricted the maximum results to 1000.

query = 'British Airways'
max_tweets = 10
searched_tweets = [status for status in tweepy.Cursor(api.search, q=query,tweet_mode='extended').items(max_tweets)]

search_dict = {"text": [], "author": [], "created_date": []}
for item in searched_tweets:
    if not item.retweet or "RT" not in item.full_text:
        search_dict["text"].append(item.full_text)
        search_dict["author"].append(item.author.name)
        search_dict["created_date"].append(item.created_at)
df = pd.DataFrame.from_dict(search_dict)
df.head()
# Out:
    text                                                author      created_date
0   @RwandAnFlyer @KenyanAviation @KenyaAirways @U...   Bkoskey     2019-03-06 10:06:14
1   @PaulCol56316861 Hi Paul, I'm sorry we can't c...   British Airways     2019-03-06 10:06:09
2   @AmericanAir @British_Airways do you agree wit...   Hat     2019-03-06 10:05:38
3   @Hi_Im_AlexJ Hi Alex, I'm glad you've managed ...   British Airways     2019-03-06 10:02:58
4   @ZRHworker @British_Airways @Schmidy_87 @zrh_a...   Stefan Paetow   2019-03-06 10:02:33

#sentiment-analysis #naturallanguageprocessing #data-science #machine-learning #twitter #data analysis

Twitter authorization

Searching for tweets

medium.com

Using Twitter Rest APIs in Python to Search