Getting Twitter data
Let’s use the Tweepy package in python instead of handling the Twitter API directly. The two things we will do with the package are, authorize ourselves to use the API and then use the cursor to access the twitter search APIs.
Let’s go ahead and get our imports loaded.
import tweepy
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
sns.set()
%matplotlib inline
To use the Twitter API, you must first register to get an API key. To get Tweepy just install it via pip install Tweepy. The Tweepy documentation is best at explaining how to authenticate, but I’ll go over the basic steps.
Once you register your app you will receive API keys, next use Tweepy to get an OAuthHandler. I have the keys stored in a separate config dict.
config = {"twitterConsumerKey":"XXXX", "twitterConsumerSecretKey" :"XXXX"}
auth = tweepy.OAuthHandler(config["twitterConsumerKey"], config["twitterConsumerSecretKey"])
redirect_url = auth.get_authorization_url()
redirect_url
Now that we’ve given Tweepy our keys to generate an OAuthHandler, we will now use the handler to get a redirect URL. Go to the URL from the output in a browser where you can allow your app to authorize your account so you can get access to the API.
Once you’ve authorized your account with the app, you’ll be given a PIN. Use that number in Tweepy to let it know that you’ve authorized it with the API.
pin = "XXXX"
auth.get_access_token(pin)
After getting the authorization, we can use it to search for all the tweets containing the term “British Airways”; we have restricted the maximum results to 1000.
query = 'British Airways'
max_tweets = 10
searched_tweets = [status for status in tweepy.Cursor(api.search, q=query,tweet_mode='extended').items(max_tweets)]
search_dict = {"text": [], "author": [], "created_date": []}
for item in searched_tweets:
if not item.retweet or "RT" not in item.full_text:
search_dict["text"].append(item.full_text)
search_dict["author"].append(item.author.name)
search_dict["created_date"].append(item.created_at)
df = pd.DataFrame.from_dict(search_dict)
df.head()
# Out:
text author created_date
0 @RwandAnFlyer @KenyanAviation @KenyaAirways @U... Bkoskey 2019-03-06 10:06:14
1 @PaulCol56316861 Hi Paul, I'm sorry we can't c... British Airways 2019-03-06 10:06:09
2 @AmericanAir @British_Airways do you agree wit... Hat 2019-03-06 10:05:38
3 @Hi_Im_AlexJ Hi Alex, I'm glad you've managed ... British Airways 2019-03-06 10:02:58
4 @ZRHworker @British_Airways @Schmidy_87 @zrh_a... Stefan Paetow 2019-03-06 10:02:33
#sentiment-analysis #naturallanguageprocessing #data-science #machine-learning #twitter #data analysis