Filtering Tweets by Location

Regular Expressions + Account Location metadata == Location Filtering

In my latest project, I explored the question, “What is the public sentiment in the United States on K-12 learning during the COVID-19 pandemic?” Using data collected from Twitter, Natural Language Processing, and Supervised Machine Learning, I built a text classifier to predict the sentiment of Tweets on this topic.

Since I wanted to home in on sentiment in the United States, I needed to filter Tweets by location. The Twitter Developer site offers some good guidance on the available options.

I chose to use the Account Location geographical metadata. Here are the details from the Twitter Developer website: “Based on the ‘home’ location provided by the user in their public profile. This is a free-form character field and may or may not contain metadata that can be geo-referenced”.

Before we begin, here are a few caveats:

  • Since Account Location is not guaranteed to be populated, you have to accept that you’ll miss some relevant Tweets.
  • The approach I used depends on Account Location containing a US identifier (such as a valid US state name).

If those caveats are acceptable to you, keep reading. :)

Before I share the actual code, here’s a rundown of my methodology:

  • I created a Python script to listen to the Twitter stream for pertinent Tweets on my subject(s) of interest.
  • When a Tweet arrives, I read its Account Location attribute and use regular expressions to check whether it’s a US location.
  • Tweets that pass the Location check carry on through my code for further processing. (In my case, I stored the Tweet in a MongoDB collection.)

Now that I’ve shared the overall workflow, let’s look at some code.
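
A minimal sketch of the location check might look like the following. It is an illustration under stated assumptions, not the full script: it assumes each Tweet arrives as a dictionary with the profile location under `user` → `location` (as in the v1.1 Tweet JSON), the state lists are a deliberately tiny subset, and the names `is_us_location` and `filter_us_tweets` are hypothetical.

```python
import re

# Illustrative subset only (assumption): a real list would cover every state and DC.
US_STATE_NAMES = ["Alabama", "Alaska", "California", "New York", "Texas", "Washington"]
US_STATE_ABBREVS = ["AL", "AK", "CA", "NY", "TX", "WA"]

# Full state names or a country name, matched case-insensitively on word boundaries.
NAME_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(name) for name in US_STATE_NAMES)
    + r"|USA|United States)\b",
    re.IGNORECASE,
)

# Two-letter abbreviations only count in the "City, ST" form at the end of the
# string, and only in upper case, to limit accidental matches.
ABBREV_PATTERN = re.compile(r",\s*(?:" + "|".join(US_STATE_ABBREVS) + r")\s*$")


def is_us_location(location):
    """Return True if the free-form location text contains a US identifier."""
    if not location:  # Account Location may be None or an empty string
        return False
    return bool(NAME_PATTERN.search(location) or ABBREV_PATTERN.search(location))


def filter_us_tweets(tweets):
    """Yield only the Tweets whose Account Location passes the US check."""
    for tweet in tweets:
        location = (tweet.get("user") or {}).get("location")
        if is_us_location(location):
            yield tweet
```

Tweets yielded by `filter_us_tweets` would then carry on to the next step, such as being stored in a database. Because Account Location is free-form text, any pattern like this will produce some false positives and false negatives, so it is worth spot-checking the matches against real profile locations.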


