Last month, The Intercept published an article claiming that, “In recent presidential cycles, the velocity of edits made to a Wikipedia page have correlated with the choice of vice presidential running mate.” The article focuses on Kamala Harris, and the increasing number of edits that took place on her Wikipedia page in June.

The article argues that the pace of edits can be interpreted as signaling the strength of her potential to be named as VP candidate. Interesting. But is it valid?

Well, now that a month has passed and a selection has still not been made, I decided to take a look at these changes for myself. I wanted to see how her rate of edits stacked up against other potential candidates. I was also curious to see if there were any other correlations we could draw with the results, to deepen our understanding of their meaning. So, to Python I turned.


There is no single definitive list of 2020 potential Democratic VP candidates, so since we’re working with Wikipedia, I’ll stay true to the source by collecting a list of names from the Wikipedia article, “2020 Democratic Party vice presidential candidate selection.” Here are the nominees.

Screenshot from Wikipedia.

Getting Revision Timestamps

To achieve my goal, I’ll need to retrieve data from Wikipedia about each potential candidate. For this we can use the MediaWiki action API. Let’s get to it!

Take some names

I’ll start by preparing a list of names. We’ll use this list of names to look up timestamps of revisions for their respective Wikipedia articles:

nominees = ['Karen Bass', 'Keisha Lance Bottoms', 'Val Demings', 'Tammy Duckworth', 'Kamala Harris', 'Michelle Lujan Grisham', 'Susan Rice', 'Elizabeth Warren', 'Gretchen Whitmer']

Get some timestamps

Now I’ll define a function that makes an API call to Wikipedia and returns a list of revision timestamps for a given article. Because the API returns at most 500 revisions per request, the function pages through the results until it has them all. We’ll use the requests library for this:

# Import the requests library
import requests

# Define a function that gets all of the revision timestamps for a given article.
# The function takes a string of the correctly-spelled article title as its argument.

def get_revision_timestamps(title):

    # Base URL for the API call
    base_url = "https://en.wikipedia.org/w/api.php"

    # Empty list to hold our timestamps once retrieved
    revision_list = []

    # Parameters for the API call. The API caps results at 500 revisions
    # per request, so we page through the rest with its 'continue' mechanism.
    parameters = {'action': 'query',
                  'format': 'json',
                  'titles': title,
                  'prop': 'revisions',
                  'rvprop': 'ids|userid|timestamp',
                  'rvlimit': '500'}

    while True:
        # Make the call and parse the JSON response
        wp_call = requests.get(base_url, params=parameters)
        response = wp_call.json()

        # Drill down to the revisions of the (single) page in the response
        pages = response['query']['pages']
        page_info = next(iter(pages.values()))
        revisions = page_info.get('revisions', [])

        # Add each revision's timestamp to our list
        for entry in revisions:
            revision_list.append(entry['timestamp'])

        # If the response contains a 'continue' block, more revisions remain;
        # fold its values into the parameters for the next request.
        # Otherwise, we have everything and can stop.
        if 'continue' in response:
            parameters.update(response['continue'])
        else:
            break

    # End by returning the full list of revision timestamps
    return revision_list
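With a list of timestamps in hand, one way to measure the pace of edits is to bucket them by calendar month. Here is a minimal sketch using only the standard library; the sample timestamps are hypothetical, but they mirror the ISO 8601 format the API returns:

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample of timestamps, in the format the API returns (ISO 8601, UTC)
timestamps = [
    '2020-06-01T14:03:00Z',
    '2020-06-02T09:41:00Z',
    '2020-06-15T22:10:00Z',
    '2020-07-04T11:27:00Z',
]

# Parse each timestamp and count revisions per calendar month
months = Counter(
    datetime.strptime(ts, '%Y-%m-%dT%H:%M:%SZ').strftime('%Y-%m')
    for ts in timestamps
)

print(dict(months))  # → {'2020-06': 3, '2020-07': 1}
```

From here, the monthly counts for each name in the nominees list can be compared side by side, or handed off to a plotting library.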
