How To Use ThreadPoolExecutor in Python 3

The author selected the COVID-19 Relief Fund to receive a donation as part of the Write for DOnations program.

Introduction

Python threads are a form of parallelism that allows your program to run multiple procedures at once. Parallelism in Python can also be achieved using multiple processes, but threads are particularly well suited to speeding up applications that involve significant amounts of I/O (input/output).

Example I/O-bound operations include making web requests and reading data from files. In contrast to I/O-bound operations, CPU-bound operations (like performing math with the Python standard library) will not benefit much from Python threads.

Python 3 includes the ThreadPoolExecutor utility for executing code in a thread.
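
As a rough illustration of why threads help with I/O-bound work, the sketch below uses time.sleep as a stand-in for waiting on a network response. The function name fake_io_task and the 0.2-second delay are assumptions made for this example only; they are not part of the tutorial's code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(task_id, delay=0.2):
    # time.sleep stands in for waiting on I/O (hypothetical workload).
    time.sleep(delay)
    return task_id


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    # map runs the calls in worker threads and yields results in input order.
    results = list(executor.map(fake_io_task, range(4)))
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s")
```

Because the four waits overlap, the elapsed time should be close to a single delay rather than four delays added together, although the exact figure will vary from machine to machine.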

In this tutorial, we will use ThreadPoolExecutor to make network requests efficiently. We’ll define a function well suited for invocation within threads, use ThreadPoolExecutor to execute that function, and process results from those executions.

For this tutorial, we’ll make network requests to check for the existence of Wikipedia pages.

Note: The fact that I/O-bound operations benefit more from threads than CPU-bound operations is caused by an idiosyncrasy in Python called the global interpreter lock. If you’d like, you can learn more about Python’s global interpreter lock in the official Python documentation.

Prerequisites

To get the most out of this tutorial, it is recommended to have some familiarity with programming in Python and a local Python programming environment with requests installed.

You can review these tutorials for the necessary background information:

How to Code in Python 3
How To Install Python 3 and Set Up a Local Programming Environment on Ubuntu 18.04

To install the requests package into your local Python programming environment, you can run this command:

pip install --user requests==2.23.0

Step 1 — Defining a Function to Execute in Threads

Let’s start by defining a function that we’d like to execute with the help of threads.

Using nano or your preferred text editor/development environment, you can open this file:

nano wiki_page_function.py

For this tutorial, we’ll write a function that determines whether or not a Wikipedia page exists:

import requests

def get_wiki_page_existence(wiki_page_url, timeout=10):
    response = requests.get(url=wiki_page_url, timeout=timeout)

    page_status = "unknown"
    if response.status_code == 200:
        page_status = "exists"
    elif response.status_code == 404:
        page_status = "does not exist"

    return wiki_page_url + " - " + page_status

The get_wiki_page_existence function accepts two arguments: a URL to a Wikipedia page (wiki_page_url), and a timeout number of seconds to wait for a response from that URL.

get_wiki_page_existence uses the [requests](https://requests.readthedocs.io/en/master/) package to make a web request to that URL. Depending on the status code of the HTTP response, a string is returned that describes whether or not the page exists. Different status codes represent different outcomes of an HTTP request. This procedure assumes that a 200 “success” status code means the Wikipedia page exists, and a 404 “not found” status code means the Wikipedia page does not exist.
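
The status-code branching described above can be tried on its own, without making a network request. The following is a minimal sketch; describe_status is a hypothetical helper introduced only for illustration and is not part of the tutorial's file.

```python
def describe_status(status_code):
    # Hypothetical helper that mirrors the branching in get_wiki_page_existence.
    if status_code == 200:
        return "exists"
    if status_code == 404:
        return "does not exist"
    # Any other code (for example, a server error) is reported as "unknown".
    return "unknown"


for code in (200, 404, 500):
    print(code, "-", describe_status(code))
```

Keeping "unknown" as the fallback means an unexpected response is not mistaken for proof that the page is missing.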

As described in the Prerequisites section, you’ll need the requests package installed to run this function.

Let’s try running the function by adding a URL and a function call after the get_wiki_page_existence function:

. . .
url = "https://en.wikipedia.org/wiki/Ocean"
print(get_wiki_page_existence(wiki_page_url=url))

Once you’ve added the code, save and close the file.

If we run this code:

python wiki_page_function.py

We’ll see output like the following:

Output
https://en.wikipedia.org/wiki/Ocean - exists

Calling the get_wiki_page_existence function with a valid Wikipedia page returns a string that confirms the page does, in fact, exist.

Warning: In general, it is not safe to share Python objects or state between threads without taking special care to avoid concurrency bugs. When defining a function to execute in a thread, it is best to define a function that performs a single job and does not share or publish state to other threads. get_wiki_page_existence is an example of such a function.
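
To give a sense of what "special care" can look like when state does have to be shared, here is a minimal sketch that guards a shared counter with threading.Lock. The names counter, counter_lock, and add_many are assumptions made for this example; the tutorial's own function avoids the issue entirely by not sharing state.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

counter = 0
counter_lock = threading.Lock()


def add_many(times):
    global counter
    for _ in range(times):
        # The lock ensures only one thread at a time updates counter.
        with counter_lock:
            counter += 1


with ThreadPoolExecutor(max_workers=8) as executor:
    futures = [executor.submit(add_many, 10_000) for _ in range(8)]
    for future in futures:
        future.result()  # re-raises any exception from the worker

print(counter)
```

A function like get_wiki_page_existence, which only reads its arguments and returns a new string, needs no lock at all, which is one reason that style is easier to run safely in threads.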


digitalocean.com
