How to build a wordcloud in Python

In this tutorial I will show you how to build a word cloud of a text in Python, using the wordcloud package.

In the example, I will build the wordcloud of the Saint Augustines’ Confessions, which can be downloaded from the Gutemberg Project Page. The masterpiece is split in 13 books. We have stored each book into a different file, named number.text (e.g. 1.txt and 2.txt). Each line of every file contains just one sentence.

The source code of this tutorial can be downloaded from my Github repository.

Getting started

Install and get familiar with the wordcloud package

The first step towards the creation of the word cloud involves the installation of the wordcloud package, through the command pip install wordcloud. Then you can import the class WordCloud as well as the list of STOPWORDS.

from wordcloud import WordCloud, STOPWORDS

If you want, you can add other stopwords to your list.

stopwords = set(STOPWORDS) 
stopwords.add('thou')

The Wordcloud function needs a sentence as input containing all the words for which the word cloud should be calculated. In our case, we should store all the text of the masterpiece into a variable. We can read the text of each book by opening the related file and store it into a global variable, called all_text.

all_text = ""
for book in range(1,14):
    file = open('sources/' + str(book) + '.txt')
    lines = file.readlines()

    for line in lines:
        all_text += " " + line

#data-science #word-cloud #text-analysis #python

Getting started

Install and get familiar with the wordcloud package

towardsdatascience.com

How to build a wordcloud in Python