In this tutorial I will show you how to build a word cloud of a text in Python, using the wordcloud
package.
In the example, I will build the wordcloud of the Saint Augustines’ Confessions, which can be downloaded from the Gutemberg Project Page. The masterpiece is split in 13 books. We have stored each book into a different file, named number.text (e.g. 1.txt and 2.txt). Each line of every file contains just one sentence.
The source code of this tutorial can be downloaded from my Github repository.
The first step towards the creation of the word cloud involves the installation of the wordcloud package, through the command pip install wordcloud
. Then you can import the class WordCloud
as well as the list of STOPWORDS
.
from wordcloud import WordCloud, STOPWORDS
If you want, you can add other stopwords to your list.
stopwords = set(STOPWORDS)
stopwords.add('thou')
The Wordcloud
function needs a sentence as input containing all the words for which the word cloud should be calculated. In our case, we should store all the text of the masterpiece into a variable. We can read the text of each book by opening the related file and store it into a global variable, called all_text
.
all_text = ""
for book in range(1,14):
file = open('sources/' + str(book) + '.txt')
lines = file.readlines()
for line in lines:
all_text += " " + line
#data-science #word-cloud #text-analysis #python