I’ve always been interested in data analysis and literary criticism. They might seem like two vastly different fields of study, but to me, thinking critically about analytics and about classic novels are quite similar activities, since both help me gain new insights into social, cultural, and political issues. Based on these interests, I had the idea of applying natural language processing (NLP) techniques to the analysis of literary or philosophical texts, so I decided to give it a go.

Introduction

In this article, I will share my experience of applying NLP to uncover the central themes of one of the greatest American classics: The Great Gatsby. (For those who prefer live presentations to essays, I’ve also made a video that explains my methodology in detail. Check out the link above.)


The Great Gatsby film adaptation (2013)

Superficially, The Great Gatsby seems like a typical romance, as the plot mainly revolves around the millionaire Jay Gatsby’s quest to win back the heart of his long-lost love, Daisy Buchanan. However, viewed in the historical context of the hedonistic Jazz Age of the 1920s, it is evident that the novel criticizes the decay of the American Dream in an era of material excess. I was curious whether data science, rather than subjective reading, could help clarify the main idea of the book. My hypothesis was that if I fed the text into a topic modeling algorithm, I could automatically extract its literary themes.

Text Preprocessing

First, the original text of The Great Gatsby needs to be cleaned. The book’s .txt file can be downloaded from Project Gutenberg, which hosts digital texts of many literary masterpieces. Topic modeling usually requires more than one document, so it makes sense to split the corpus into multiple paragraphs and treat each one as a separate document.
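The splitting step could look roughly like the sketch below. It is not my exact code, only a minimal illustration that assumes paragraphs in the plain-text file are separated by blank lines. The function name `split_into_paragraphs` and the `min_words` threshold (used to drop chapter numbers and other tiny fragments) are made up for this example.

```python
import re
from typing import List


def split_into_paragraphs(text: str, min_words: int = 5) -> List[str]:
    """Split raw text into paragraph 'documents' separated by blank lines.

    min_words is an illustrative threshold (an assumption, not a rule):
    blocks with fewer words, such as chapter numbers, are dropped.
    """
    # Normalize Windows line endings so blank lines are detected reliably.
    text = text.replace("\r\n", "\n")
    paragraphs = []
    for block in re.split(r"\n\s*\n", text):
        # Re-join hard-wrapped lines into a single line of text.
        paragraph = " ".join(block.split())
        if len(paragraph.split()) >= min_words:
            paragraphs.append(paragraph)
    return paragraphs
```

To use it, read the downloaded file into a string (for example with `open("gatsby.txt", encoding="utf-8").read()`, where the filename is a placeholder) and pass that string to `split_into_paragraphs`. Any license header or footer in the file would still need to be removed separately.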

Then, using the NLTK and TextBlob packages, I removed all the stopwords (words that occur frequently but carry little contextual meaning) and lemmatized each token (converted it to its base form, for example changing a plural into a singular).
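Here is a hedged sketch of what stopword removal and lemmatization might look like for a single paragraph. It is an illustration rather than the exact code I ran. The helper names `preprocess` and `nltk_textblob_tools` are invented for this example, and the regex tokenizer is a simplification. The sketch assumes NLTK and TextBlob are installed and that the NLTK `stopwords` and `wordnet` corpora have already been downloaded.

```python
import re
from typing import Callable, List, Set, Tuple


def preprocess(paragraph: str,
               stop_words: Set[str],
               lemmatize: Callable[[str], str]) -> List[str]:
    """Lowercase, tokenize, drop stopwords, and lemmatize one paragraph."""
    # Simplified tokenizer (an assumption): keep runs of ASCII letters only.
    tokens = re.findall(r"[a-z]+", paragraph.lower())
    return [lemmatize(tok) for tok in tokens if tok not in stop_words]


def nltk_textblob_tools() -> Tuple[Set[str], Callable[[str], str]]:
    """Build (stop_words, lemmatize) from NLTK and TextBlob.

    Imports are done lazily so that preprocess() can also be used with
    any other stopword set or lemmatizer.
    """
    from nltk.corpus import stopwords  # needs the 'stopwords' corpus
    from textblob import Word          # lemmatizer relies on 'wordnet'

    stop_words = set(stopwords.words("english"))
    # Word.lemmatize() treats the token as a noun by default.
    return stop_words, lambda tok: Word(tok).lemmatize()
```

With the corpora in place, `stop_words, lemmatize = nltk_textblob_tools()` followed by `preprocess(paragraph, stop_words, lemmatize)` returns the cleaned token list for one paragraph. Passing the stopword set and lemmatizer in as arguments keeps the cleaning logic independent of any particular library.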


Thematic Analysis of The Great Gatsby with Topic Modeling