Chet Lubowitz

1599222420

Thematic Analysis of The Great Gatsby with Topic Modeling

I’ve always been interested in data analysis and literary criticism. They might seem like vastly different fields of study, but to me, thinking critically about analytics and about classic novels are quite similar activities: both let me gain new insights into social, cultural, and political issues. Based on these interests, I had the idea of applying natural language processing techniques to literary and philosophical texts, so I decided to give it a go.

Introduction

In this article, I will share my experience of applying NLP techniques to unveil the central themes of one of the greatest American classics: The Great Gatsby. (For those who prefer live presentations to essays, I’ve also made a video that explains my methodology in detail; check out the link above.)

The Great Gatsby film adaptation (2013)

Superficially, The Great Gatsby seems like a typical romance novel, as the plot mainly revolves around the millionaire Jay Gatsby’s quest to win back the heart of his long-lost love, Daisy Buchanan. However, viewed in the historical context of the hedonistic Jazz Age of the 1920s, it is evident that the novel’s purpose is to criticize the decay of the American Dream in an era of material excess. I was curious whether data science, rather than subjective reading, could help clarify the main idea of the book. My hypothesis was that if I fed the text into a topic modeling algorithm, I could automatically extract the literary themes.

Text Preprocessing

First off, we need to clean the original text of The Great Gatsby. The book’s .txt file can be downloaded from Project Gutenberg, which hosts digital texts of many literary masterpieces. Topic modeling usually requires more than one document, so it is appropriate to split the corpus into multiple paragraphs.
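As a sketch of this splitting step (assuming blank lines separate paragraphs, as in the Gutenberg file; in practice `raw` would come from the downloaded text, and any filename you use to load it is your own choice):

```python
# Paragraph splitting: each blank-line-separated paragraph becomes one
# "document" for the topic model. The sample string stands in for the
# full novel loaded from disk.
raw = "First paragraph of the novel.\n\nSecond paragraph.\n\nThird paragraph."

paragraphs = [p.strip() for p in raw.split("\n\n") if p.strip()]
print(len(paragraphs))  # 3 documents
```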

Then, using the NLTK and TextBlob packages, I removed all the stopwords (words that occur frequently but lack contextual meaning) and lemmatized each token (converted words to their base form, such as changing a plural to a singular).
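A rough, dependency-free sketch of these two steps (the tiny stopword set and the plural-stripping rule below are crude stand-ins for NLTK’s stopword list and TextBlob’s lemmatizer, not the real thing):

```python
# Simplified preprocessing: lowercase, strip punctuation,
# drop stopwords, and crudely singularize plural nouns.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "was"}  # tiny stand-in list

def preprocess(text):
    # Lowercase and strip surrounding punctuation from each token.
    tokens = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    # Drop stopwords.
    tokens = [w for w in tokens if w and w not in STOPWORDS]
    # Naive plural stripping, standing in for true lemmatization.
    return [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in tokens]

print(preprocess("The mansions in West Egg"))  # ['mansion', 'west', 'egg']
```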

#unsupervised-learning #topic-modeling #textblob #nltk #nlp


Social media and topic modeling: how to analyze posts in practice

There is a substantial amount of data generated on the internet every second — posts, comments, photos, and videos. These different data types mean that there is a lot of ground to cover, so let’s focus on one — text.

All social conversations are based on written words — tweets, Facebook posts, comments, online reviews, and so on. Being a social media marketer, a Facebook group/profile moderator, or trying to promote your business on social media requires you to know how your audience reacts to the content you are uploading. One way is to read it all, mark hateful comments, divide them into similar topic groups, calculate statistics and… lose a big chunk of your time just to see that there are thousands of new comments to add to your calculations. Fortunately, there is another solution to this problem — machine learning. From this text you will learn:

  • Why do you need specialised tools for social media analysis?
  • What can you get from topic modeling, and how is it done?
  • How can you automatically look for hate speech in comments?

Why are social media texts unique?

Before jumping to the analyses, it is really important to understand why social media texts are so unique:

  • Posts and comments are short. They mostly contain one simple sentence, or even a single word or expression. This gives us a limited amount of information to extract from any one post.

  • Emojis and smiley faces — used almost exclusively on social media. They give additional details about the author’s emotions and context.

  • Slang phrases make posts resemble spoken language rather than written text, giving statements a more casual feel.

These features make social media a whole different source of information and demand special attention when running an analysis with machine learning. In contrast, most open-source machine learning solutions are trained on long, formal texts, like Wikipedia articles and other website content. As a result, these models perform badly on social media data, because they don’t understand the additional forms of expression involved. This is called domain shift and is a typical NLP problem. Different data also require customised preparation methods, called preprocessing. This step consists of cleaning the text of uninformative tokens like URLs or mentions and converting it to a machine-readable format (more about how we do it at Sotrender). This is why it is crucial to use tools created especially for your data source to get the best results.
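As an illustration of that cleaning step, here is a minimal regex-based sketch (my own simplification, not Sotrender’s actual pipeline):

```python
import re

def clean(text):
    """Strip URLs and @mentions, keep hashtag words, collapse whitespace."""
    text = re.sub(r"https?://\S+", "", text)  # remove URLs
    text = re.sub(r"@\w+", "", text)          # remove @mentions
    text = re.sub(r"#(\w+)", r"\1", text)     # '#great' -> 'great'
    return " ".join(text.split())

print(clean("loved it! @brand #great https://t.co/abc"))  # 'loved it! great'
```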

#analysis #social-media #data-science #topic-modeling #data analysis

Tyrique Littel

1604008800

Static Code Analysis: What It Is and How to Use It

Static code analysis refers to the technique of approximating the runtime behavior of a program. In other words, it is the process of predicting the output of a program without actually executing it.

Lately, however, the term “static code analysis” is more commonly used to refer to one application of this technique rather than the technique itself — program comprehension: understanding a program and detecting issues in it (anything from syntax errors to type mismatches, performance hogs, likely bugs, security loopholes, etc.). This is the usage we’ll be referring to throughout this post.

“The refinement of techniques for the prompt discovery of error serves as well as any other as a hallmark of what we mean by science.”

  • J. Robert Oppenheimer

Outline

We cover a lot of ground in this post. The aim is to build an understanding of static code analysis and to equip you with the basic theory, and the right tools so that you can write analyzers on your own.

We start our journey by laying down the essential parts of the pipeline a compiler follows to understand what a piece of code does. We learn where to tap this pipeline to plug in our analyzers and extract meaningful information. In the latter half, we get our feet wet and write four such static analyzers, completely from scratch, in Python.

Note that although the ideas here are discussed in terms of Python, static code analyzers across all programming languages are carved out along similar lines. We chose Python because of the availability of its easy-to-use ast module and the wide adoption of the language itself.

How does it all work?

Before a computer can finally “understand” and execute a piece of code, it goes through a series of complicated transformations:

[Figure: static analysis workflow]

As you can see in the diagram (go ahead, zoom it!), the static analyzers feed on the output of these stages. To be able to better understand the static analysis techniques, let’s look at each of these steps in some more detail:

Scanning

The first thing that a compiler does when trying to understand a piece of code is to break it down into smaller chunks, also known as tokens. Tokens are akin to what words are in a language.

A token might consist of a single character, like (, a literal (such as an integer or string, e.g., 7 or 'Bob'), or a reserved keyword of the language (e.g., def in Python). Characters which do not contribute to the semantics of a program, like trailing whitespace and comments, are often discarded by the scanner.

Python provides the tokenize module in its standard library to let you play around with tokens:

Python

import io
import tokenize

code = b"color = input('Enter your favourite color: ')"

for token in tokenize.tokenize(io.BytesIO(code).readline):
    print(token)

This prints:

TokenInfo(type=62 (ENCODING),  string='utf-8')
TokenInfo(type=1  (NAME),      string='color')
TokenInfo(type=54 (OP),        string='=')
TokenInfo(type=1  (NAME),      string='input')
TokenInfo(type=54 (OP),        string='(')
TokenInfo(type=3  (STRING),    string="'Enter your favourite color: '")
TokenInfo(type=54 (OP),        string=')')
TokenInfo(type=4  (NEWLINE),   string='')
TokenInfo(type=0  (ENDMARKER), string='')

(Note that for the sake of readability, I’ve omitted a few columns from the result above — metadata like starting index, ending index, a copy of the line on which a token occurs, etc.)
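To preview where this is headed, a tiny analyzer can also be built on the parse tree rather than the token stream. A minimal sketch using the standard ast module (the eval() check and the snippet being analyzed are my own illustration, not one of the four analyzers from this post):

```python
import ast

source = "result = eval(user_input)"  # hypothetical snippet to analyze

tree = ast.parse(source)
# Walk every node in the tree and collect the names of called functions,
# which lets us flag risky calls like eval().
flagged = [
    node.func.id
    for node in ast.walk(tree)
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
]
print(flagged)  # ['eval']
```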

#code quality #code review #static analysis #static code analysis #code analysis #static analysis tools #code review tips #static code analyzer #static code analysis tool #static analyzer

Ian Robinson

1623856080

Streamline Your Data Analysis With Automated Business Analysis

Have you ever visited a restaurant or movie theatre, only to be asked to participate in a survey? What about providing your email address in exchange for coupons? Do you ever wonder why you get ads for something you just searched for online? It all comes down to data collection and analysis. Indeed, everywhere you look today, there’s some form of data to be collected and analyzed. As you run your business, you’ll need to create a data analytics plan. Data helps you solve problems, find new customers, and re-assess your marketing strategies. Automated business analysis tools provide key insights into your data. Below are a few of the many valuable benefits of using such a system for your organization’s data analysis needs.

  • Workflow integration and AI capability
  • Pinpoint unexpected data changes
  • Understand customer behavior
  • Enhance marketing and ROI

#big data #latest news #data analysis #streamline your data analysis #automated business analysis #streamline your data analysis with automated business analysis

Hertha Walsh

1603357200

Text analysis basics in Python

This article talks about the most basic text analysis tools in Python. We are not going into the fancy NLP models. Just the basics. Sometimes all you need is the basics :)

Let’s first get some text data. Here we have a list of course reviews that I made up. What can we do with this data? The first question that comes to mind is can we tell which reviews are positive and which are negative? Can we do some sentiment analysis on these reviews?

corpus = [
'Great course. Love the professor.',
'Great content. Textbook was great',
'This course has very hard assignments. Great content.',
'Love the professor.',
'Hard assignments though',
'Hard to understand.'
]

Sentiment analysis

Great, let’s start with the overall sentiment. I like to work with pandas data frames, so let’s create one from the list.

import pandas as pd
df = pd.DataFrame(corpus)
df.columns = ['reviews']

Next, let’s install the library textblob (conda install textblob -c conda-forge) and import the library.

from textblob import TextBlob
df['polarity'] = df['reviews'].apply(lambda x: TextBlob(x).polarity)
df['subjective'] = df['reviews'].apply(lambda x: TextBlob(x).subjectivity)

We can then calculate sentiment through the polarity attribute, which ranges from -1 (negative) to 1 (positive). TextBlob also provides a subjectivity attribute, which ranges from 0 (objective) to 1 (subjective).
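With polarity scores in hand, labeling reviews is a simple thresholding step. A sketch with hypothetical values (both the example scores and the 0.05 cutoff are my assumptions, not actual TextBlob output):

```python
# Hypothetical polarity scores standing in for TextBlob output.
scores = {
    "Great course. Love the professor.": 0.65,
    "Hard to understand.": -0.29,
    "Textbook was fine.": 0.0,
}

def label(polarity, cutoff=0.05):
    # Treat near-zero polarity as neutral rather than forcing a side.
    if polarity > cutoff:
        return "positive"
    if polarity < -cutoff:
        return "negative"
    return "neutral"

for review, p in scores.items():
    print(label(p), "-", review)
```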

#sentiment-analysis #python #data-science #topic-modeling #text-analysis