Note: The methodology behind the approach discussed in this post stems from my PhD thesis and can be found in this academic paper.

INTRODUCTION

Sentiment analysis (or opinion mining) aims to automatically extract and classify sentiments (the subjective part of an opinion) and/or emotions (the projections or display of a feeling) expressed in text.

There are several language features that we use to indicate sentiment within text. Features can take the form of single words (unigrams), two-word phrases (bigrams) and longer phrases (n-grams), emoticons (e.g. :) is commonly used to represent positive sentiment), slang (e.g. chuffed, do one’s nut), abbreviations (e.g. GR8 for great), onomatopoeic elements (e.g. gr, hm), as well as the use of upper case, punctuation (e.g. !!, ?!) and repetitions of letters (e.g. sweeeeet) for affective emphasis. These features are often extracted from text and presented to machine learning models, which are trained to classify the sentiment a text expresses based on the features it contains.
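
To make the idea of feature extraction concrete, here is a minimal Python sketch that pulls a few of the surface features listed above out of a sentence. The function name `extract_features`, the tiny `EMOTICONS` set and the feature-name prefixes are assumptions made for illustration; they are not taken from any particular tool.

```python
import re
from collections import Counter

# Hypothetical, tiny emoticon inventory used for illustration only.
EMOTICONS = {":)", ":(", ":D", ";)"}

def extract_features(text):
    """Return a Counter of simple surface features found in `text`."""
    features = Counter()

    # Emoticons are matched on raw whitespace-separated tokens.
    for tok in text.split():
        if tok in EMOTICONS:
            features["emoticon=" + tok] += 1

    # Words are runs of letters and apostrophes; n-grams use lower case.
    words = re.findall(r"[A-Za-z']+", text)
    lowered = [w.lower() for w in words]
    for w in lowered:
        features["uni=" + w] += 1
    for a, b in zip(lowered, lowered[1:]):
        features["bi=" + a + " " + b] += 1

    # Emphasis cues: all-caps words, repeated punctuation, elongated letters.
    features["all_caps"] = sum(1 for w in words if len(w) > 1 and w.isupper())
    features["repeated_punct"] = len(re.findall(r"[!?]{2,}", text))
    features["elongated"] = sum(1 for w in lowered if re.search(r"(.)\1{2,}", w))
    return features

if __name__ == "__main__":
    print(extract_features("That was sweeeeet !! I am SO happy :)"))
```

A real system would use a proper tokeniser and a much larger emoticon and slang inventory. The point here is only that each cue becomes a named feature a classifier can learn from.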

Although these features are extensively used in sentiment analysis, less attention has been paid to the effect of ignoring idioms as features. This post therefore investigates the importance of including idioms as features in sentiment analysis by comparing the performance of two state-of-the-art tools when idioms are represented and when they are not. There are two requirements to achieve this: 1) the sentiment associated with an idiom needs to be identified, and 2) idioms need to be automatically recognised in text.

WHAT ARE IDIOMS?

Before we get technical, let’s first define what idioms are.

Idioms are often defined as multi-word expressions (expressions or phrases made up of at least two words). What makes them different from other phrases is that their overall meaning can’t be guessed from the literal meanings of the words that form them. For example, a fish out of water refers to someone who feels uncomfortable in a particular situation, not to an actual fish. The following figure provides other examples of English idioms.

Because of this, idioms are a challenge for language learners. It’s therefore common for idioms and their meanings to be taught and memorised as whole units, rather than worked out from their structure.

To distinguish idioms from other phrases and sayings, the following properties can be considered:

  • Conventionality: The overall meaning of an idiom can’t (entirely) be predicted from the literal meanings of the words that form it.
  • Inflexibility: Their syntax is restricted, i.e. they don’t vary much in the way they are composed.
  • Figuration: They typically have figurative meaning stemming from metaphors, hyperboles and other types of figuration.
  • Proverbiality: They usually describe a recurrent social situation.
  • Informality: They’re associated with less formal language, such as colloquialisms.
  • Affect: They typically imply an affective stance toward something rather than a neutral one.

The last property, affect, implies that an idiom itself may be useful in determining the sentiment expressed within a piece of text. For example, “I am over the moon with how it turned out” expresses a positive sentiment.
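
As a rough illustration of how an idiom could serve as a sentiment feature, the sketch below looks up idioms from a small lexicon in a sentence. The `IDIOM_POLARITY` dictionary, its polarity labels and the `find_idioms` function are hypothetical stand-ins rather than the resource or method from the thesis, and the exact string matching used here is a simplifying assumption that ignores inflected or otherwise varied forms.

```python
import re

# Hypothetical mini-lexicon mapping idioms to a polarity label.
# The labels are illustrative assumptions, not annotated data.
IDIOM_POLARITY = {
    "over the moon": "positive",
    "a fish out of water": "negative",
}

def find_idioms(text, lexicon=IDIOM_POLARITY):
    """Return (idiom, polarity) pairs for lexicon idioms that occur in `text`."""
    # Lower-case and drop punctuation so matching is on plain words only.
    normalised = " ".join(re.findall(r"[a-z']+", text.lower()))
    hits = []
    for idiom, polarity in lexicon.items():
        # Word boundaries stop an idiom from matching inside longer words.
        if re.search(r"\b" + re.escape(idiom) + r"\b", normalised):
            hits.append((idiom, polarity))
    return hits

if __name__ == "__main__":
    print(find_idioms("I am over the moon with how it turned out"))
```

Each hit could then be added to the feature set passed to a classifier, alongside the usual n-gram features.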

