It is easy, but is it reliable?
Sentiment analysis is an important part of text mining, and is essential in the automated processing of social mentions, customer interactions, product reviews and myriad other applications. Broadly speaking, sentiment analysis comes in two flavors, supervised and unsupervised, with supervised touted as the preferred method because its sensitivity to context results in higher accuracy. However, unsupervised sentiment analysis has the advantage of simplicity and speed: it saves time and effort, since we don’t have to painstakingly build a training data set. The question, though, is whether the ease of use comes at the cost of reliability.
While building a portal for diabetes management and care, I found that supervised sentiment analysis using machine learning on forum posts performed significantly better than unsupervised methods when the latter’s lexicon was left unmodified. However, the accuracy of the unsupervised approach improved when I adjusted the weights of frequently occurring words to suit the context, and the more words I adjusted, the higher the accuracy climbed. Without much effort, I almost matched the accuracy of the supervised method.
So, what is the lowdown on which method to choose? While there is no free lunch, we can get more-than-satisfactory results by putting in just a little bit of time with an unsupervised approach. Below, I outline the steps to obtaining high accuracy in keeping with, or even improving upon, the 80/20 Pareto principle. I also provide guesstimates of the time I spent on the two methods.
Supervised sentiment analysis is basically a classification or prediction problem. We manually read a large number of documents, such as movie reviews, and label each one as positive, negative, or neutral; alternatively, we can assign a score on a scale such as -5 to +5, where 0 indicates a neutral sentiment. This step provides the ground truth. Then we can use any classification or prediction method, such as naïve Bayes, logistic regression (in its basic form, for binary output only) or neural networks, to create a model that takes text as input and produces a sentiment class or score as output. While time intensive, this method’s major advantage is its sensitivity to context, and hence its higher accuracy.
Let’s say we have a movie review, “this was one scary movie, I finished a large bag of popcorn without even realizing it”. A human reading this review will most likely assign a positive sentiment label to it. While the word “scary” would have a negative weight by default in an unsupervised sentiment analysis tool, in the context of horror movies or a theme park ride, “scary” can be interpreted as having a positive connotation. Since this labeled review will become a part of the training data, a prediction model will learn to associate the word “scary” with a positive sentiment in the context of movie reviews.
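To make the idea concrete, here is a minimal sketch of a naïve Bayes classifier in plain Python. The six training reviews, their labels, and the function names (`tokenize`, `train_naive_bayes`, `predict`) are all made up for illustration; a real training set would contain hundreds or thousands of labeled posts, and in practice one would more likely reach for an existing machine learning library.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Deliberately crude tokenizer: lowercase words only.
    return re.findall(r"[a-z']+", text.lower())

def train_naive_bayes(labeled_docs):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def predict(model, text):
    """Return the class with the highest log-probability (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical hand-labeled reviews standing in for the ground truth.
TRAINING_DATA = [
    ("a scary movie that kept me on the edge of my seat", "positive"),
    ("scary and thrilling, loved it", "positive"),
    ("great fun, really scary", "positive"),
    ("boring plot and dull acting", "negative"),
    ("a dull, slow movie", "negative"),
    ("boring and way too long", "negative"),
]

model = train_naive_bayes(TRAINING_DATA)
print(predict(model, "this was one scary movie"))
```

Because “scary” appears only in the reviews labeled positive, this toy model classifies the new review as positive, which is the kind of context sensitivity described above.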
Creating a training data set by reading hundreds or even thousands of posts can be frustrating. We can outsource the job, for example, to Amazon Mechanical Turk, but we still have to check the accuracy of the labels ourselves. An easier alternative is to use an unsupervised method. While such methods come in many forms, almost all unsupervised sentiment analyzers use a lexicon of a few thousand words, each with a default weight ranging from negative to positive. These analyzers, especially the newer ones, can handle many variations in the usage of words:
“I didn’t like this movie.” (negation)
“This book is good, but it’s too long.” (sentiment shifter)
“This movie will knock your socks off.” (idiom; example taken from VADER, the Valence Aware Dictionary and sEntiment Reasoner)
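As a rough illustration of how a lexicon-based analyzer might handle the first two cases, here is a small Python sketch. The lexicon entries, the weights, the three-word negation window, and the damping and boosting factors around “but” are all assumptions chosen for the example, not values taken from any particular tool, and idioms are not handled at all.

```python
import re

# Hypothetical toy lexicon; real lexicons hold thousands of weighted entries.
LEXICON = {"like": 1.5, "good": 1.9, "long": -1.0, "love": 3.0, "bad": -2.5}
NEGATIONS = {"not", "no", "never", "didn't", "don't", "isn't", "wasn't"}

def tokenize(text):
    # Normalize curly apostrophes so "didn’t" matches the negation list.
    return re.findall(r"[a-z']+", text.lower().replace("\u2019", "'"))

def score_clause(tokens, window=3):
    """Sum lexicon weights, flipping the sign of a word that follows a negation."""
    score = 0.0
    for i, token in enumerate(tokens):
        weight = LEXICON.get(token, 0.0)
        preceding = tokens[max(0, i - window):i]
        if weight and any(t in NEGATIONS for t in preceding):
            weight = -weight
        score += weight
    return score

def score_sentence(text, before_but=0.5, after_but=1.5):
    """Score a sentence, giving more weight to what follows the first 'but'."""
    tokens = tokenize(text)
    if "but" in tokens:
        split = tokens.index("but")
        return (before_but * score_clause(tokens[:split])
                + after_but * score_clause(tokens[split + 1:]))
    return score_clause(tokens)

print(score_sentence("I didn\u2019t like this movie."))
print(score_sentence("This book is good, but it\u2019s too long."))
```

With these made-up weights, the negated sentence scores below zero, and the “but” sentence scores lower than “This book is good.” would on its own.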
Unlike the supervised method I mentioned above, this unsupervised approach will not be able to figure out that a scary movie or a scary theme park ride is a good thing. Additionally, the usage of words changes over time. Expressions like “that’s a sick movie” or “that’s so wicked” have positive connotations, but the sentiment analyzer lexicons, especially those created decades ago, may fail to detect the correct sentiment.
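One way to picture the remedy of adjusting lexicon weights to suit the context is the sketch below. The default weights, the override values, and the helper names (`score`, `frequent_lexicon_words`) are hypothetical; the point is only that counting which lexicon words occur most often in our own corpus tells us which weights are worth reviewing first.

```python
import re
from collections import Counter

# Hypothetical default weights, as a general-purpose lexicon might assign them.
DEFAULT_LEXICON = {"scary": -2.0, "sick": -2.5, "wicked": -2.5,
                   "great": 3.0, "boring": -2.0}

# Hypothetical overrides for a movie-review context.
DOMAIN_OVERRIDES = {"scary": 1.5, "sick": 2.0, "wicked": 2.0}

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def score(text, lexicon):
    """Sum the weights of all lexicon words in the text (no negation handling)."""
    return sum(lexicon.get(token, 0.0) for token in tokenize(text))

def frequent_lexicon_words(docs, lexicon, top_n=5):
    """List the lexicon words that occur most often in the corpus."""
    counts = Counter(token for doc in docs for token in tokenize(doc)
                     if token in lexicon)
    return counts.most_common(top_n)

# Overrides replace the defaults; all other entries are left untouched.
adjusted_lexicon = {**DEFAULT_LEXICON, **DOMAIN_OVERRIDES}

review = ("this was one scary movie, I finished a large bag of popcorn "
          "without even realizing it")
print(score(review, DEFAULT_LEXICON))   # negative with the default weight
print(score(review, adjusted_lexicon))  # positive after the override
```

Only the words that actually show up often need attention, which is why a small number of overrides can go a long way.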