We are surrounded by large volumes of text — emails, messages, documents, reports — and it’s a challenge for individuals and businesses alike to monitor, collate, interpret and otherwise make sense of it all. Over recent years, an area of natural language processing called topic modeling has made great strides in meeting this challenge. This article introduces topic modeling — how it works and what it’s used for — through an intuitive explanation of a popular topic modeling approach called Latent Dirichlet Allocation.

The volume of text that surrounds us is vast. And it’s growing.

Emails, web pages, tweets, books, journals, reports, articles and more. And with the growing reach of the internet and web-based services, more and more people are being connected to, and engaging with, digitized text every day.

Accompanying this is the growth of text analytics services. Business Wire, a news and multimedia company, estimates that the market for text analytics will grow by 20% per year through 2024, an increase of over $8.7 billion.

As text analytics evolves, it is increasingly using artificial intelligence, machine learning and natural language processing to explore and analyze text in a variety of ways.

But text analysis isn’t always straightforward.

One of the key challenges with machine learning, for instance, is the need for large quantities of labeled data in order to use supervised learning techniques.

An example of this is classifying spam emails.

A supervised learning approach can be used for this by training a model on a large collection of emails that are pre-labeled as spam or not spam. If such a collection doesn’t exist, however, it needs to be created, and this takes a lot of time and effort.
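To make the dependence on labeled data concrete, here is a minimal sketch of a supervised spam classifier. It is illustrative only: it uses a simple naive Bayes word-count model rather than a neural network, and the six training emails, their "spam"/"ham" labels, and the function names `train` and `classify` are all hypothetical, made up for this example.

```python
import math
from collections import Counter

# Hypothetical, hand-labeled training set: (text, label) pairs.
# In practice this collection would need thousands of labeled emails.
TRAINING_EMAILS = [
    ("win a free prize now", "spam"),
    ("claim your free money today", "spam"),
    ("cheap loans win big", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the attached report", "ham"),
    ("lunch on monday with the team", "ham"),
]


def train(examples):
    """Count word occurrences per label and the number of emails per label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the most probable label under naive Bayes with add-one smoothing."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_emails = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Start from the log prior: the share of training emails with this label.
        score = math.log(label_counts[label] / total_emails)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the probability.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


word_counts, label_counts = train(TRAINING_EMAILS)
print(classify("free prize money", word_counts, label_counts))
print(classify("monday team meeting", word_counts, label_counts))
```

Everything the classifier knows comes from the labels attached to the training emails. Without them there is nothing to count per class, which is why building such a collection is the costly step.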

Supervised learning can yield good results if labeled data exists, but most of the text that we encounter isn’t well structured or labeled.

This is where unsupervised learning approaches like topic modeling can help.


Topic Modeling with LDA: An Intuitive Explanation