Identifying hidden trends in news stories

Learn how to use a common hierarchical clustering algorithm called agglomerative clustering to find new topic clusters in recent news articles

For data scientists, text analytics on news stories has always been important from both a learning and a practical perspective, since news provides a large corpus for training text classification, sentiment analysis, named entity recognition, and similar models.

By and large, most of those models were trained on a historical news corpus covering the past 1–3 years of news stories. This works well in normal times. In the midst of the COVID-19 pandemic, however, it poses a serious problem: news stories now have a faster turnaround cycle, and new topics emerge that the models never saw during training.

One way to address this problem is to run text clustering algorithms on news stories collected over a short time period and identify emerging trends before they degrade our pretrained sentiment analysis models too much.
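
The article does not show its clustering code, so here is a minimal, dependency-free sketch of agglomerative clustering on a small batch of headlines. It assumes TF-IDF vectors compared by cosine similarity, average linkage, and a stopping threshold called `min_similarity`. All of these are illustrative choices rather than the author's settings, and the headlines are made up. The algorithm starts with every document as its own cluster and repeatedly merges the two most similar clusters until no remaining pair is similar enough.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep alphabetic tokens only (a deliberately simple tokenizer)
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    # Build L2-normalized TF-IDF vectors as sparse {term: weight} dicts,
    # using the smoothed IDF ln((1 + n) / (1 + df)) + 1
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine(u, v):
    # Vectors are already unit length, so the dot product is the cosine similarity
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def agglomerative(vectors, min_similarity):
    # Average-linkage agglomerative clustering: start with one cluster per
    # document and repeatedly merge the most similar pair of clusters,
    # stopping once no pair reaches min_similarity
    sim = [[cosine(a, b) for b in vectors] for a in vectors]
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [sim[a][b] for a in clusters[i] for b in clusters[j]]
                avg = sum(pairs) / len(pairs)
                if avg > best:
                    best, pair = avg, (i, j)
        if best < min_similarity:
            break
        i, j = pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

# Made-up headlines, standing in for one day of collected articles
headlines = [
    "Lockdowns extended as masks become mandatory",
    "New masks guidance issued during lockdowns",
    "Paycheck protection program loans run out",
    "Small firms await paycheck protection program funds",
]
clusters = agglomerative(tfidf_vectors(headlines), min_similarity=0.2)
print(clusters)  # → [[0, 1], [2, 3]]
```

The threshold controls how many clusters you get. A lower value merges more aggressively and a higher one keeps more small clusters. The naive loop above is fine for a sketch. For real article volumes you would more likely use scikit-learn's `TfidfVectorizer` together with `AgglomerativeClustering`.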

For example, after the COVID-19 pandemic began, we saw a drop in the performance of our sentiment analysis models. We mitigated this by running text clustering and topic models, and found that a new topic/cluster was emerging around tokens such as “paycheck protection program”, “lockdowns”, “masks”, “vaccines”, and “airborne”. We kept saving data from these “new” clusters until we had enough data points to retrain our supervised models.
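
The article does not say how the defining tokens of a new cluster were extracted. One simple option is to compare each term's document frequency inside a cluster with its frequency in the rest of the corpus, so that terms specific to the cluster rise to the top. The sketch below takes that approach. The `top_terms` helper and its scoring rule are hypothetical, the headlines are made up, and it uses only single words. Recovering a phrase like “paycheck protection program” as one unit would require n-grams.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercased set of alphabetic tokens in one document
    return set(re.findall(r"[a-z]+", text.lower()))

def top_terms(docs, cluster, k=3):
    # Rank terms by how much more often they appear in the cluster's documents
    # than in the rest of the corpus (difference in document frequency rates)
    members = set(cluster)
    inside = [tokenize(d) for i, d in enumerate(docs) if i in members]
    outside = [tokenize(d) for i, d in enumerate(docs) if i not in members]
    in_df = Counter(t for toks in inside for t in toks)
    out_df = Counter(t for toks in outside for t in toks)

    def score(term):
        out_rate = out_df[term] / len(outside) if outside else 0.0
        return in_df[term] / len(inside) - out_rate

    # Highest score first; ties broken alphabetically so the output is deterministic
    return sorted(in_df, key=lambda t: (-score(t), t))[:k]

# Made-up headlines; documents 0-2 form the hypothetical "new" cluster
headlines = [
    "Paycheck protection program loans run out",
    "Small firms await paycheck protection program funds",
    "Banks process paycheck protection program applications",
    "Stocks rally as banks report earnings",
    "Oil prices fall as demand weakens",
]
print(top_terms(headlines, cluster=[0, 1, 2]))  # → ['paycheck', 'program', 'protection']
```

Terms that are common everywhere, such as “banks” here, are pushed down because they also appear outside the cluster. Filler words can still surface, however. For example, `top_terms(headlines, [3, 4], k=1)` returns `['as']`, so in practice you would remove stopwords first.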

News API

You can get recent (<24 h) news articles in a structured format from a public news data API, which exposes data from the media monitoring database of Specrom Analytics. To use it, register for a free account with Algorithmia. A new account gets 10,000 credits a month, which should be enough for over 500 API calls a month.

To get your API key, go to the dashboard and click My API Keys.
