Learn how to use a common hierarchical clustering algorithm, agglomerative clustering, to find new topic clusters in recent news articles
For data scientists, text analytics on news stories has always been important, both from a learning and a practical perspective, since news provides a bulk data corpus for training models for text classification, sentiment analysis, named entity recognition, and more.
By and large, most of those models were trained on a historical news corpus built from the past 1–3 years of news stories. This works well in normal times; however, in the midst of the Covid-19 pandemic it poses a serious problem, since news stories now have a much faster turnaround cycle.
One way to fight this problem is to run text clustering algorithms on news stories collected over a short time window and identify emerging trends before they degrade our pretrained sentiment analysis models too much.
As an example, since the Covid pandemic began, we saw a drop in performance in our sentiment analysis models. We mitigated this by running text clustering and topic models, and found a new topic/cluster emerging around tokens such as “paycheck protection program”, “lockdowns”, “masks”, “vaccines”, and “airborne”. We kept saving data from these “new” clusters until we had enough datapoints to retrain our supervised models.
You can get recent (&lt;24 h) news articles in a structured format through a public news data API that exposes data from Specrom Analytics' media monitoring database. You have to register for a free account with Algorithmia. You get 10,000 credits a month when you sign up, which should be plenty for over 500 API calls a month.
To get your API key, go to the dashboard and click on My API Keys as shown below.
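Once you have the key, a call can be sketched against Algorithmia's documented REST convention (`POST https://api.algorithmia.com/v1/algo/<owner>/<name>` with a `Simple` authorization header). Note the algorithm path and the request payload below are placeholders, not the actual values for the Specrom news API; substitute whatever the algorithm's page on Algorithmia shows.

```python
# Hedged sketch of calling an Algorithmia-hosted algorithm over REST.
import json
import urllib.request

API_KEY = "simYOUR_API_KEY"  # paste your key from My API Keys
ALGO_PATH = "owner/algorithm/0.1.0"  # placeholder: copy the real path from the algorithm page


def build_request(payload: dict) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        url=f"https://api.algorithmia.com/v1/algo/{ALGO_PATH}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Simple {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# The payload schema here is an assumption; check the algorithm's docs.
req = build_request({"query": "covid-19"})
# response = urllib.request.urlopen(req)  # uncomment once you have a real key
```

Building the request separately from sending it makes the call easy to inspect and test before spending any of your monthly credits.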